This is much less well-written than usual and thus needs a lot of polishing. It is okay if it leaves the impression that my mind is a circus of ridiculous thoughts.
To rephrase the title, I haven't figured out the right way to think about the question of whether learning is computation.
What is learning? The first time I asked this daunting question was in December 2024. I quickly found a satisfying answer in Tom Mitchell's book.
But today this question resurfaced.
To provide a bit more context, I watched a P vs NP lecture by Professor Michael Sipser.
I saw instantly that, for theoretical computer scientists, the very problem they study is a projection of how they live their own professional lives.
Theorem proving is a computational problem.
Then proving P≠NP is itself a computational problem.
If this computational problem is one that requires irreducible search, then don't all theoretical computer scientists have to wait until the end of the universe to see the answer?
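To make "irreducible search" concrete, here is a minimal sketch of what brute force looks like, using SAT (the textbook NP-complete problem) as a stand-in for bounded-length proof search; the clause encoding is just an assumption for the toy, not anyone's actual prover:

```python
from itertools import product

def brute_force_sat(clauses, n_vars):
    """Clauses are lists of nonzero ints: literal k means variable k is
    true, -k means variable k is false. Try all 2**n_vars assignments."""
    for bits in product([False, True], repeat=n_vars):
        # Checking a single candidate is cheap (polynomial time)...
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    # ...but the number of candidates is exponential in n_vars.
    return None

# (x1 or not x2) and (x2 or x3)
print(brute_force_sat([[1, -2], [2, 3]], n_vars=3))
```

Verifying one candidate is easy; it is the 2**n_vars candidates that would outlast the universe once n_vars gets modestly large.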
Many of them must be unwilling to accept this fate. Then they must reflect upon how they search for solutions to a problem, and why their search is so much better than brute force. Reflecting upon how they themselves search will, in turn, inform the search problems they study.
Part I: Good Heuristics
One might say scientists search much faster because they have very good heuristics.
Then the question becomes how they acquired those heuristics. Even if we count the developmental/training time of a scientist, the total is still far less than the lifespan of the universe.
Yet there are many really hard theorem-proving problems that mathematicians did successfully figure out. Did they accomplish this out of pure luck, or did their heuristics matter a lot? If there is no big mistake in abstracting the developmental/training process of a scientist as performing computations, then we should be able to program a computer to acquire useful heuristics in a simulated developmental/training process and solve the hard search problems. But all we know is that this conclusion describes something really hard; AI scientists have already tasted how hard it is. Still, being hard doesn't mean being impossible, or does it? Let's suppose the conclusion is true, i.e., "we should be able to …" holds. Then why do AI scientists struggle so much at doing it? Where did they go wrong? 🤯🤯
Now let's switch to the opposite side, on which the conclusion is false, i.e., "it is impossible to program a computer to …". Then the premise must be false (because the deduction step is valid, this is just modus tollens). A false premise translates to "there is a big mistake in abstracting the developmental/training process of a scientist as performing computations". Then how can this mistake be articulated? What part of the developmental/training process of a scientist is fundamentally distinct from computation? 🤯🤯
It appears that the discussion so far keeps hitting dead ends, raising questions that are profound and scary.
Part II: A Non-vanishing Volume of Needles
There is another way of explaining why mathematicians can accomplish endeavors that would take longer than the lifespan of the universe under brute-force search: valid proofs can be numerous.
So there is not just one needle in a haystack; there are many needles scattered across a valley, and you can find one as long as you "stick around for a bit". From here, I am not sure how to proceed, because I don't know whether there exists a theorem like "all valid proofs of a theorem are essentially the same", or equivalently, "once you have found one valid proof of a theorem, you have automatically found all the others". (Such a theorem may or may not assume a fixed language in which proofs are written. I shall tentatively assume the language doesn't matter, because there might be a "universal language", and reducing theorem proving from other languages to this universal language shouldn't change the complexity class.) If such a theorem does exist, then the fact that there are many valid proofs brings zero help, and we must (frustratingly) go back to pondering the two dead ends in Part I.
If no such theorem exists, then perhaps the abundance of valid proofs actually makes a difference, and one might ask what proportion of the search space is occupied by needles. If the distribution and/or the total volume of needles satisfies a certain property, the search problem may actually be in P. Then it is no wonder that many mathematicians, given the right developmental/training process and aided by a bit of correct deduction, reached the needles they were looking for.
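A toy illustration of this "non-vanishing volume" intuition, under the simplifying assumption that candidates are sampled uniformly and independently: if needles occupy a constant fraction p of the space, even memoryless random sampling finds one in about 1/p expected tries, no matter how large the space itself is.

```python
import random

def expected_tries(needle_fraction: float, trials: int = 10_000) -> float:
    """Average number of independent uniform samples until hitting a needle,
    when a fraction `needle_fraction` of the search space is needles.
    This is a geometric distribution: the expectation is 1/needle_fraction,
    independent of the size of the space."""
    total = 0
    for _ in range(trials):
        tries = 1
        while random.random() >= needle_fraction:
            tries += 1
        total += tries
    return total / trials

for p in (0.5, 0.1, 0.01):
    print(f"needle fraction {p}: ~{expected_tries(p):.1f} tries (theory: {1/p:.0f})")
```

Everything hinges, of course, on whether the needle fraction really stays non-vanishing for hard theorems rather than shrinking at an astronomical rate.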
Statement: the search problem faced by mathematicians in theorem-proving differs from the search problem faced by a computer-based theorem prover.
I can see a potential argument for why this statement is true: a mathematician is simultaneously learning about the landscape the problem lies in and trying out attempts. But a computer-based theorem prover (at least one implementing brute-force search) does not learn from its past attempts and does not remember anything from them. Learning from past attempts is an elusive task, because apparently remembering all the past traces does not work; one has to "make sense" of past experiences. Then what's involved in "making sense", and why does it help transcend the complexity class of brute-force search?
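Here is a toy sketch of why rote memory alone doesn't work, assuming a structureless space where past failures reveal nothing about the target: a searcher that remembers every failed guess, but doesn't "make sense" of them, only improves the expected cost from about N guesses to about N/2, i.e., the same complexity class.

```python
import random

def search(space_size: int, target: int, remember: bool) -> int:
    """Count guesses until `target` is found in {0, ..., space_size - 1}."""
    seen = set()
    tries = 0
    while True:
        guess = random.randrange(space_size)
        if remember:
            # Rote memory: never repeat a failed guess, but learn nothing else.
            while guess in seen:
                guess = random.randrange(space_size)
            seen.add(guess)
        tries += 1
        if guess == target:
            return tries

N, runs = 10_000, 200
for remember in (False, True):
    avg = sum(search(N, random.randrange(N), remember) for _ in range(runs)) / runs
    print(f"remember={remember}: ~{avg:.0f} of {N} candidates tried")
```

One reading: to change the complexity class, memory must generalize, so that each failure rules out many candidates at once rather than just itself; that is plausibly what "making sense" buys.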
I apologize for my bad wording. What I'm getting at is the nature of learning and whether learning = computation. In the deep learning case, by definition, learning = computation. Then what might give a deep learning model an advantage over brute-force search? Right: data, objective, and the "inductive biases", which can come from numerous sources. In this regard, the statement above invites thinking about how mathematicians and computer-based theorem provers differ in their data, objectives, and inductive biases. And the path ahead for AI scientists lies exactly in finding the right data, objectives, and inductive biases.
Daring to cast doubt on this argument leaves us staring down the scary possibility that learning ≠ computation, in which case we would have no choice but to endure the pain required to characterize the difference.