Big questions that puzzle me:
1. Which problems are hard to learn?
2. Why are they hard? (quantify their complexity)
3. To what extent is "learning = computation" true?
4. To what extent is "cognition = computation" true?
My research, in broad terms, engages with concepts that defy easy definition but
embody the shared struggle of three large research communities: Computation, Learning, and Cognition.
Examples of "elusive concepts" include: Generalization, Abstraction, and Reasoning.
The research landscape on the nature of these concepts tends to be tightly interwoven.
What further complicates the picture is the learning component: studying generalization, abstraction, and reasoning as static capacities is not enough.
I tend to focus on learning to generalize, learning to abstract, and learning to reason.
The challenge is that the computational foundations of these three abilities depend on one another.
Because this field is still in a nascent stage, substantial effort on formalization, quantification, categorization, and unification is needed.
This is why I'm so intellectually invested in these areas and want to dedicate a career to them 🌟.
My PhD thesis examines the factors shaping generalization in deep learning, spanning from individual components (data, architecture) to the organizational level of learning (the learning paradigm).
The thesis studies generalization to unseen domains, where an unseen domain is a collection of instances systematically unsupported by the training data.
I contribute to an expanding body of insights on this subject from multiple angles: categorization, formalization, and the identification of performance indicators.
My work is structured around three concepts: Composition, Cardinality, and Frame.
A series of investigations uncovered three factors that shape generalization in deep learning: data, architectural bias, and the learning paradigm.
Important Decision: I will write my thesis with zero occurrence of the word “reasoning”.
I settled on a title for my thesis:
Shaping Generalization in Deep Learning: Data, Bias, and Paradigm
1. Mathematically understand machine reasoning
✧ Role-filler binding with learned roles in neural networks.
✧ Function induction from input-output pairs.
2. Mechanistically understand machine reasoning.
E.g. How do Transformers:
✧ form circuits that accomplish tasks sequentially or in parallel?
✧ (approximately) implement and execute memory?
✧ treat functions differently from primitive concepts?
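To make the function-induction item above concrete, here is a minimal toy sketch of my own (not from any of the work described here): induction framed as a search over a small hypothesis space of candidate functions, keeping those consistent with every input-output pair. The candidate names and examples are illustrative assumptions; neural function induction is far subtler, but this pins down the problem statement.

```python
# Toy function induction from input-output pairs: enumerate a fixed
# hypothesis space and keep every candidate consistent with all examples.
CANDIDATES = {
    "add1": lambda x: x + 1,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "negate": lambda x: -x,
}

def induce(examples):
    """Return the names of all candidates that fit every (x, y) pair."""
    return [name for name, f in CANDIDATES.items()
            if all(f(x) == y for x, y in examples)]

print(induce([(1, 2), (2, 4)]))  # only "double" fits both pairs
print(induce([(2, 4)]))          # ambiguous: "double" and "square" both fit
```

The second call illustrates why induction is hard: with few examples, multiple hypotheses remain consistent, so generalization hinges on the inductive bias that picks among them.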
I'm interested in the mechanistic interpretability of how neural networks perform multimodal reasoning/grounding. Topics: Transformer circuits, how circuit composition develops, multimodal neurons, and the modality gap.
Key to my research goal is answering the following questions:
1. How can we endow machines with robust, reusable skills that can be compositionally built up to achieve systematic generalization?
2. What appropriate roles can language play in (1)?
We can draw inspiration from how humans use language to:
✧ acquire knowledge (language as instruction, human → computer)
✧ externalize thoughts (language as explanation, computer → human)
✧ exchange information (language as communication, computer → computer)