My research, in broad terms, engages with concepts that defy easy definition yet
embody the shared struggle of three large research communities: Computation, Learning, and Cognition.
Examples of such elusive concepts include Generalization, Abstraction, and Reasoning.
Research on the nature of these concepts tends to be tightly interwoven.
What further complicates the picture is the learning component: generalization, abstraction, and reasoning alone are not enough.
I tend to focus on learning to generalize, learning to abstract, and learning to reason.
The challenge is that the computational foundations of these abilities cannot be built in isolation: each depends on the others.
Because this field is still in its nascent stage, substantial effort on formalization, quantification, categorization, and unification is needed.
This is why I'm so intellectually invested in these areas and want to dedicate a career to them 🌟.
Important Decision: I will write my thesis with zero occurrences of the word “reasoning”.
I settled on a title for my thesis:
Shaping Generalization in Deep Learning: Data, Bias, and Paradigm
1. Mathematically understand machine reasoning.
✧ Role-filler binding with learned roles in neural networks.
✧ Function induction from input-output pairs.
2. Mechanistically understand machine reasoning.
E.g. How do Transformers:
✧ form circuits that accomplish tasks sequentially or in parallel?
✧ (approximately) implement and execute memory?
✧ treat functions differently from primitive concepts?
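To make the first item concrete, here is a minimal numpy sketch of tensor-product role-filler binding (in the style of Smolensky's tensor product representations). The role and filler vectors below are random stand-ins chosen for illustration; the actual research question concerns roles that are *learned* inside a neural network.

```python
import numpy as np

rng = np.random.default_rng(0)

# Filler vectors (the "what"): entities to be slotted into a structure.
fillers = {name: rng.standard_normal(8) for name in ["cat", "mat"]}
# Role vectors (the "where"): structural slots, here random rather than learned.
roles = {name: rng.standard_normal(8) for name in ["agent", "location"]}

def bind(role, filler):
    # The outer (tensor) product binds one role to one filler.
    return np.outer(role, filler)

# A structure is the sum of its role-filler bindings:
# "cat on mat" ~ agent⊗cat + location⊗mat
structure = bind(roles["agent"], fillers["cat"]) + bind(roles["location"], fillers["mat"])

def unbind(structure, role_name):
    # Because the role vectors are linearly independent, the columns of the
    # pseudo-inverse of the role matrix act as exact unbinding (dual) vectors.
    R = np.stack([roles["agent"], roles["location"]])  # rows = role vectors
    U = np.linalg.pinv(R)                              # columns = dual vectors
    idx = ["agent", "location"].index(role_name)
    return U[:, idx] @ structure

recovered = unbind(structure, "agent")
print(np.allclose(recovered, fillers["cat"]))  # → True
```

The unbinding step works because u_j · r_i = δ_ij for the dual vectors, so querying the summed structure with a role's dual recovers exactly that role's filler; with *learned* (non-orthogonal, noisy) roles, recovery becomes approximate, which is part of what makes the mathematical question interesting.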
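The second item, function induction from input-output pairs, can be caricatured as enumerative search over compositions of a primitive library. Everything below (the library, the `induce` helper, the depth bound) is a hypothetical toy, not a claim about any particular system:

```python
from itertools import product

# A tiny hypothetical primitive library over integers.
PRIMITIVES = {
    "inc": lambda x: x + 1,
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
}

def induce(io_pairs, depth=2):
    """Enumerate compositions of primitives, shortest first, and return the
    first program (a tuple of names, outermost first) consistent with every
    input-output pair. Returns None if no program fits within the depth bound."""
    for d in range(1, depth + 1):
        for names in product(PRIMITIVES, repeat=d):
            def run(x, names=names):
                for n in reversed(names):  # apply innermost function first
                    x = PRIMITIVES[n](x)
                return x
            if all(run(i) == o for i, o in io_pairs):
                return names
    return None

# f(x) = (x + 1)^2 fits both pairs, i.e. square ∘ inc.
print(induce([(2, 9), (3, 16)]))  # → ('square', 'inc')
```

Real function induction replaces this brute-force enumeration with a neural network that must represent and search the space of functions implicitly, which is precisely where the mathematical questions begin.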
I'm interested in the mechanistic interpretability of how neural networks perform multimodal reasoning and grounding. Topics: Transformer circuits, how composed circuits develop, multimodal neurons, the modality gap.
Key to my research goal is answering the following questions:
1. How to endow machines with robust, reusable skills that can be compositionally built up to achieve systematic generalization?
2. What appropriate roles can language play in (1)?
We can draw inspiration from how humans use language to:
✧ acquire knowledge (language as instruction, human → computer)
✧ externalize thoughts (language as explanation, computer → human)
✧ exchange information (language as communication, computer → computer)