Greetings! I am a fourth-year PhD student at the Language Technologies Institute of Carnegie Mellon University, where I am very fortunate to be advised by Prof. Yonatan Bisk. I received my Bachelor's degree in Computer Science & Mathematics with first class honors from the Hong Kong University of Science and Technology.
Research Keywords: Algorithmic Reasoning, Mechanistic Interpretability, Generalization
- Mathematically understand machine reasoning
  - Role-filler binding with learned roles in neural networks.
  - Algorithm induction from input-output pairs.
- Mechanistically understand machine reasoning
  - How Transformers form circuits that accomplish tasks sequentially or in parallel.
  - How Transformers (approximately) implement and execute memory.
  - How Transformers treat functions differently from how they handle primitive concepts.
✨ I'm actively looking for collaborators on mechanistically understanding Transformers' expressivity and their limits. ✨
Publications
Yingshan Chang and Yonatan Bisk. "Language Models Need Inductive Biases to Count Inductively." arXiv:2405.20131 (under review).
Jimin Sun, So Yeon Min, Yingshan Chang, and Yonatan Bisk. "Tools Fail: Detecting Silent Errors in Faulty Tools." EMNLP 2024.
Shaurya Dewan, Rushikesh Zawar, Prakanshul Saxena, Yingshan Chang, Andrew Luo, and Yonatan Bisk. "DiffusionPID: Interpreting Diffusion via Partial Information Decomposition." NeurIPS 2024.
Yingshan Chang, Yasi Zhang, Zhiyuan Fang, Yingnian Wu, Yonatan Bisk, and Feng Gao. "Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation." European Conference on Computer Vision (ECCV) 2024.
Syeda Nahida Akter, Sangwu Lee, Yingshan Chang, Yonatan Bisk, and Eric Nyberg. "VISREAS: Complex Visual Reasoning with Unanswerable Questions." Findings of the Association for Computational Linguistics: ACL 2024.
Liangke Gui, Yingshan Chang, Qiuyuan Huang, Subhojit Som, Alexander G. Hauptmann, Jianfeng Gao, and Yonatan Bisk. "Training Vision-Language Transformers from Captions." Transactions on Machine Learning Research, 2023.
Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, and Yonatan Bisk. "WebQA: Multihop and Multimodal QA." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16495-16504. 2022. Oral.
Yingshan Chang and Yonatan Bisk. "WebQA: A Multimodal Multihop NeurIPS Challenge." In NeurIPS 2021 Competitions and Demonstrations Track, pp. 232-245. PMLR, 2022.
Current Research
1. What architectural innovations address the dispersion effect of softmax in self-attention, making it friendlier to length extrapolation? (A toy demonstration of the dispersion effect follows below.)
2. Can we enable extrapolation by letting the model dynamically "adjust its reference frame", taking inspiration from adaptive contrast in the retina?
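To make the first question concrete, here is a minimal sketch (my own toy illustration, not a result from any paper below) of the dispersion effect: with i.i.d. attention logits, softmax mass spreads out as the number of keys grows, so the entropy of the attention distribution climbs roughly like log n and no single key retains much weight.

```python
import numpy as np

# Toy demonstration: softmax over more keys disperses attention mass.
rng = np.random.default_rng(0)
for n in [16, 64, 256, 1024]:
    logits = rng.normal(size=n)                    # stand-in for q.k scores
    attn = np.exp(logits) / np.exp(logits).sum()   # softmax
    entropy = -(attn * np.log(attn)).sum()
    print(f"n={n:5d}  max weight={attn.max():.3f}  "
          f"entropy={entropy:.2f}  log n={np.log(n):.2f}")
```

At test-time lengths beyond those seen in training, this flattening shifts the statistics a head was trained on, which is one intuition for why length extrapolation fails.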
Previous Research
Language Models Need Inductive Biases to Count Inductively
- ▸ We define counting as the ability to map a number word to the cardinality of a set containing a corresponding number of items.
- ▸ The inductive counting principle: If a word in an ordered number word list refers to sets with cardinality n, then the next word refers to sets with cardinality n + 1.
- ▸ This work provides extensive empirical results on training language models to count, experimenting with architectures spanning RNNs, Transformers, state-space models, and RWKV. (A toy sketch of the task format follows this list.)
- ▸ Transformers have to rely on positional embeddings (PEs) to count out-of-domain.
- ▸ Detailed analysis isolates the inductive biases useful for counting that are enabled by PEs and not otherwise encoded in self-attention.
- ▸ Modern RNNs largely underperform traditional RNNs in generalizing counting inductively, revealing a potential downside of their lauded parallel training.
- ▸ Under review
- ▸ Paper
- ▸ Talk (@FlaNN)
- ▸ Github
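As an illustration of the setup, here is a hypothetical toy instance of the counting task family; the format below is my own simplification, and the real data generation lives in the linked GitHub repo.

```python
import random

# Hypothetical toy version of the counting task (illustrative only;
# the actual data format is in the GitHub repo linked above).
def make_example(n: int) -> tuple[str, str]:
    return " ".join(["a"] * n), str(n)  # input: n items; target: the count

train = [make_example(random.randint(1, 10)) for _ in range(1000)]  # in-domain lengths
test = [make_example(n) for n in range(11, 21)]                     # OOD lengths
print(train[0], test[0], sep="\n")
```

Inductive length generalization then asks whether a model trained only on counts 1-10 produces the right count for the out-of-domain lengths 11-20.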
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
- ▸ Why do text-to-image models fail to learn entity-relation compositions effectively?
- ▸ We conceptualize the text-to-image generation pipeline into three stages and examine the potential sources of error at each stage.
- ▸ We argue that the middle stage -- communication channel -- is the most error-prone, leading to issues of faithfully composing entities with relations.
- ▸ We attribute the communication channel's weakness to skews in the phenomenological coverage of the training data.
- ▸ We propose two statistical metrics to quantify such phenomenological skews under the formal framework of role-filler binding (a toy illustration follows this list).
- ▸ We conduct experiments showing that our metrics are predictive of generalization performance.
- ▸ Accepted to ECCV 2024
- ▸ Paper
- ▸ Slides
- ▸ Video
- ▸ Github
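As a toy illustration of the idea above (not the paper's actual metric definitions, which are stated formally in the paper), one way to quantify skew is one minus the normalized entropy of role-filler pair frequencies: 0 means uniform coverage, 1 means a single pair dominates.

```python
from collections import Counter
import numpy as np

# Hypothetical sketch: measure skew in which (entity, relation) role-filler
# pairs the training captions cover. (Illustrative only; see the paper for
# the real metrics.)
pairs = [("dog", "on"), ("dog", "on"), ("dog", "on"),
         ("cat", "on"), ("dog", "under")]          # toy caption annotations
p = np.array(list(Counter(pairs).values()), float)
p /= p.sum()
skew = 1.0 - (-(p * np.log(p)).sum() / np.log(len(p)))  # 0 = uniform, 1 = fully skewed
print(f"skew = {skew:.3f}")
```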
Efficient Visual Grounding via Patch Affinities
- ▸ We ask the research question: Are token and patch representations alone sufficient to perform visual reasoning?
- ▸ We show that a single Transformer backbone can more efficiently perform language-conditioned visual reasoning than bespoke architectures.
- ▸ We show that well-aligned patch-token representations naturally translate to downstream performance without unimodal backbones or output-specific components.
- ▸ We use Referring Expression Comprehension & Segmentation as case studies and demonstrate how patch affinity scores can be leveraged to achieve competitive recognition and segmentation with an order of magnitude fewer parameters and lower inference time (a toy affinity computation is sketched below).
- ▸ Joint work with Liangke Gui on efficient vision-language pretraining.
- ▸ Accepted to TMLR 2023
- ▸ Paper
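A minimal sketch of the patch-affinity idea, assuming well-aligned patch-token representations; the shapes, names, and thresholding below are illustrative rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of grounding via patch-token affinities: cosine
# similarity between expression tokens and image patches yields a
# per-patch relevance map that can be binarized into a rough segment.
tokens = F.normalize(torch.randn(7, 256), dim=-1)     # 7 text tokens
patches = F.normalize(torch.randn(196, 256), dim=-1)  # 14 x 14 image patches
affinity = patches @ tokens.T                         # (196, 7) affinity scores
heatmap = affinity.max(dim=1).values.view(14, 14)     # per-patch relevance
mask = heatmap > heatmap.mean()                       # crude binary segment
```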
WebQA: Multihop and Multimodal QA
- We curated a dataset for open-domain, multihop, and multimodal question answering, then evaluated the extent to which models can answer open-domain web search queries by leveraging both the explicit knowledge in retrieved sources and the implicit knowledge in pretrained parameters.
- ▸ Crowdsourced a dataset with knowledge-seeking QA pairs and multimodal (image+snippets) knowledge sources.
- ▸ Mined hard negatives that have high lexical overlap with the question or with positive sources while lacking any reference to the answer (sketched below).
- ▸ Adversarially created the train/test split so that majority answers derived from the training set do not carry over to testing, suppressing purely statistical approaches.
- ▸ Implemented baseline models for WebQA under both fine-tuning (finetune a vision-and-language Transformer) and few-shot (prompt GPT-3 with engineered prefixes) settings.
- ▸ Designed a metric for WebQA that measures both fluency and accuracy, and is hard to game by guessing a long list of entities.
- ▸ Accepted to NeurIPS 2021 Competition Track.
- ▸ Accepted to CVPR2022 (Oral)
- ▸ Paper
- ▸ Leaderboard
- ▸ Website
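A hypothetical sketch of the hard-negative mining step; the `jaccard` overlap measure and `mine_hard_negatives` helper are my own illustration, not the dataset's actual curation pipeline.

```python
# Hypothetical sketch of lexical-overlap hard-negative mining
# (illustrative; the released dataset uses its own pipeline).
def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, len(ta | tb))

def mine_hard_negatives(question: str, answer: str, candidates: list[str], k: int = 5):
    # keep sources that look relevant to the question but never mention the answer
    pool = [c for c in candidates if answer.lower() not in c.lower()]
    return sorted(pool, key=lambda c: jaccard(question, c), reverse=True)[:k]
```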
Low-Light Video Enhancement Using Deep Learning
- This project explored ways of reducing noise and motion blur in low-light RGB video frames using deep learning.
Our contributions are threefold:
- 1. A novel approach to collecting dark, blurry video frames with corresponding bright, sharp ground-truth images: video frames are captured in video mode with a short exposure time, while ground-truth images are captured in image mode with a long exposure time.
- 2. An end-to-end fully convolutional network pipeline with a fine-tuning strategy for low-light video enhancement.
- 3. Experiments show that, on the newly-collected dataset, our proposed pipeline outperforms previous video enhancement methods in terms of both numerical evaluation metrics and human perceptual evaluation.
- ▸ Project Video
- ▸ Project Report
- ▸ View Github
Event-to-Sentence Using BERT in Automated Story Generation
- ▸ Proposed an event-to-sentence method based on pre-trained BERT by inserting words between event tokens.
- ▸ Designed and implemented an Editing-Writing Network consisting of a BERT encoder and an RNN decoder, which iteratively generates sentences and revises its own output (a conceptual skeleton follows below).
- ▸ Demonstrated the extent to which the general linguistic knowledge acquired through unsupervised training can be applied to the particular task of story generation, with appropriate adjustment.
- ▸ View Github
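A conceptual skeleton of the editing-writing loop, with hypothetical callables standing in for the BERT encoder and RNN decoder; the project's actual code is in the linked repo.

```python
# Conceptual skeleton (hypothetical names and signatures; not the
# project's exact code) of iterative generation-then-revision.
def edit_write(event_tokens, bert_encoder, rnn_decoder, rounds=3):
    draft = event_tokens                  # start from the bare event tokens
    for _ in range(rounds):
        context = bert_encoder(draft)     # re-encode the current draft
        draft = rnn_decoder(context)      # rewrite: insert and revise words
    return draft
```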
A Blockchain and Smart Contract Application
- ▸ Implemented a smart contract based on Ethereum framework.
- ▸ Developed a web-based interface for transaction creation, manipulation and approval.
Selected Course Projects
Neuro-Concepts
CMU 85707 Spring 2022
- The False Belief (FB) task was originally proposed to test Theory-of-Mind (ToM) competence in young children. It was believed that kids younger than 3 tend to fail the FB task because they have not yet developed ToM, i.e. a representational conception of mental states such as beliefs, desires, emotions, and intentions.
- However, as investigations have deepened, the field has observed conflicting results. Researchers have found indirect ways to verify that kids much younger than 3 should in theory already be competent enough to pass the FB task, yet on the surface they do not show this competence.
- ▸ This paper argues that researchers might have been mistaking a 3-year-old's difficulty in understanding the task for a lack of the ability to accomplish it.
- ▸ This paper proposes a new testing approach to disentangle two possible reasons for the failure of typically-developing 3-year-olds on the FB task: difficulty in understanding the task vs. insufficient competence to succeed at it.
- ▸ Final Paper
Conlanging
CMU 11823 Spring 2022
- Skyming is a conlang (constructed language) that I created throughout the course. Skyming is designed to be spoken by humans living on a spaceship drifting through outer space after the destruction of the Earth.
- ▸ The pronoun and time systems are designed around important cultural concepts among Skyming's speakers.
- ▸ An interesting part of creating my conlang was deciding to (almost) do away with verbs: the syntax is built around just four "dummy verbs": 1) "Dzaeki", denoting the Agent-Patient relationship; 2) "Oto", denoting the Figure-Ground relationship; 3) "Iminna", denoting equivalence; and 4) "Ati", denoting causatives.
- ▸ Reference Grammar
- ▸ Slides
- ▸ Vocabulary
Sociolinguistics
CMU 11724 Fall 2021
- Biber introduced five major dimensions of variation in English. This paper takes the first dimension, Involved vs. Informational, as the linguistic variable of interest and studies its relationship with a social variable: age.
- The Involved style is marked by features that typically signal interaction between the writer and the reader. The Informational style, on the other hand, usually goes hand in hand with formality.
- ▸ I hypothesize that the degree of involvedness linearly decreases as age grows.
- ▸ I use computational methods to quantify the strength of involvedness for blog articles in the Blog Authorship Corpus. The strength of involvedness is signified by twelve low-level linguistic features, namely 1) nouns, 2) attributive adjectives, 3) private verbs, 4) contractions, 5) analytic negation, 6) the pronoun 'it', 7) causative subordination, 8) present-tense verbs, 9) first-person pronouns, 10) second-person pronouns, 11) indefinite pronouns, and 12) amplifiers.
- ▸ I fit a linear regression model between age and each involvedness-signifying linguistic feature (a toy sketch follows below). Eleven of the twelve linguistic features have statistically significant anti-correlations with age.
- ▸ The results strongly support my hypothesis that English writing authored by older people is more informational and less involved than writing by younger people.
- ▸ Final Paper
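A toy sketch of the per-feature analysis (the numbers below are made up, not corpus statistics): regress a feature's normalized frequency on author age and test whether the slope is significantly negative.

```python
from scipy.stats import linregress

# Hypothetical illustration: one involvedness-signifying feature
# (e.g., contractions per word) regressed on author age.
ages = [14, 17, 23, 27, 33, 38, 45]
contractions_per_word = [0.081, 0.074, 0.066, 0.058, 0.051, 0.047, 0.040]
fit = linregress(ages, contractions_per_word)
print(f"slope={fit.slope:.5f}  r={fit.rvalue:.2f}  p={fit.pvalue:.4g}")
```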
Internet Computing
HKUST COMP4021 Fall 2019
- The project's name is "World Travel", a game that borrows ideas from Monopoly. The player selects an icon on the menu page to represent themselves. On the map page, the player rolls a die and moves around the map according to the outcome, after which a card appears describing a destination (a city). Once a city is visited, its cell disappears from the map, so the player cannot repeatedly visit the same cell. The game ends when all cities have been visited.
- ▸ Client Side Techniques:
HTML, JavaScript, jQuery, timers, Bootstrap, animations
- ▸ Server Side Techniques:
PHP, jQuery AJAX, XML & TXT files for storage and "memory"
- ▸ Client Side Functions:
Choose player icon, play the game, switch between menu page and map page, reload unfinished trip
- ▸ Server Side Functions:
Store game status, check game status upon page reloading
- ▸ View Github
Machine Learning
Georgia Tech CS4641 Spring 2019
- ▸ Supervised Learning: This project explored multiple techniques (k-NN, SVM, decision trees, gradient-boosted decision trees, MLP) on supervised learning tasks (binary and multi-class classification).
- ▸ Randomized Optimization: This project gave a comparative analysis of four RO algorithms (hill climbing, simulated annealing, genetic algorithms, MIMIC) and discussed how they behave on three search problems (the traveling salesman problem, a multiple-optimum problem, and a single-optimum problem).
- ▸ Unsupervised Learning: This project applied clustering (k-means, expectation maximization) and dimensionality reduction algorithms (PCA, ICA, randomized projection, information gain) to the datasets I had previously analyzed in Supervised Learning and discussed how they affect classification results.
- ▸ Markov Decision Process: This project explored Markov Decision Processes by designing maze problems and solving them with value iteration, policy iteration, and reinforcement learning (a minimal value-iteration sketch follows below).
- ▸ View Github
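As a minimal illustration of the value-iteration piece (my own toy corridor example, not the course code):

```python
import numpy as np

# Value iteration on a 1-D corridor "maze": states 0..4, actions
# left/right, reward 1 for entering the terminal goal state 4,
# discount 0.9.
n, goal, gamma = 5, 4, 0.9
V = np.zeros(n)
for _ in range(100):
    V_new = np.zeros(n)
    for s in range(n):
        if s == goal:
            continue                       # terminal state keeps value 0
        for s2 in (max(s - 1, 0), min(s + 1, goal)):
            r = 1.0 if s2 == goal else 0.0
            V_new[s] = max(V_new[s], r + gamma * V[s2])
    V = V_new
print(np.round(V, 3))   # -> [0.729 0.81  0.9   1.    0.   ]
```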
Information Visualization
Georgia Tech CS4460 Spring 2019
- ▸ This project implemented an interactive web-based data visualization using D3 and JavaScript.
- ▸ We implemented a multiple-view visualization consisting of a scatterplot matrix and a bar chart, showing the financial status and ethnic distributions of college students in the US.
- ▸ Our design supports overview & detail, brushing & linking, and real-time filtering.
- ▸ View Github
Language Modelling
HKUST COMP4901K Fall 2018
- ▸ This project built a neural network language model which, given the previous words in a sentence, predicts the last word.
- ▸ I explored various techniques to boost performance, including hyperparameter tuning, better architecture design (skip connections, CNNs, Bi-LSTMs), and attention mechanisms.
- ▸ An RNN model with skip connections produced the best result (a sketch follows below); hyperparameter tuning and pretraining methods such as word2vec brought further improvements. Nevertheless, I found that Bi-LSTMs and attention mechanisms easily caused overfitting on this particular problem.
- ▸ View Github
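A hypothetical sketch of the best-performing configuration, an LSTM language model with a residual skip connection; layer sizes and names are illustrative, not the original course code.

```python
import torch
import torch.nn as nn

# Sketch: LSTM language model with a skip connection that adds the
# token embeddings back to the recurrent output before projection.
class SkipLSTMLM(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, x):                 # x: (batch, seq) token ids
        h = self.emb(x)
        y, _ = self.lstm(h)
        return self.out(y + h)            # skip connection around the LSTM
```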
Customer Revenue Prediction with Spark
HKUST COMP4651 Fall 2018
- ▸ This project tackled a Kaggle challenge: predicting the amount of money spent by GStore customers. We explored visualization, data processing, and machine learning tools on the Spark platform.
- ▸ Since this was a cloud computing course project, we utilized Intel DevCloud for big-data cleaning and Amazon S3 for data storage.
- ▸ We built three machine learning models, namely a GBT regressor, a RandomForest regressor, and linear regression, among which the RandomForest regressor produced the lowest RMSE (0.07) on validation data (a sketch follows below).
- ▸ View Github
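A hypothetical sketch of the RandomForest pipeline on Spark; the column names are made up and `train_df`/`valid_df` are assumed to be Spark DataFrames loaded elsewhere, so this is the shape of the solution rather than the project's exact code.

```python
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.evaluation import RegressionEvaluator

# Assemble illustrative feature columns, fit a random forest on the
# (log-transformed) revenue target, and score validation RMSE.
assembler = VectorAssembler(inputCols=["visits", "pageviews", "hits"],
                            outputCol="features")
rf = RandomForestRegressor(featuresCol="features", labelCol="log_revenue")
model = rf.fit(assembler.transform(train_df))
rmse = RegressionEvaluator(labelCol="log_revenue",
                           metricName="rmse").evaluate(
    model.transform(assembler.transform(valid_df)))
print(f"validation RMSE = {rmse:.3f}")
```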
Education
Georgia Institute of Technology
Exchange Spring 2019
Honors
Carnegie Mellon University Research Fellowship 2020 - 2022
Academic Achievement Medal Hong Kong University of Science and Technology 2020
Bachelor's Degree with First Class Honors Hong Kong University of Science and Technology 2020
Dean’s List Hong Kong University of Science and Technology 2016 - 2020
University’s Scholarship Scheme for Continuing Undergraduate Students 2016 - 2019
The Cheng Foundation Scholarships for Chinese Mainland Undergraduate Students 2018 - 2019
The Hong Kong Electric Co., Ltd. Scholarship 2017 - 2018
Mingxi Youth Award Scheme 2017 - 2018