Computer Science Department, PhD Defense, Yichuan Li: "Text-Centric Representation Learning: Integrating Multimodal Knowledge to Overcome Labeled Data Scarcity"

Wednesday, March 5, 2025
10:00 a.m. to 11:00 a.m.

Yichuan Li

PhD Candidate 

WPI – Computer Science Department

Wednesday, March 5, 2025

Time: 10:00 a.m. – 12:00 p.m. 

Location: Gordon Library, Room 303

Zoom Link: https://wpi.zoom.us/j/4166606557?omn=96360839200

Dissertation Committee:

Dr. Kyumin Lee, PhD Advisor, WPI – Computer Science

Dr. Nima Kordzadeh, WPI – Business School 

Dr. Xiangnan Kong, WPI – Computer Science 

Dr. Xiaozhong Liu, WPI – Computer Science 

Dr. Kaize Ding, Assistant Professor, Northeastern University

Abstract:

Advances in textual representation learning often depend on extensive labeled data, which is resource-intensive to obtain and impractical in dynamic fields such as fake news detection and social media analytics. This dissertation shifts from traditional "text-only" approaches to a "text-centric" methodology that integrates auxiliary knowledge from diverse sources and modalities to enhance text representations without relying heavily on labeled data. The first contribution incorporates human prior knowledge and domain expertise: a novel framework employs multi-source domain adaptation and weak supervision to detect fake news early, transferring knowledge from well-labeled source domains to target domains with limited labeled data.
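The transfer idea behind multi-source domain adaptation can be illustrated with a deliberately simplified sketch: two well-labeled synthetic source domains, a target domain with only a handful of labels, and source models mixed by their accuracy on that labeled slice. Everything here (the synthetic data, plain logistic regression, and the accuracy-weighting heuristic) is an illustrative stand-in, not the dissertation's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, lr=0.5, steps=300):
    """Plain logistic regression fit by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def predict_proba(X, w):
    return 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30, 30)))

def make_domain(shift, n):
    """Synthetic 2-D 'domain' whose class boundary depends on `shift`."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + shift * X[:, 1] > 0).astype(float)
    return X, y

# Two labeled source domains; the target domain has only 20 labels.
Xs1, ys1 = make_domain(0.2, 500)    # source 1: boundary close to target's
Xs2, ys2 = make_domain(-1.0, 500)   # source 2: boundary far from target's
Xt, yt = make_domain(0.3, 200)      # target
Xt_lab, yt_lab = Xt[:20], yt[:20]   # the scarce labeled target slice

w1, w2 = train_logreg(Xs1, ys1), train_logreg(Xs2, ys2)

# Weight each source model by its accuracy on the labeled target slice,
# then mix their probability outputs over the full target set.
accs = np.array([((predict_proba(Xt_lab, w) > 0.5) == yt_lab).mean()
                 for w in (w1, w2)])
weights = accs / accs.sum()
p_mix = weights[0] * predict_proba(Xt, w1) + weights[1] * predict_proba(Xt, w2)
acc_mix = ((p_mix > 0.5) == yt).mean()
print(f"source weights {weights.round(2)}, target accuracy {acc_mix:.2f}")
```

The mixture automatically leans on the source whose decision boundary matches the target, which is the intuition behind weighting source domains by their transferability.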

Additionally, meta-learning and contrastive learning techniques are used to reduce noise in augmented data, improving text classification by reweighting examples and refining feature representations. The second contribution explores the fusion of numerical features with textual data in causal analyses of fake news dissemination on social media: a causal inference model combining textual and numerical covariates identifies the key lexicons and posts that drive misinformation spread, informing more effective intervention strategies. The final contribution presents GRENADE, a graph-centric language model for self-supervised representation learning on text-attributed graphs. By aligning pre-trained language models with graph neural networks, GRENADE captures both textual semantics and structural context, learning expressive, generalizable representations on citation and product networks.
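The core alignment idea — pulling a node's text view toward its structural view on a text-attributed graph — can be sketched in a few lines. Here random vectors stand in for pre-trained language-model embeddings, one round of neighbor mean-pooling stands in for a trained graph neural network, and the InfoNCE form of the loss is an assumption for illustration, not GRENADE's exact objective.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy text-attributed graph: 6 nodes in two triangles, each node carrying
# a stand-in "text embedding" (random here; an LM's output in practice).
n, d = 6, 8
text_emb = rng.normal(size=(n, d))
adj = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]:
    adj[i, j] = adj[j, i] = 1.0
adj += np.eye(n)  # self-loops

# Structural view: one round of mean-pooling over neighbors, standing in
# for the graph-neural-network branch.
graph_emb = (adj @ text_emb) / adj.sum(axis=1, keepdims=True)

def info_nce(A, B, tau=0.5):
    """InfoNCE-style loss: row i of A should match row i of B best."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    logits = (A @ B.T) / tau                       # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # positive pairs on diagonal

loss = info_nce(text_emb, graph_emb)
print(f"text-graph alignment loss: {loss:.3f}")
```

Minimizing a loss of this shape encourages each node's text representation to agree with its own neighborhood summary more than with any other node's, which is how the textual and structural signals are fused without labels.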