Computer Science Department PhD Defense: Yichuan Li, "Text-Centric Representation Learning: Integrating Multimodal Knowledge to Overcome Labeled Data Scarcity"
Yichuan Li
PhD Candidate
WPI – Computer Science Department
Wednesday, March 5, 2025
Time: 10:00 a.m. – 12:00 p.m.
Location: Gordon Library, Room 303
Zoom Link: https://wpi.zoom.us/j/4166606557?omn=96360839200
Dissertation Committee:
Dr. Kyumin Lee, PhD Advisor, WPI – Computer Science
Dr. Nima Kordzadeh, WPI – Business School
Dr. Xiangnan Kong, WPI – Computer Science
Dr. Xiaozhong Liu, WPI – Computer Science
Dr. Kaize Ding, Assistant Professor, Northeastern University
Abstract:
Advancements in textual representation learning often face the challenge of requiring extensive labeled data, which is resource-intensive and impractical in dynamic fields like fake news detection and social media analytics. This dissertation introduces a paradigm shift from traditional "text-only" approaches to a "text-centric" methodology, integrating auxiliary knowledge from diverse sources and modalities to enhance text representation without relying heavily on labeled data. The first contribution emphasizes incorporating human prior knowledge and domain expertise. A novel framework employs multi-source domain adaptation and weak supervision to detect fake news early, transferring knowledge from well-labeled source domains to target domains with limited labeled data.
Additionally, meta-learning and contrastive learning techniques are used to reduce noise in augmented data, improving text classification by reweighting examples and refining feature representations. The second contribution explores the fusion of numerical features with textual data in causal analyses of fake news dissemination on social media. A causal inference model combining textual and numerical covariates identifies the key lexicons and posts that drive misinformation spread, informing more effective intervention strategies. The final contribution presents GRENADE, a graph-centric language model designed for self-supervised representation learning on text-attributed graphs. By aligning pre-trained language models with graph neural networks, GRENADE captures both textual semantics and structural context, learning expressive, generalizable representations on citation and product networks.