Computer Science Department PhD Defense: Yichuan Li, "Text-Centric Representation Learning: Integrating Multimodal Knowledge to Overcome Labeled Data Scarcity"
Yichuan Li
PhD Candidate
WPI – Computer Science Department
Wednesday, March 5, 2025
Time: 10:00 a.m. – 12:00 p.m.
Location: Gordon Library, Room 303
Zoom Link: https://wpi.zoom.us/j/4166606557?omn=96360839200
Dissertation Committee:
Dr. Kyumin Lee, PhD Advisor, WPI – Computer Science
Dr. Nima Kordzadeh, WPI – Business School
Dr. Xiangnan Kong, WPI – Computer Science
Dr. Xiaozhong Liu, WPI – Computer Science
Dr. Kaize Ding, Assistant Professor, Northeastern University
Abstract:
Advancements in textual representation learning often face the challenge of requiring extensive labeled data, which is resource-intensive and impractical in dynamic fields like fake news detection and social media analytics. This dissertation introduces a paradigm shift from traditional "text-only" approaches to a "text-centric" methodology, integrating auxiliary knowledge from diverse sources and modalities to enhance text representation without relying heavily on labeled data. The first contribution emphasizes incorporating human prior knowledge and domain expertise. A novel framework employs multi-source domain adaptation and weak supervision to detect fake news early, transferring knowledge from well-labeled source domains to target domains with limited labeled data.
Additionally, meta-learning and contrastive learning techniques are used to reduce noise in augmented data, improving text classification by reweighting examples and refining feature representations. The second contribution explores the fusion of numerical features with textual data in causal analyses of fake news dissemination on social media. A causal inference model combining textual and numerical covariates identifies the key lexicons and posts that drive misinformation spread, informing more effective intervention strategies. The final contribution presents GRENADE, a graph-centric language model designed for self-supervised representation learning on text-attributed graphs. By aligning pre-trained language models with graph neural networks, GRENADE captures both textual semantics and structural context, learning expressive, generalizable representations on citation and product networks.