Computer Science Department, MS Thesis Presentation, Nicholas Pulsone "Budget-Aware Entity Matching Across Domains"

Tuesday, April 21, 2026
2:30 p.m. to 3:30 p.m.


Nicholas Pulsone

MS Student
WPI – Computer Science Department


Time: 2:30 p.m. – 4:00 p.m.
Location: Fuller Labs 141

Zoom Link: https://wpi.zoom.us/j/97178831714

Committee members:
Advisor: Prof. Roee Shraga
Reader: Prof. Fabricio Murai

Abstract:

Entity Matching (EM), the task of determining whether two data records refer to the same real-world entity, is a core problem in data integration. Recent advances in deep learning have set a new standard for EM, particularly through fine-tuning Pretrained Language Models (PLMs) and, more recently, Large Language Models (LLMs). However, fine-tuning typically requires large amounts of labeled data, which are expensive and time-consuming to obtain.
In e-commerce matching, label scarcity varies widely across domains, raising the question of how to train accurate domain-specific EM models effectively with limited labeled data. In this work, we assume users have only a small number of labels for a specific target domain but have access to labeled data from other domains.
We introduce BEACON, a distribution-aware, budget-aware framework for low-resource EM across domains. BEACON leverages the insight that embedding representations of pairwise candidate matches can guide the effective selection of out-of-domain samples under limited in-domain supervision. We conduct extensive experiments across multiple domain-partitioned datasets derived from established EM benchmarks, demonstrating that BEACON consistently outperforms state-of-the-art methods under different training budgets.