CS/DS Colloquium ft. Dr. Fatemeh Nargesian

Tuesday, April 11, 2023
1:00 pm to 2:00 pm

United States

Floor/Room #
4th floor, Room 405


Fatemeh Nargesian, Ph.D.

Assistant Professor of Computer Science

University of Rochester


Unity Hall 405


Data Lakes: Discovery, Data Debiasing, and Query Answering

Data for AI is increasingly reliant on the integration of multiple sources – sometimes obtained from open data repositories or data marketplaces. Despite decades of research in data integration and cleaning, we are still not sure how to construct AI-ready structured datasets – data with descriptive features and representative distribution. In this talk, first, I will describe how to discover relevant datasets, based on join and union operations from large-scale data repositories, by designing efficient index structures. Next, I will show how to tailor a dataset with desired distribution requirements from multiple sources, in order to construct unbiased datasets. We will also see how to obtain an IID sample over normalized data, to improve the efficiency of model training and perform approximate query answering. Finally, I will conclude by discussing distribution-aware and human-centric aspects of the management of data lakes.  

Bio: Fatemeh Nargesian is an assistant professor of computer science at the University of Rochester. She obtained her PhD at the University of Toronto. Her research interests are in data management for AI-based data analytics and scientific time-series management. Her work has appeared at top-tier venues including VLDB, SIGMOD, and ICDE, and has won the best demo award at VLDB.

Host: Prof. Elke Rundensteiner, Data Science



Data Science
Contact Person: Kelsey Briggs