DS Ph.D. Dissertation Proposal | Dongyu Zhang | Tuesday, Nov. 28th @ 10:00AM

DATA SCIENCE

Ph.D. Dissertation Proposal

Dongyu Zhang, Ph.D. Candidate

Tuesday, Nov. 28th, 2023 | 10:00AM - 12:00PM EST

Location: Campus Center, Mid-Century Room

Dissertation Committee:

Dr. Elke A. Rundensteiner, Worcester Polytechnic Institute, Advisor

Dr. Xiangnan Kong, Worcester Polytechnic Institute

Dr. Nima Kordzadeh, Worcester Polytechnic Institute

Dr. Liang Wang, Visa Research

Title: Learning with Incomplete, Inaccurate, and Multi-level Labeled Data

Abstract:

Deep learning models excel in various tasks but require large amounts of accurately labeled data. Unfortunately, acquiring quality labels is costly and requires domain expertise. Hence, datasets tend to have missing or noisy labels. Additionally, data might be labeled on multiple levels. For instance, in detecting foodborne illness incidents from a tweet, the aim at the tweet level is to predict whether the tweet indicates an illness, while at the word level it is to identify relevant slots such as location or food group. However, both levels may have missing and noisy labels, with label quality and completeness potentially varying across levels.

This dissertation explores three directions for handling incomplete, noisy, and multi-level labeled data. Direction 1 aims to learn from datasets with tasks at two levels, where one task has complete labels and the other has incomplete labels. We propose a novel solution that learns the tasks at both levels jointly and strikes a balance between the fully labeled and incompletely labeled tasks. Direction 2 focuses on learning from data with noisy labels. We propose a method that harnesses the Local Intrinsic Dimensionality (LID) score to detect and correct noisy labels. Direction 3 aims to learn from two-level labeled data exhibiting both incomplete and noisy labels. We plan to capitalize on the relationship between tasks and integrate weak labels obtained from Large Language Models (LLMs) to achieve better performance.
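
For readers unfamiliar with the LID score mentioned in Direction 2, the following is a minimal sketch of the standard maximum-likelihood LID estimator computed from nearest-neighbor distances. It is not the method proposed in the dissertation; the function name lid_mle, the neighborhood size k, and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np

def lid_mle(query, reference, k=20):
    """Maximum-likelihood LID estimate of `query` relative to `reference`.

    Uses the k smallest non-zero distances r_1 <= ... <= r_k and returns
    -1 / mean(log(r_i / r_k)). (Hypothetical helper, for illustration only.)
    """
    dists = np.linalg.norm(reference - query, axis=1)
    dists = np.sort(dists[dists > 0])[:k]  # drop the query itself if present
    return -1.0 / np.mean(np.log(dists / dists[-1]))

# Synthetic check: points on a 2-D plane embedded in 10-D space,
# so the estimated local dimensionality should be close to 2.
rng = np.random.default_rng(0)
points = np.zeros((2000, 10))
points[:, :2] = rng.uniform(size=(2000, 2))

scores = [lid_mle(p, points) for p in points[:100]]
print("mean LID estimate:", np.mean(scores))
```

In noisy-label settings, scores of this kind are typically computed on learned feature representations rather than raw inputs; how the proposal uses them to detect and correct labels is not specified in this abstract.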

To validate the effectiveness of our proposed methods, we have conducted preliminary experimental studies on real-world domains, comparing them with state-of-the-art methods. Our experimental results demonstrate that our proposed methods outperform state-of-the-art methods across these label-related challenges.
