DS MS Thesis Defense | Aviv Nur | Wednesday, August 7, 2024 @ 11:00AM via Zoom
DATA SCIENCE
MS Thesis Defense
Aviv Nur
Wednesday, August 7, 2024 | 11:00AM - Noon
Location: Hosted Via Zoom
(Contact the DS Department for the Zoom Link)
Thesis Committee:
Advisor: Professor Chun-Kit Ngan, Worcester Polytechnic Institute, USA
Co-Advisor: Dr. Rolf Bardeli, thyssenkrupp Materials Services GmbH, Essen, Germany
Reader: Professor Randy Paffenroth, Worcester Polytechnic Institute, USA
Title: FSL-LFMG: Few-shot Learning with Augmented Latent Features and Multitasking Generation for Enhancing Multiclass Classification on Tabular Data: Existing and New Concepts
Abstract:
In my Master's thesis, I propose an advancement of ProtoNet that employs latent features (LF) augmented by an autoencoder and multitasking generation (MG) by STUNT in a few-shot learning (FSL) mechanism. Specifically, the contributions of this work are fourfold. First, I propose the FSL-LFMG framework, an end-to-end few-shot multiclass classification workflow for tabular data. The framework consists of three main stages: (i) sample-level data augmentation, using autoencoders to generate augmented LF; (ii) task-level data augmentation, self-generating multiple tasks with the STUNT approach; and (iii) learning on ProtoNet, followed by various model evaluations in my FSL mechanism. Second, because K-means clustering is sensitive to outliers and noise and Euclidean distance suffers from the curse of dimensionality, I enhance and customize the STUNT approach with K-medoids clustering, which is less sensitive to noisy outliers, and Manhattan distance, which is preferable for high-dimensional data. Third, I conduct an extensive experimental study on four datasets from diverse domains (Net Promoter Score (NPS) segmentation, Dry Bean type, Wine type, and Forest Cover type) to show that my FSL-LFMG approach to multiclass classification outperforms the Tree Ensemble models and the One-vs-the-rest classifiers by 7.8% in 1-shot and 2.5% in 5-shot learning. Finally, I demonstrate adaptation of the model obtained from the FSL-LFMG framework to a new-concept task, moving from NPS segmentation (the existing concept) to the level of customer loyalty (the new concept), to assess the generalization power of the framework, reporting the mean test accuracy in both the 1-shot setting (83.95%) and the 5-shot setting (103.52%).
Keywords: Few-shot Learning, Machine Learning, Deep Learning, Multiclass Classification, New Concept Learning, Autoencoders, Random Forest, CatBoost, One Vs Rest Classifier, STUNT, Prototypical Network, Tabular Data.
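Illustration: the second contribution in the abstract (K-medoids clustering with Manhattan distance for task generation) can be pictured with a minimal sketch. It is not the thesis implementation. It assumes that a task is formed by clustering unlabeled rows on a random subset of columns and using the cluster assignments as pseudo-labels. The function names (manhattan_distances, k_medoids, pseudo_label_task) and all parameters are hypothetical, chosen for this example only.

```python
import numpy as np


def manhattan_distances(a, b):
    """Pairwise L1 (Manhattan) distances between rows of a (n, d) and b (m, d)."""
    return np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)


def k_medoids(x, k, n_iter=20, seed=0):
    """Simple alternating k-medoids: assign rows to the nearest medoid, then
    re-pick each medoid as the member with the smallest total distance."""
    rng = np.random.default_rng(seed)
    medoid_idx = rng.choice(len(x), size=k, replace=False)
    dist = manhattan_distances(x, x)  # full (n, n) matrix; only suitable for small n
    for _ in range(n_iter):
        labels = dist[:, medoid_idx].argmin(axis=1)
        new_idx = medoid_idx.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the previous medoid if a cluster is empty
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_idx[c] = members[within.argmin()]
        if np.array_equal(new_idx, medoid_idx):
            break  # medoids stopped changing
        medoid_idx = new_idx
    labels = dist[:, medoid_idx].argmin(axis=1)
    return labels, medoid_idx


def pseudo_label_task(x, k, n_cols, seed=0):
    """Assumed task generator: cluster on a random column subset and return
    the chosen columns together with the resulting pseudo-labels."""
    rng = np.random.default_rng(seed)
    cols = rng.choice(x.shape[1], size=n_cols, replace=False)
    labels, _ = k_medoids(x[:, cols], k, seed=seed)
    return cols, labels
```

Calling pseudo_label_task(x, k=3, n_cols=2) on an unlabeled feature matrix x would yield one pseudo-labeled 3-way task; repeating the call with different seeds would yield different tasks. Because each medoid is an actual data row, a single extreme value cannot drag a cluster center away from the data the way a mean can, which is the robustness argument the abstract makes.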