Data Science | MS Thesis Presentation | Melanie Jutras | Dimension Reduction and LASSO using Pointwise and Group Norms

Tuesday, December 11, 2018
12:00 pm
Floor/Room #: Chairman's Room

DATA SCIENCE

Master of Science Thesis Presentation
by
Melanie Jutras

 

WPI Campus Center Chairman’s Room

Thesis Advisor: Dr. Randy Paffenroth
Thesis Reader: Dr. Lane Harrison

 

 

Title: Dimension Reduction and LASSO using Pointwise and Group Norms

Abstract:
Principal Components Analysis (PCA) is a statistical procedure commonly used to analyze high-dimensional data. It is often used for dimensionality reduction, which is accomplished by determining orthogonal components that contribute most to the underlying variance of the data. While PCA is widely used for identifying patterns and capturing variability of data in lower dimensions, it has some known limitations. In particular, PCA represents its results as linear combinations of data attributes, so it is often seen as difficult to interpret. In addition, because of the underlying optimization problem being solved, it is not robust to outliers. In this thesis, we examine extensions to PCA that address these limitations. Specific techniques researched in this thesis include variations of Robust and Sparse PCA, as well as novel combinations of these two methods that result in a structured low-rank approximation that is robust to outliers. Our work is inspired by the well-known machine learning method Least Absolute Shrinkage and Selection Operator (LASSO), as well as by pointwise and group matrix norms. Practical applications, including robust and non-linear methods for anomaly detection in Domain Name System network data and interpretable feature selection for a website classification problem, are discussed along with implementation details and techniques for analysis of regularization parameters.
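
For readers unfamiliar with the distinction between pointwise and group norms mentioned in the abstract, the following is a minimal illustrative sketch in Python with NumPy. It is not taken from the thesis. It assumes the common conventions that the pointwise norm is the entrywise L1 norm and that the group norm is the L2,1 norm with columns as groups. The function names and the example matrix are hypothetical. The sketch shows how the LASSO-style shrinkage step zeroes individual entries under the pointwise norm but zeroes whole columns under the group norm.

```python
import numpy as np

def l1_norm(M):
    """Pointwise (entrywise) L1 norm: sum of absolute values of all entries."""
    return np.abs(M).sum()

def l21_norm(M):
    """Group L2,1 norm (assumed grouping: columns): sum of column Euclidean norms."""
    return np.linalg.norm(M, axis=0).sum()

def soft_threshold(M, lam):
    """Proximal operator of lam * L1: shrink each entry toward zero independently."""
    return np.sign(M) * np.maximum(np.abs(M) - lam, 0.0)

def group_soft_threshold(M, lam):
    """Proximal operator of lam * L2,1: shrink each column as a single unit."""
    norms = np.linalg.norm(M, axis=0)
    # Guard against division by zero for all-zero columns.
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return M * scale

# Hypothetical example: one large column, one tiny column, one mixed column.
M = np.array([[3.0, 0.1, 0.0],
              [4.0, 0.2, 0.5]])

pointwise = soft_threshold(M, 0.3)      # zeroes small entries one at a time
grouped = group_soft_threshold(M, 0.3)  # zeroes the entire small column

print("L1 norm:  ", l1_norm(M))
print("L2,1 norm:", l21_norm(M))
print("pointwise shrinkage:\n", pointwise)
print("group shrinkage:\n", grouped)
```

In this example, the group operator removes the second column entirely while only rescaling the others. This is the kind of structured sparsity that makes column-wise (for example, per-feature) selection easier to interpret.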