'Large Scale Matrix Completion and Recommender Systems,' by Lily Amadeo.


Data Science


The goal of this thesis is to extend the theory and practice of matrix completion algorithms, and how they can be utilized, improved, and scaled up to handle large data sets. Matrix completion involves predicting missing entries in real-world data matrices using the modeling assumption that the fully observed matrix is low-rank. Low-rank matrices appear across a broad selection of domains, and such a modeling assumption is similar in spirit to Principal Component Analysis. Our focus is on large scale problems, where the matrices have millions of rows and columns. In this thesis we provide new analysis for the convergence rates of matrix completion techniques using convex nuclear norm relaxation. In addition, we validate these results on both synthetic data and data from two real-world domains (recommender systems and Internet tomography). The results we obtain show that with an empirical, data-inspired understanding of various parameters in the algorithm, this matrix completion problem can be solved more efficiently than some previous theory suggests, and therefore can be extended to much larger problems with greater ease.

Lily Amadeo, Adviser: Randy Paffenroth. Reader: Andrew Trapp