DS Ph.D. Dissertation Proposal | Harsh Pathak | Monday, Dec. 4th, 12:00PM EST



Ph.D. Dissertation Proposal

Harsh Pathak, Ph.D. Candidate

Monday, December 4th, 2023 | 12:00PM - 1:00PM EST

Zoom Link: https://wpi.zoom.us/my/rcpaffenroth


Dissertation Committee:

Professor Randy Paffenroth, Advisor, WPI

Professor Jacob Whitehill, WPI

Professor Oren Mangoubi, WPI

Dr. Wei Lee Woon, External Committee Member, Expedia Group


Title: Continuation Methods for Deep Neural Networks: Theory and Practice



This proposal explores the landscape of training methods and non-convex optimization
in deep neural networks through the lens of continuation methods. While deep learning
has achieved remarkable success across domains, optimization remains a pivotal step in
shaping network performance. We focus on the interplay between network architecture,
training techniques, solvers, and hyper-parameters that gives rise to complex
optimization landscapes.

Applying the central idea of continuation methods, gradually moving from simple
functions to more complex ones, we devise novel training routines for neural
networks. Our proposed training methods can be combined with popular solvers such
as Adam and RMSProp, and we demonstrate accelerated convergence and improved
generalization across tasks and network types.
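As a rough illustration of the continuation idea (not the proposal's actual algorithm), the sketch below blends a simple convex surrogate into a hard non-convex objective via a homotopy, warm-starting each stage from the previous minimizer. The 1-D Rastrigin function and all parameter values are illustrative choices:

```python
import numpy as np

def rastrigin(x):
    """1-D Rastrigin function: non-convex, global minimum at x = 0."""
    return x**2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0

def rastrigin_grad(x):
    return 2.0 * x + 20.0 * np.pi * np.sin(2.0 * np.pi * x)

def gd(grad, x, steps, lr):
    """Plain gradient descent."""
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def continuation_gd(x0, stages=20, inner=1000, lr=0.002):
    """Minimize the homotopy g_t(x) = (1 - t) * x**2 + t * rastrigin(x),
    sweeping t from 0 (easy, convex) to 1 (hard, non-convex) and
    warm-starting each stage at the previous stage's minimizer."""
    x = x0
    for t in np.linspace(0.0, 1.0, stages):
        blended_grad = lambda z, t=t: (1.0 - t) * 2.0 * z + t * rastrigin_grad(z)
        x = gd(blended_grad, x, inner, lr)
    return x

x0 = 3.2
x_plain = gd(rastrigin_grad, x0, 20 * 1000, lr=0.002)  # gets trapped near a local minimum
x_cont = continuation_gd(x0)                           # tracks the global minimum toward 0
```

Starting from the same point and using the same total iteration budget, plain gradient descent stalls in a local basin while the continuation schedule follows the easy problem's minimizer into the global basin of the hard one.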

Continuation methods apply broadly to iterative dynamical systems. Through the lens
of iterative maps, we study and reformulate neural network architectures such as
feedforward and recurrent networks. As a result, we introduce Sequential2D, a
generalized iterative map for modeling architectures that allows more pathways of
information flow through single or hybrid models. In this proposal, we use
Sequential2D to systematically add skip connections to GPT-2 with only 1% more
parameters. Experiments show improved fine-tuning, highlighting Sequential2D's
potential. Overall, this research advances both the theory and practice of training
deep models. The proposed techniques offer pathways to faster, more efficient
network training with strong generalization performance.
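To make the iterative-map view concrete, here is a minimal, hypothetical sketch of a Sequential2D-style block map (the proposal's actual implementation may differ): the state is partitioned into equal-sized parts, a grid of blocks routes part j into part i, and a skip connection is simply an extra identity block in the grid:

```python
import numpy as np

def sequential2d_step(blocks, state):
    """One step of a generalized iterative map. `blocks` is an N x N grid of
    callables (None = no pathway); new part i is the sum of blocks[i][j]
    applied to part j. Parts are assumed to share one size for simplicity."""
    new_state = []
    for row in blocks:
        acc = np.zeros_like(state[0])
        for f, part in zip(row, state):
            if f is not None:
                acc = acc + f(part)
        new_state.append(acc)
    return new_state

rng = np.random.default_rng(0)
d = 4
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))
layer1 = lambda v: W1 @ v
layer2 = lambda v: W2 @ v
identity = lambda v: v

# A two-layer feedforward chain written as a 3 x 3 block map over the
# partitioned state (x, h, y). The extra identity block at [2][0] is a
# skip connection routing the input straight to the output.
blocks = [
    [identity, None,   None],  # keep the input part around
    [layer1,   None,   None],  # h <- layer1(x)
    [identity, layer2, None],  # y <- layer2(h) + x  (skip pathway)
]

x = rng.standard_normal(d)
state = [x, np.zeros(d), np.zeros(d)]
for _ in range(2):  # two steps push x through both layers
    state = sequential2d_step(blocks, state)
```

After two steps the output part equals layer2(layer1(x)) + x, showing how adding one identity block to the grid creates a residual pathway without changing the underlying layers.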



Department: Data Science
Contact Person: Kelsey Briggs