Worcester Polytechnic Institute Electronic Theses and Dissertations Collection

Title page for ETD etd-051107-154052


Document Type: Master's report
Author Name: Hu, Qing
URN: etd-051107-154052
Title: Predictor Selection in Linear Regression: L1 regularization of a subset of parameters and Comparison of L1 regularization and stepwise selection
Degree: MS
Department: Mathematical Sciences
Advisors
  • Ryung Kim, Advisor
  • Jayson Wilbur, Co-Advisor
  • Bogdan M. Vernescu, Department Head
Keywords
  • L1 regularization
  • Lasso
  • Feature selection
  • Covariate selection
Date of Presentation/Defense: 2007-05-14
Availability: unrestricted

Abstract

Background: Feature selection, also known as variable selection, is a technique that selects a subset of predictors from a large collection of candidates in order to improve the prediction accuracy of a regression model. The first objective of this project is to investigate for which data structures LASSO outperforms the forward stepwise method. The second objective is to develop a feature selection method, Feature Selection by L1 Regularization of a Subset of Parameters (LRSP), which selects the model by combining prior knowledge about the inclusion of some covariates, if any, with the information collected from the data. Mathematically, LRSP minimizes the residual sum of squares subject to the constraint that the sum of the absolute values of a subset of the coefficients is no greater than a constant. In this project, LRSP is compared with LASSO, forward selection, and ordinary least squares to investigate their relative performance under different data structures.

Results: Simulation results indicate that, for a moderate number of small-sized effects, forward selection outperforms LASSO in both prediction accuracy and variable selection when the variance of the model error term is smaller, regardless of the correlations among the covariates. Forward selection also performs better in variable selection when the error variance is larger but the correlations among the covariates are smaller. LRSP was shown to be an efficient method for problems in which prior knowledge about which covariates to include is available, and it can also be applied to problems with nuisance parameters, such as linear discriminant analysis.
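The LRSP criterion described in the abstract can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the implementation used in the report: it assumes the penalized (Lagrangian) form, minimize (1/2)||y - X b||^2 + lam * sum over j in S of |b_j|, which for a suitable lam corresponds to the constrained form sum over j in S of |b_j| <= t, and solves it by cyclic coordinate descent. The function name `lrsp_coordinate_descent`, the boolean mask `penalized` (marking the penalized subset S), and the tuning value `lam` are hypothetical names chosen for this sketch. Columns of X are assumed to be nonzero, and no intercept is fitted, so either center the data or add a column of ones as an unpenalized covariate.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def lrsp_coordinate_descent(X, y, penalized, lam, n_iter=500, tol=1e-8):
    """Sketch: minimize 0.5 * ||y - X b||^2 + lam * sum_{j penalized} |b_j|.

    `penalized` is a hypothetical boolean mask: False marks covariates that
    prior knowledge says belong in the model, so they are never shrunk.
    Assumes every column of X is nonzero (otherwise col_sq[j] is 0).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    penalized = np.asarray(penalized, dtype=bool)
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # X_j^T X_j for each column
    residual = y - X @ beta        # full residual y - X b, kept up to date
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Inner product of column j with the partial residual (b_j removed).
            z = X[:, j] @ residual + col_sq[j] * old
            if penalized[j]:
                new = soft_threshold(z, lam) / col_sq[j]  # L1-shrunk update
            else:
                new = z / col_sq[j]  # plain least-squares update, no shrinkage
            if new != old:
                residual -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


# Hypothetical example: covariate 0 is assumed known to belong in the model.
rng = np.random.default_rng(0)
n = 100
X = rng.standard_normal((n, 5))
true_beta = np.array([2.0, 0.0, 0.0, 1.5, 0.0])
y = X @ true_beta + 0.1 * rng.standard_normal(n)
penalized = np.array([False, True, True, True, True])
beta_hat = lrsp_coordinate_descent(X, y, penalized, lam=20.0)
print(np.round(beta_hat, 3))
```

Marking every coefficient as penalized reduces this criterion to the LASSO in its penalized form, and setting lam=0 gives ordinary least squares, so the same routine touches two of the methods compared in this report. Coordinate descent is used here only because it is short; any convex solver for the same objective would serve.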

Files
  • abstract.pdf
  • acknowledgments.pdf
  • contents.pdf
  • listoffigures.pdf
  • listoftables.pdf
  • main.pdf
  • title.pdf

