Heather Gunn, University of California, Los Angeles
Fitting a LASSO to multiply imputed data: A missing discussion
Date & Time: Wednesday, July 21 at 12:00PM (Noon) EST
Behavioral science researchers often use standard linear regression to identify relevant predictors of an outcome of interest. Testing all predictors simultaneously can lead to overfitting and inflated standard errors. Regularization methods like the LASSO reduce the risk of overfitting, increase model interpretability, and improve prediction in future samples; however, handling missing data when using regularization-based variable selection methods is complicated. Typically, researchers use listwise deletion or ad hoc single-imputation strategies like mean imputation to handle missing data when fitting the LASSO, which can lead to loss of precision, substantial bias, and reduced predictive ability. In this talk, we describe three approaches for fitting a LASSO when using multiple imputation to handle missing data: a separate approach, a stacked approach, and the MI-LASSO. In the separate approach, a LASSO is fit to each imputed data set on its own, so the set of selected variables can differ from one imputed data set to the next. In the stacked approach, the imputed data sets are stacked into one long data set and a single LASSO is fit to it. Finally, the MI-LASSO uses the group LASSO to fit a LASSO to each imputed data set simultaneously, so that the same variables are selected across all imputed data sets. We illustrate how to implement these approaches in practice using an applied example, highlighting the decision points that arise in each approach. We end with a discussion of the implications of using each approach and the additional research needed to solidify recommendations for best practices.
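To make the difference between the separate and stacked approaches concrete, here is a minimal sketch in plain Python. It is not the speaker's implementation. The helper names (`lasso_cd`, `separate_lasso`, `stacked_lasso`), the toy data, the penalty value `lam=0.1`, and the crude random "imputations" are all assumptions made for illustration; a real analysis would draw imputations from a proper imputation model and tune the penalty. The MI-LASSO is omitted because it requires a group LASSO solver.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - Xb||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - Xb; equal to y while b = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of predictor j with the partial residual (predictor j removed)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

def selected(beta):
    """Indices of predictors with nonzero coefficients."""
    return {j for j, b in enumerate(beta) if b != 0.0}

def separate_lasso(imputed, y, lam):
    """Separate approach: one LASSO per imputed data set, one selection each."""
    return [selected(lasso_cd(X, y, lam)) for X in imputed]

def stacked_lasso(imputed, y, lam):
    """Stacked approach: stack all imputed data sets, fit a single LASSO."""
    X_stack = [row for X in imputed for row in X]
    y_stack = list(y) * len(imputed)  # assumes the outcome is fully observed
    return selected(lasso_cd(X_stack, y_stack, lam))

# Toy data (hypothetical): x0 and x1 predict y, x2 is noise, x1 has missing values.
random.seed(1)
n, m = 60, 5
x_true = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [2.0 * r[0] + 1.0 * r[1] + random.gauss(0, 1) for r in x_true]
missing = [random.random() < 0.3 for _ in range(n)]
# Stand-in for multiple imputation: fill each missing x1 with a random draw.
imputed = [
    [[r[0], random.gauss(0, 1) if miss else r[1], r[2]]
     for r, miss in zip(x_true, missing)]
    for _ in range(m)
]

per_set = separate_lasso(imputed, y, lam=0.1)  # m selections, possibly different
stacked = stacked_lasso(imputed, y, lam=0.1)   # one selection
print("separate:", per_set)
print("stacked: ", stacked)
```

The separate approach returns one selection per imputed data set and leaves open how to combine them, whereas the stacked approach yields a single selection by construction. Because the loss is averaged over all stacked rows, each imputed data set effectively carries weight 1/m in this sketch.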
About the Speaker
Dr. Gunn earned her PhD in Quantitative Psychology from Arizona State University in 2019. She completed her T32 postdoctoral fellowship at UCLA’s Center for HIV Identification, Prevention, and Treatment Services in May of this year and recently started a Research Associate position at Mayo Clinic in Rochester, MN. Her methodological research interests include measurement invariance/DIF, missing data, and machine learning.