IPF-LASSO: integrative \(L_1\)-penalized regression with penalty factors for prediction based on multi-omics data. (English) Zbl 1370.92016

Summary: As modern biotechnologies advance, it has become increasingly common for different modalities of high-dimensional molecular data (termed “omics” data in this paper), such as gene expression, methylation, and copy number, to be collected from the same patient cohort to predict the clinical outcome. While prediction based on omics data has been widely studied over the last fifteen years, little has been done in the statistical literature on integrating multiple omics modalities to select a subset of variables for prediction, which is a critical task in personalized medicine. In this paper, we propose a simple penalized regression method that addresses this problem by assigning different penalty factors to different data modalities for feature selection and prediction. The penalty factors can be chosen in a fully data-driven fashion by cross-validation or by taking practical considerations into account. In simulation studies, we compare the prediction performance of our approach, called IPF-LASSO (integrative LASSO with penalty factors) and implemented in the R package ipflasso, with that of the standard LASSO and the sparse group LASSO. The use of IPF-LASSO is also illustrated through applications to two real-life cancer datasets. All data and code are available on the companion website to ensure reproducibility.
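The penalty-factor idea can be made concrete. One common way to write it, assuming a continuous outcome with squared-error loss and the \(1/(2n)\) scaling used by glmnet, splits the coefficients into modality blocks \(\beta^{(1)},\dots,\beta^{(M)}\) and penalizes block \(m\) with \(\lambda\,\mathrm{pf}_m\):

\[
\hat\beta = \arg\min_{\beta}\; \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda \sum_{m=1}^{M} \mathrm{pf}_m \lVert \beta^{(m)}\rVert_1 ,
\]

so that \(\mathrm{pf} = (1,\dots,1)\) recovers the standard LASSO. The Python sketch below is illustrative only and is not the ipflasso R package or its API. The names `soft_threshold`, `ipf_weights`, `weighted_lasso` and `cv_ipf_lasso`, the Gaussian loss, the centring without an intercept, and the plain K-fold grid search over a user-supplied list of candidate penalty-factor vectors are all assumptions made for this example.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def ipf_weights(blocks, pf):
    """Expand one penalty factor per modality into one weight per column.

    blocks: list of integer index arrays, one per modality, partitioning the columns.
    pf: sequence of penalty factors, same length as blocks (hypothetical helper).
    """
    w = np.empty(sum(len(idx) for idx in blocks))
    for idx, factor in zip(blocks, pf):
        w[idx] = factor
    return w


def weighted_lasso(X, y, lam, weights, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for
    (1 / (2n)) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    Assumes y and the columns of X are centred (no intercept is fitted).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - X beta for the starting value beta = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            # correlation of column j with the partial residual (column j added back)
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam * weights[j]) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def cv_ipf_lasso(X, y, blocks, pf_candidates, lambdas, n_folds=5, seed=0):
    """Pick (pf, lambda) by K-fold cross-validated mean squared error.

    Returns (cv_mse, pf, lambda) for the best combination.
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # random, balanced fold labels
    best = (np.inf, None, None)
    for pf in pf_candidates:
        w = ipf_weights(blocks, pf)
        for lam in lambdas:
            sse = 0.0
            for k in range(n_folds):
                train, test = folds != k, folds == k
                # centre with training-fold means only, to avoid leakage
                x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
                beta = weighted_lasso(X[train] - x_mean, y[train] - y_mean, lam, w)
                pred = (X[test] - x_mean) @ beta + y_mean
                sse += float(((y[test] - pred) ** 2).sum())
            mse = sse / len(y)
            if mse < best[0]:
                best = (mse, tuple(pf), lam)
    return best
```

The penalty factors enter only through the per-variable thresholds `lam * pf_m`, so a modality with a larger factor needs stronger evidence before its variables are selected. Passing a single fixed vector in `pf_candidates` corresponds to choosing the factors from practical considerations rather than by cross-validation.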


92B15 General biostatistics
62P10 Applications of statistics to biology and medical sciences; meta analysis
62J02 General nonlinear regression


Software: PMA; ipflasso; R; glmnet; glasso


[1] Ioannidis, J. P. A., Expectations, validity, and reality in omics, Journal of Clinical Epidemiology, 63, 9, 945-949, (2010)
[2] Hatzis, C.; Pusztai, L.; Valero, V.; Booser, D. J.; Esserman, L.; Lluch, A.; Vidaurre, T.; Holmes, F.; Souchon, E.; Wang, H.; Martin, M.; Cotrina, J.; Gomez, H.; Hubbard, R.; Chacón, J. I.; Ferrer-Lozano, J.; Dyer, R.; Buxton, M.; Gong, Y.; Wu, Y.; Ibrahim, N.; Andreopoulou, E.; Ueno, N. T.; Hunt, K.; Yang, W.; Nazario, A.; DeMichele, A.; O’Shaughnessy, J.; Hortobagyi, G. N.; Symmans, W. F., A genomic predictor of response and survival following taxane-anthracycline chemotherapy for invasive breast cancer, JAMA, 305, 18, 1873-1881, (2011)
[3] The Cancer Genome Atlas Research Network, Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia, New England Journal of Medicine, 368, 22, 2059-2074, (2013)
[4] Acharjee, A.; Kloosterman, B.; Visser, R. G. F.; Maliepaard, C., Integration of multi-omics data for prediction of phenotypic traits using random forest, BMC Bioinformatics, 17, 5, article 180, (2016)
[5] Vazquez, A. I.; Veturi, Y.; Behring, M.; Shrestha, S.; Kirst, M.; Resende, M. F. R.; de los Campos, G., Increased proportion of variance explained and prediction accuracy of survival of breast cancer patients with use of whole-genome multiomic profiles, Genetics, 203, 3, 1425-1438, (2016)
[6] Zhao, Q.; Shi, X.; Xie, Y.; Huang, J.; Shia, B.; Ma, S., Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA, Briefings in Bioinformatics, 16, 2, 291-303, (2015)
[7] Fuchs, M.; Beißbarth, T.; Wingender, E.; Jung, K., Connecting high-dimensional mRNA and miRNA expression data for binary medical classification problems, Computer Methods and Programs in Biomedicine, 111, 3, 592-601, (2013)
[8] De Bin, R.; Sauerbrei, W.; Boulesteix, A.-L., Investigating the prediction ability of survival models based on both clinical and omics data: two case studies, Statistics in Medicine, 33, 30, 5310-5329, (2014)
[9] Witten, D. M.; Tibshirani, R.; Hastie, T., A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, Biostatistics, 10, 3, 515-534, (2009)
[10] Tibshirani, R., Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), 58, 1, 267-288, (1996) · Zbl 0850.62538
[11] Boulesteix, A.-L.; Schmid, M., Machine learning versus statistical modeling, Biometrical Journal, 56, 4, 588-593, (2014) · Zbl 1441.62290
[12] Tibshirani, R., Regression shrinkage and selection via the lasso: a retrospective, Journal of the Royal Statistical Society. Series B (Statistical Methodology), 73, 3, 273-282, (2011) · Zbl 1411.62212
[13] Park, T.; Casella, G., The Bayesian lasso, Journal of the American Statistical Association, 103, 482, 681-686, (2008) · Zbl 1330.62292
[14] Zou, H., The adaptive lasso and its oracle properties, Journal of the American Statistical Association, 101, 476, 1418-1429, (2006) · Zbl 1171.62326
[15] Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society. Series B (Statistical Methodology), 68, 1, 49-67, (2006) · Zbl 1141.62030
[16] Meier, L.; van de Geer, S.; Bühlmann, P., The group lasso for logistic regression, Journal of the Royal Statistical Society. Series B (Statistical Methodology), 70, 1, 53-71, (2008) · Zbl 1400.62276
[17] Simon, N.; Friedman, J.; Hastie, T.; Tibshirani, R., A sparse-group lasso, Journal of Computational and Graphical Statistics, 22, 2, 231-245, (2013)
[18] Gross, S. M.; Tibshirani, R., Collaborative regression, Biostatistics, 16, 2, 326-338, (2015)
[19] Ward, J.; Rakszegi, M.; Bedő, Z.; Shewry, P. R.; Mackay, I., Differentially penalized regression to predict agronomic traits from metabolites and markers in wheat, BMC Genetics, 16, article 19, (2015)
[20] van de Wiel, M. A.; Lien, T. G.; Verlaat, W.; van Wieringen, W. N.; Wilting, S. M., Better prediction by use of co-data: adaptive group-regularized ridge regression, Statistics in Medicine, 35, 3, 368-381, (2016)
[21] Boulesteix, A.-L.; Richter, A.; Bernau, C., Complexity selection with cross-validation for lasso and sparse partial least squares using high-dimensional data, Algorithms from and for Nature and Life, 261-268, Berlin, Germany: Springer International Publishing, (2013)
[22] Friedman, J.; Hastie, T.; Tibshirani, R., Regularization paths for generalized linear models via coordinate descent, Journal of Statistical Software, 33, 1, 1-22, (2010)
[23] Simon, N.; Friedman, J.; Hastie, T.; Tibshirani, R., Fit a GLM (or Cox model) with a combination of lasso and group lasso regularization
[24] Graf, E.; Schmoor, C.; Sauerbrei, W.; Schumacher, M., Assessment and comparison of prognostic classification schemes for survival data, Statistics in Medicine, 18, 17-18, 2529-2545, (1999)
[25] Boulesteix, A.-L.; Hable, R.; Lauer, S.; Eugster, M. J., A statistical framework for hypothesis testing in real data comparison studies, The American Statistician, 69, 3, 201-212, (2015)
[26] Boulesteix, A.-L., Ten simple rules for reducing overoptimistic reporting in methodological computational research, PLoS Computational Biology, 11, 4, (2015)
[27] Ternès, N.; Rotolo, F.; Heinze, G.; Michiels, S., Identification of biomarker-by-treatment interactions in randomized clinical trials with survival outcomes and high-dimensional spaces, Biometrical Journal, (2016) · Zbl 1369.62306
[28] Meinshausen, N.; Bühlmann, P., Stability selection, Journal of the Royal Statistical Society. Series B (Statistical Methodology), 72, 4, 417-473, (2010) · Zbl 1411.62142
[29] Sauerbrei, W.; Buchholz, A.; Boulesteix, A.-L.; Binder, H., On stability issues in deriving multivariable regression models, Biometrical Journal, 57, 4, 531-555, (2015) · Zbl 1329.62035
[30] Zou, H.; Hastie, T., Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society. Series B (Statistical Methodology), 67, 2, 301-320, (2005) · Zbl 1069.62054