zbMATH — the first resource for mathematics

Random subspace method for high-dimensional regression with the R package regRSM. (English) Zbl 1347.65033
Summary: Model selection and variable importance assessment in high-dimensional regression are among the most important tasks in contemporary applied statistics. In our procedure, implemented in the package regRSM, the Random Subspace Method (RSM) is used to construct a variable importance measure. The variables are first ordered by the importance measures computed with the RSM; then, from the hierarchical list of models given by this ordering, the final subset of variables is chosen using information criteria or a validation set. Modifications of the original method, such as the weighted Random Subspace Method and a version with initial screening of redundant variables, are discussed. We developed parallel implementations that significantly reduce the computation time. In this paper, we give a brief overview of the methodology, demonstrate the package’s functionality, and present a comparative study of the proposed algorithm and competing methods such as the lasso or CAR scores. In the performance tests, the computation times of the parallel implementations are compared.

65C60 Computational problems in statistics (MSC2010)
62-04 Software, source code, etc. for problems pertaining to statistics
62J05 Linear regression; mixed models
[1] Breiman, L, Random forests, Mach Learn, 45, 5-32, (2001) · Zbl 1007.68152
[2] Chen, J; Chen, Z, Extended Bayesian information criteria for model selection with large model spaces, Biometrika, 95, 759-771, (2008) · Zbl 1437.62415
[3] Cheng, J; Levina, E; Wang, P; Zhu, J, A sparse Ising model with covariates, Biometrics, 70, 943-953, (2014) · Zbl 1393.62057
[4] Donoho DL (2000) High-dimensional data analysis: the curses and blessings of dimensionality. Aide-memoire of a lecture at AMS conference on math challenges of the 21st century
[5] Fan, J; Lv, J, Sure independence screening for ultra-high dimensional feature space (with discussion), J R Stat Soc B, 70, 849-911, (2008) · Zbl 1411.62187
[6] Fan, Y; Tang, CY, Tuning parameter selection in high dimensional penalized likelihood, J R Stat Soc Ser B (Stat Methodol), 75, 531-552, (2013) · Zbl 1411.62216
[7] Feldman B (2005) Relative importance and value. http://ssrn.com/abstract=2255827
[8] Friedman, JH, Multivariate adaptive regression splines, Ann Stat, 19, 1-67, (1991) · Zbl 0765.62064
[9] Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33(1):1-22
[10] Gentle JE (2007) Matrix algebra: theory, computations, and applications in statistics. Springer, New York · Zbl 1133.15001
[11] Grömping U (2006) Relative importance for linear regression in R: the package relaimpo. J Stat Softw 17(1):1-27
[12] Hannum, G; Guinney, J; Zhao, L; Zhang, L; Hughes, G; Sadda, S; Klotzle, B; Bibikova, M; Fan, JB; Gao, Y; Deconde, R; Chen, M; Rajapakse, I; Friend, S; Ideker, T; Zhang, K, Genome-wide methylation profiles reveal quantitative views of human aging rates, Mol Cell, 49, 359-367, (2013)
[13] Hao Y (2002) Rmpi: parallel statistical computing in R. R News 2(2):10-14. http://cran.r-project.org/doc/Rnews/Rnews_2002-2.pdf
[14] Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference and prediction. Springer. http://www-stat.stanford.edu/~tibs/ElemStatLearn/ · Zbl 1273.62005
[15] Ho, TK, The random subspace method for constructing decision forests, IEEE Trans Pattern Anal Mach Intell, 20, 832-844, (1998)
[16] Huang, J; Ma, S; Zhang, C-H, Adaptive lasso for high-dimensional regression models, Stat Sin, 18, 1603-1618, (2008) · Zbl 1255.62198
[17] Jolliffe, IT, A note on the use of principal components in regression, J R Stat Soc Ser C (Appl Stat), 31, 300-303, (1982)
[18] Kuhn, M, Building predictive models in R using the caret package, J Stat Softw, 28, 1-26, (2008)
[19] Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
[20] Lindemann R, Merenda P, Gold R (1980) Introduction to bivariate and multivariate analysis. Scott Foresman, Glenview · Zbl 0455.62039
[21] Martens, H, Reliable and relevant modelling of real world data: a personal account of the development of PLS regression, Chemom Intell Lab Syst, 58, 85-95, (2001)
[22] Mielniczuk, J; Teisseyre, P, Using random subspace method for prediction and variable importance assessment in regression, Comput Stat Data Anal, 71, 725-742, (2014) · Zbl 06975420
[23] Rencher AC, Schaalje GB (2008) Linear models in statistics. Wiley, Hoboken · Zbl 1136.62045
[24] Revolution Analytics, Weston S (2013) doParallel: foreach parallel adaptor for the parallel package. http://CRAN.R-project.org/package=doParallel. R package version 1.0.6
[25] Shao, J; Deng, X, Estimation in high-dimensional linear models with deterministic covariates, Ann Stat, 40, 812-831, (2012) · Zbl 1273.62177
[26] Tibshirani, R, Regression shrinkage and selection via the lasso, J R Stat Soc B, 58, 267-288, (1996) · Zbl 0850.62538
[27] Wold, S, Personal memories of the early PLS development, Chemom Intell Lab Syst, 58, 83-84, (2001)
[28] Zhang, C-H; Zhang, T, A general theory of concave regularization for high-dimensional sparse estimation problems, Stat Sci, 27, 576-593, (2012) · Zbl 1331.62353
[29] Zhang, Y; Li, R; Tsai, C-L, Regularization parameter selections via generalized information criterion, J Am Stat Assoc, 105, 312-323, (2010) · Zbl 1397.62262
[30] Zheng, X; Loh, W-Y, A consistent variable selection criterion for linear models with high-dimensional covariates, Stat Sin, 7, 311-325, (1997) · Zbl 0880.62068
[31] Zou, H; Hastie, T, Regularization and variable selection via the elastic net, J R Stat Soc B, 67, 301-320, (2005) · Zbl 1069.62054
[32] Zuber, V; Strimmer, K, High-dimensional regression and variable selection using CAR scores, Stat Appl Genet Mol Biol, 10, Article 34, (2011) · Zbl 1296.92082
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.