
Projective inference in high-dimensional problems: prediction and feature selection. (English) Zbl 1476.62058

The paper discusses predictive inference and feature selection for generalised linear models when the data are high-dimensional but the number of observations is small, as is the case, for example, with microarray data.
The article demonstrates the benefit of a two-step approach: first build a reference model using all available features, then search for a minimal subset of features whose submodel characterises the reference model's predictions. Several techniques in this category are reviewed.
The authors propose a new clustered projection technique that combines two existing methods and offers both speed and good accuracy, and they put forward a novel leave-one-out (LOO) cross-validation method for evaluating the feature selection process. They also prove a theorem establishing the conditions under which the projective approach is advantageous.
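In the projective framework (going back to Goutis and Robert [28] and Dupuis and Robert [18]), the parameters of a candidate submodel are obtained by projecting the reference model's predictive distribution onto the submodel. A standard formulation of this objective, sketched here in generic notation rather than quoted verbatim from the paper, is
\[
\theta_\perp = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \mathrm{KL}\bigl( p(\tilde{y}_i \mid D, M_*) \,\big\|\, p(\tilde{y}_i \mid \theta, M_\perp) \bigr),
\]
where \(M_*\) is the reference model fitted to the data \(D\) and \(M_\perp\) is the submodel spanned by the selected features. In practice the projection is computed separately for each posterior draw from the reference model or, in the clustered variant, for each cluster of draws, which is the source of the speed-up.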
An R package implementing all the discussed methods is provided. Several simulation studies and applications to five microarray data sets demonstrate the practical effectiveness of the approach.
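As a rough illustration of how such a workflow can be run in R, the sketch below assumes the package in question is projpred, used together with an rstanarm reference model; function and argument names have varied across package versions, so this is indicative rather than definitive.

# A minimal sketch of the projection predictive workflow, assuming the
# projpred package (function/argument names vary across versions).
library(rstanarm)   # fits the Bayesian reference model
library(projpred)   # projection predictive feature selection

# Reference model with a sparsifying (regularised horseshoe) prior;
# `df` is a hypothetical data frame with response y and many features.
fit <- stan_glm(y ~ ., data = df, family = gaussian(), prior = hs())

# Cross-validated search over submodels of increasing size,
# assessing each submodel's predictive performance by LOO.
cvvs <- cv_varsel(fit)

# Smallest submodel whose predictive performance is close to the
# reference model's, and the order in which features were selected.
nsel <- suggest_size(cvvs)
solution_terms(cvvs)   # renamed ranking() in recent versions

# Project the reference posterior onto the chosen submodel.
prj <- project(cvvs, nterms = nsel)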

MSC:

62F15 Bayesian inference
62F07 Statistical ranking and selection procedures
62J12 Generalized linear models (logistic models)

References:

[1] Afrabandpey, H., Peltola, T., Piironen, J., Vehtari, A. and Kaski, S. (2019). Making Bayesian predictive models interpretable: a decision theoretic approach., arXiv:1910.09358. · Zbl 1525.68105
[2] Ambroise, C. and McLachlan, G. J. (2002). Selection bias in gene extraction on the basis of microarray gene-expression data., Proceedings of the National Academy of Sciences 99 6562-6566. · Zbl 1034.92013
[3] Armagan, A., Clyde, M. and Dunson, D. B. (2011). Generalized beta mixtures of Gaussians. In, Advances in Neural Information Processing Systems 24 (J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira and K. Q. Weinberger, eds.) 523-531.
[4] Bair, E., Hastie, T., Paul, D. and Tibshirani, R. (2006). Prediction by supervised principal components., Journal of the American Statistical Association 101 119-137. · Zbl 1118.62326
[5] Barbieri, M. M. and Berger, J. O. (2004). Optimal predictive model selection., The Annals of Statistics 32 870-897. · Zbl 1092.62033
[6] Bernardo, J. M. and Juárez, M. A. (2003). Intrinsic Estimation. In, Bayesian Statistics 7 (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.) 465-476. Oxford University Press.
[7] Bernardo, J. M. and Smith, A. F. M. (1994)., Bayesian Theory. John Wiley & Sons. · Zbl 0796.62002
[8] Bhadra, A., Datta, J., Polson, N. G. and Willard, B. (2017). The horseshoe \(+\) estimator of ultra-sparse signals., Bayesian Analysis 12 1105-1131. · Zbl 1384.62079
[9] Bhattacharya, A., Pati, D., Pillai, N. S. and Dunson, D. B. (2015). Dirichlet-Laplace priors for optimal shrinkage., Journal of the American Statistical Association 110 1479-1490. · Zbl 1373.62368
[10] Breiman, L. (1995). Better subset regression using the nonnegative garrote., Technometrics 37 373-384. · Zbl 0862.62059
[11] Bucila, C., Caruana, R. and Niculescu-Mizil, A. (2006). Model compression. In, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’06 535-541. ACM.
[12] Bürkner, P.-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan., Journal of Statistical Software 80 1-28.
[13] Candes, E. and Tao, T. (2007). The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\)., The Annals of Statistics 35 2313-2351. · Zbl 1139.62019
[14] Carvalho, C. M., Polson, N. G. and Scott, J. G. (2009). Handling sparsity via the horseshoe. In, Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (D. van Dyk and M. Welling, eds.). Proceedings of Machine Learning Research 5 73-80.
[15] Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals., Biometrika 97 465-480. · Zbl 1406.62021
[16] Castillo, I. and van der Vaart, A. (2012). Needles and straws in a haystack: posterior concentration for possibly sparse sequences., The Annals of Statistics 40 2069-2101. · Zbl 1257.62025
[17] Cawley, G. C. and Talbot, N. L. C. (2010). On over-fitting in model selection and subsequent selection bias in performance evaluation., Journal of Machine Learning Research 11 2079-2107. · Zbl 1242.62051
[18] Dupuis, J. A. and Robert, C. P. (2003). Variable selection in qualitative models via an entropic explanatory power., Journal of Statistical Planning and Inference 111 77-94. · Zbl 1033.62066
[19] Efron, B. (2010)., Large-scale inference: empirical Bayes methods for estimation, testing, and prediction. Institute of Mathematical Statistics (IMS) Monographs 1. Cambridge University Press. · Zbl 1277.62016
[20] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression., The Annals of Statistics 32 407-499. · Zbl 1091.62054
[21] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties., Journal of the American Statistical Association 96 1348-1360. · Zbl 1073.62547
[22] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space., Journal of the Royal Statistical Society. Series B (Methodological) 70 849-911. · Zbl 1411.62187
[23] Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent., Journal of Statistical Software 33 1-22.
[24] Gabry, J., Simpson, D., Vehtari, A., Betancourt, M. and Gelman, A. (2018). Visualization in Bayesian workflow., Journal of the Royal Statistical Society. Series A 182 389-402.
[25] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2013)., Bayesian Data Analysis, third ed. Chapman & Hall. · Zbl 1279.62004
[26] George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling., Journal of the American Statistical Association 88 881-889.
[27] Goodrich, B., Gabry, J., Ali, I. and Brilleman, S. (2018). rstanarm: Bayesian applied regression modeling via Stan. R package version 2.17.4.
[28] Goutis, C. and Robert, C. P. (1998). Model choice in generalised linear models: A Bayesian approach via Kullback-Leibler projections., Biometrika 85 29-37. · Zbl 0903.62061
[29] Hahn, P. R. and Carvalho, C. M. (2015). Decoupling shrinkage and selection in Bayesian linear models: a posterior summary perspective., Journal of the American Statistical Association 110 435-448. · Zbl 1373.62036
[30] Harrell, F. E. (2015)., Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis, second ed. Springer. · Zbl 1330.62001
[31] Hastie, T., Tibshirani, R. and Friedman, J. (2009)., The Elements of Statistical Learning, second ed. Springer-Verlag. · Zbl 1273.62005
[32] Hastie, T., Tibshirani, R. and Wainwright, M. (2015)., Statistical learning with sparsity: the Lasso and generalizations. Chapman & Hall. · Zbl 1319.68003
[33] Hernández-Lobato, D., Hernández-Lobato, J. M. and Suárez, A. (2010). Expectation propagation for microarray data classification., Pattern Recognition Letters 31 1618-1626.
[34] Hinton, G., Vinyals, O. and Dean, J. (2015). Distilling the knowledge in a neural network., arXiv:1503.02531.
[35] Ishwaran, H., Kogalur, U. B. and Rao, J. S. (2010). spikeslab: Prediction and variable selection using spike and slab regression., The R Journal 2 68-73.
[36] Ishwaran, H. and Rao, J. S. (2005). Spike and slab variable selection: frequentist and Bayesian strategies., The Annals of Statistics 33 730-773. · Zbl 1068.62079
[37] Johnson, V. E. and Rossell, D. (2012). Bayesian model selection in high-dimensional settings., Journal of the American Statistical Association 107 649-660. · Zbl 1261.62024
[38] Johnstone, I. M. and Silverman, B. W. (2004). Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences., The Annals of Statistics 32 1594-1649. · Zbl 1047.62008
[39] Lee, K. E., Sha, N., Dougherty, E. R., Vannucci, M. and Mallick, B. K. (2003). Gene selection: a Bayesian variable selection approach., Bioinformatics 19 90-97.
[40] Li, Y., Campbell, C. and Tipping, M. (2002). Bayesian automatic relevance determination algorithms for classifying gene expression data., Bioinformatics 18 1332-1339.
[41] Lindley, D. V. (1968). The choice of variables in multiple regression., Journal of the Royal Statistical Society. Series B (Methodological) 30 31-66. · Zbl 0155.26702
[42] McCullagh, P. and Nelder, J. A. (1989)., Generalized linear models, second ed. Monographs on Statistics and Applied Probability. Chapman & Hall. · Zbl 0744.62098
[43] Meinshausen, N. (2007). Relaxed Lasso., Computational Statistics & Data Analysis 52 374-393. · Zbl 1452.62522
[44] Narisetty, N. N. and He, X. (2014). Bayesian variable selection with shrinking and diffusing priors., The Annals of Statistics 42 789-817. · Zbl 1302.62158
[45] Neal, R. and Zhang, J. (2006). High dimensional classification with Bayesian neural networks and Dirichlet diffusion trees. In, Feature Extraction, Foundations and Applications (I. Guyon, S. Gunn, M. Nikravesh and L. A. Zadeh, eds.) 265-296. Springer.
[46] Nott, D. J. and Leng, C. (2010). Bayesian projection approaches to variable selection in generalized linear models., Computational Statistics and Data Analysis 54 3227-3241. · Zbl 1284.62461
[47] Paananen, T., Piironen, J., Bürkner, P.-C. and Vehtari, A. (2020). Implicitly adaptive importance sampling., arXiv:1906.08850. · Zbl 1475.62053
[48] Paul, D., Bair, E., Hastie, T. and Tibshirani, R. (2008). “Preconditioning” for feature selection and regression in high-dimensional problems., The Annals of Statistics 36 1595-1618. · Zbl 1142.62022
[49] Peltola, T. (2018). Local interpretable model-agnostic explanations of Bayesian predictive models via Kullback-Leibler projections. In, Proceedings of the 2nd Workshop on Explainable Artificial Intelligence (D. W. Aha, T. Darrell, P. Doherty and D. Magazzeni, eds.) 114-118.
[50] Peltola, T., Havulinna, A. S., Salomaa, V. and Vehtari, A. (2014). Hierarchical Bayesian survival analysis and projective covariate selection in cardiovascular event risk prediction. In, Proceedings of the 11th UAI Bayesian Modeling Applications Workshop. CEUR Workshop Proceedings 1218 79-88.
[51] Piironen, J. and Vehtari, A. (2016). Projection predictive model selection for Gaussian processes. In, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) 1-6. IEEE.
[52] Piironen, J. and Vehtari, A. (2017a). Comparison of Bayesian predictive methods for model selection., Statistics and Computing 27 711-735. · Zbl 1505.62321
[53] Piironen, J. and Vehtari, A. (2017b). Sparsity information and regularization in the horseshoe and other shrinkage priors., Electronic Journal of Statistics 11 5018-5051. · Zbl 1459.62141
[54] Piironen, J. and Vehtari, A. (2017c). On the hyperprior choice for the global shrinkage parameter in the horseshoe prior. In, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (A. Singh and J. Zhu, eds.). Proceedings of Machine Learning Research 54 905-913.
[55] Piironen, J. and Vehtari, A. (2018). Iterative supervised principal components. In, Proceedings of the 21st International Conference on Artificial Intelligence and Statistics (A. Storkey and F. Perez-Cruz, eds.). Proceedings of Machine Learning Research 84 106-114.
[56] Polson, N. G. and Scott, J. G. (2011). Shrink globally, act locally: sparse Bayesian regularization and prediction. In, Bayesian statistics 9 (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.) 501-538. Oxford University Press, Oxford.
[57] Raftery, A. E., Madigan, D. and Hoeting, J. A. (1997). Bayesian model averaging for linear regression models., Journal of the American Statistical Association 92 179-191. · Zbl 0888.62026
[58] Reid, S., Tibshirani, R. and Friedman, J. (2016). A study of error variance estimation in Lasso regression., Statistica Sinica 26 35-67. · Zbl 1372.62023
[59] Reunanen, J. (2003). Overfitting in making comparisons between variable selection methods., Journal of Machine Learning Research 3 1371-1382. · Zbl 1102.68635
[60] Ribeiro, M. T., Singh, S. and Guestrin, C. (2016). “Why should I trust you?” Explaining the predictions of any classifier. In, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’16 1135-1144. ACM.
[61] Snelson, E. and Ghahramani, Z. (2005). Compact approximations to Bayesian predictive distributions. In, Proceedings of the 22nd International Conference on Machine Learning. ICML ’05 840-847. ACM.
[62] Stan Development Team (2018). Stan modeling language users guide and reference manual, version 2.18.0.
[63] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso., Journal of the Royal Statistical Society. Series B (Methodological) 58 267-288. · Zbl 0850.62538
[64] Tran, M.-N., Nott, D. J. and Leng, C. (2012). The predictive Lasso., Statistics and Computing 22 1069-1084. · Zbl 1252.62075
[65] van der Pas, S. L., Kleijn, B. J. K. and van der Vaart, A. W. (2014). The horseshoe estimator: posterior concentration around nearly black vectors., Electronic Journal of Statistics 8 2585-2618. · Zbl 1309.62060
[66] Vehtari, A., Gelman, A. and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC., Statistics and Computing 27 1413-1432. · Zbl 1505.62408
[67] Vehtari, A. and Ojanen, J. (2012). A survey of Bayesian predictive methods for model assessment, selection and comparison., Statistics Surveys 6 142-228. · Zbl 1302.62011
[68] Vehtari, A., Simpson, D., Gelman, A., Yao, Y. and Gabry, J. (2019). Pareto smoothed importance sampling., arXiv:1507.02646.
[69] Yao, Y., Vehtari, A., Simpson, D. and Gelman, A. (2018). Using stacking to average Bayesian predictive distributions (with discussion)., Bayesian Analysis 13 917-1003. · Zbl 1407.62090
[70] Zanella, G. and Roberts, G. (2019). Scalable importance tempering and Bayesian variable selection., Journal of the Royal Statistical Society. Series B (Methodological) 81 489-517. · Zbl 1420.62059
[71] Zou, H. (2006). The adaptive Lasso and its oracle properties., Journal of the American Statistical Association 101 1418-1429. · Zbl 1171.62326
[72] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net., Journal of the Royal Statistical Society. Series B (Methodological) 67 301-320. · Zbl 1069.62054