
Gibbs posterior for variable selection in high-dimensional classification and data mining. (English) Zbl 1274.62227
Summary: In the popular approach of Bayesian variable selection (BVS), one uses prior and posterior distributions to select a subset of candidate variables to enter the model. A completely new direction is considered here: BVS with a Gibbs posterior originating in statistical mechanics. The Gibbs posterior is constructed from a risk function of practical interest (such as the classification error) and aims to minimize that risk without modeling the data probabilistically. This can improve performance over the usual Bayesian approach, which depends on a probability model that may be misspecified. Conditions are provided for achieving good risk performance even in the presence of high dimensionality, when the number of candidate variables $$K$$ can be much larger than the sample size $$n$$. In addition, we develop a convenient Markov chain Monte Carlo algorithm to implement BVS with the Gibbs posterior.
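The construction described above can be sketched in code: a Gibbs posterior over inclusion vectors proportional to exp(-λ·n·risk(γ))·prior(γ), sampled by a simple Metropolis chain that flips one inclusion indicator per step. The nearest-centroid risk, the Bernoulli(1/K) sparsity prior, and the temperature `lam` below are illustrative assumptions, not the paper's specification.

```python
import numpy as np

# Illustrative sketch of BVS with a Gibbs posterior. The risk function,
# prior, and tuning constants are assumptions chosen for the toy example.
rng = np.random.default_rng(0)

# Toy data: n samples, K candidate variables; only variables 0 and 1
# carry signal, mimicking a sparse high-dimensional setting.
n, K = 100, 30
X = rng.normal(size=(n, K))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

def emp_risk(gamma):
    """Empirical classification error of a nearest-centroid rule
    restricted to the variables selected by the boolean mask gamma."""
    if not gamma.any():
        return 0.5  # no variables selected: no better than coin flipping
    Z = X[:, gamma]
    mu0, mu1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    pred = (np.linalg.norm(Z - mu1, axis=1)
            < np.linalg.norm(Z - mu0, axis=1)).astype(int)
    return float(np.mean(pred != y))

# Gibbs posterior: pi(gamma) ∝ exp(-lam * n * risk(gamma)) * prior(gamma),
# with an independent Bernoulli(1/K) prior favoring sparse subsets.
lam = 2.0
log_prior_odds = np.log(1.0 / (K - 1))  # log p/(1-p) for p = 1/K

gamma = np.zeros(K, dtype=bool)
cur_risk = emp_risk(gamma)
T = 2000
counts = np.zeros(K)
for _ in range(T):
    j = rng.integers(K)            # propose flipping one inclusion indicator
    prop = gamma.copy()
    prop[j] = not prop[j]
    prop_risk = emp_risk(prop)
    # Metropolis log acceptance ratio: risk term plus prior odds for add/drop.
    log_ratio = -lam * n * (prop_risk - cur_risk) \
                + (log_prior_odds if prop[j] else -log_prior_odds)
    if np.log(rng.random()) < log_ratio:
        gamma, cur_risk = prop, prop_risk
    counts += gamma

# Monte Carlo estimates of posterior inclusion probabilities.
inclusion = counts / T
```

Because the Gibbs posterior penalizes the empirical risk directly, no likelihood for the data is ever specified; misspecification of a probability model simply cannot occur, which is the point of the approach summarized above.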

MSC:
 62F99 Parametric inference
 62J02 General nonlinear regression
 82-08 Computational methods (statistical mechanics) (MSC2010)
Software:
HdBCS
References:
 [1] Brown, P. J., Fearn, T. and Vannucci, M. (1999). The choice of variables in multivariate regression: A non-conjugate Bayesian decision theory approach. Biometrika 86 635-648. · Zbl 1072.62510 · doi:10.1093/biomet/86.3.635
 [2] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York. · Zbl 0853.68150
 [3] Dobra, A., Hans, C., Jones, B., Nevins, J. R., Yao, G. and West, M. (2004). Sparse graphical models for exploring gene expression data. J. Multivariate Anal. 90 196-212. · Zbl 1047.62104 · doi:10.1016/j.jmva.2004.02.009
 [4] Friedman, J., Hastie, T., Rosset, S., Tibshirani, R. and Zhu, J. (2004). Discussion on boosting. Ann. Statist. 32 102-107. · Zbl 1105.62314
 [5] Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Machine Intell. 6 721-741. · Zbl 0573.62030 · doi:10.1109/TPAMI.1984.4767596
 [6] George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statist. Sinica 7 339-373. · Zbl 0884.62031
 [7] Gerlach, R., Bird, R. and Hall, A. (2002). Bayesian variable selection in logistic regression: Predicting company earnings direction. Aust. N. Z. J. Statist. 44 155-168. · Zbl 1184.62036 · doi:10.1111/1467-842X.00218
 [8] Greenshtein, E. (2006). Best subset selection, persistency in high dimensional statistical learning and optimization under $$\ell_1$$ constraint. Ann. Statist. 34 2367-2386. · Zbl 1106.62022 · doi:10.1214/009053606000000768
 [9] Horowitz, J. L. (1992). A smoothed maximum score estimator for the binary response model. Econometrica 60 505-531. · Zbl 0761.62166 · doi:10.2307/2951582
 [10] Kleijn, B. J. K. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics. Ann. Statist. 34 837-877. · Zbl 1095.62031 · doi:10.1214/009053606000000029
 [11] Jiang, W. (2007). Bayesian variable selection for high dimensional generalized linear models: Convergence rates of the fitted densities. Ann. Statist. 35 1487-1511. · Zbl 1123.62026 · doi:10.1214/009053607000000019
 [12] Lee, K. E., Sha, N., Dougherty, E. R., Vannucci, M. and Mallick, B. K. (2003). Gene selection: A Bayesian variable selection approach. Bioinformatics 19 90-97.
 [13] Lindley, D. V. (1968). The choice of variables in multiple regression (with discussion). J. Roy. Statist. Soc. Ser. B 30 31-66. · Zbl 0155.26702
 [14] Smith, M. and Kohn, R. (1996). Nonparametric regression using Bayesian variable selection. J. Econometrics 75 317-343. · Zbl 0864.62025 · doi:10.1016/0304-4076(95)01763-1
 [15] Tanner, M. A. (1996). Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions, 3rd ed. Springer, New York. · Zbl 0846.62001
 [16] Tanner, M. A. and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). J. Amer. Statist. Assoc. 82 528-550. · Zbl 0619.62029 · doi:10.2307/2289457
 [17] Zhang, T. (1999). Theoretical analysis of a class of randomized regularization methods. In COLT '99: Proceedings of the Twelfth Annual Conference on Computational Learning Theory 156-163. ACM Press, New York.
 [18] Zhang, T. (2006a). From $$\epsilon$$-entropy to KL-entropy: Analysis of minimum information complexity density estimation. Ann. Statist. 34 2180-2210. · Zbl 1106.62005 · doi:10.1214/009053606000000704
 [19] Zhang, T. (2006b). Information theoretical upper and lower bounds for statistical estimation. IEEE Trans. Inform. Theory 52 1307-1321. · Zbl 1320.94033 · doi:10.1109/TIT.2005.864439
 [20] Zhou, X., Liu, K.-Y. and Wong, S. T. C. (2004). Cancer classification and prediction using logistic regression with Bayesian gene selection. J. Biomedical Informatics 37 249-259.