Bayes model selection with path sampling: factor models and other examples. (English) Zbl 1332.62089

Summary: We prove a theorem justifying the regularity conditions needed for Path Sampling in Factor Models. We then show that the remaining ingredient, namely MCMC for calculating the integrand at each point in the path, may be seriously flawed, leading to wrong estimates of Bayes factors. We provide a new method, Path Sampling with Small Change (PS-SC), that works much better than standard Path Sampling in the sense of estimating the Bayes factor more accurately and choosing the correct model more often. When the more complex factor model is true, PS-SC is substantially more accurate. New MCMC diagnostics are provided for these problems in support of our conclusions and recommendations. Some of our ideas for diagnostics and for improving computation through small changes should apply to other methods of computing the Bayes factor for model selection.
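For readers unfamiliar with the technique named in the summary, the following is a minimal sketch of standard path sampling (thermodynamic integration) on a toy problem. It is not the PS-SC method of the paper and not a factor model. It assumes two unnormalized Gaussian densities, q0(theta) = exp(-theta^2/2) and q1(theta) = exp(-theta^2/(2 s^2)), joined by a geometric path, so the exact answer log(Z1/Z0) = log s is known. The function name, grid size, and number of draws are illustrative choices, and exact Gaussian draws stand in for the MCMC step that the summary identifies as the fragile ingredient in real problems.

```python
import math
import random


def path_sampling_log_ratio(s, n_grid=21, n_draws=20000, seed=0):
    """Estimate log(Z1/Z0) by path sampling on a toy Gaussian pair.

    q0(theta) = exp(-theta^2 / 2), q1(theta) = exp(-theta^2 / (2 s^2)),
    joined by the geometric path q_t = q0^(1-t) * q1^t (an assumed path).
    """
    rng = random.Random(seed)
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    means = []
    for t in grid:
        # q_t is proportional to a N(0, 1/precision) density.
        precision = (1.0 - t) + t / s**2
        sd = 1.0 / math.sqrt(precision)
        total = 0.0
        for _ in range(n_draws):
            # Exact draw from q_t; a realistic model would need MCMC here.
            theta = rng.gauss(0.0, sd)
            # Integrand U = d/dt log q_t(theta) = log q1(theta) - log q0(theta).
            total += 0.5 * theta**2 * (1.0 - 1.0 / s**2)
        means.append(total / n_draws)
    # Trapezoidal rule for the integral of E_t[U] over t in [0, 1].
    return sum(
        0.5 * (means[i] + means[i + 1]) * (grid[i + 1] - grid[i])
        for i in range(n_grid - 1)
    )


estimate = path_sampling_log_ratio(s=2.0)
exact = math.log(2.0)
print(f"path sampling estimate: {estimate:.4f}, exact: {exact:.4f}")
```

In this toy case the integrand is well behaved along the whole path, so a coarse grid and simple Monte Carlo averages suffice. The summary's point is that in factor models the per-grid-point MCMC estimate of the integrand can itself be unreliable, which a sketch with exact sampling cannot reproduce.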

MSC:

62F15 Bayesian inference
62H25 Factor analysis and principal components; correspondence analysis
62M05 Markov processes: estimation; hidden Markov models

Software:

tsbridge; BayesDA
