On the use of bootstrap with variational inference: theory, interpretation, and a two-sample test example. (English) Zbl 1405.62165

Summary: Variational inference is a general approach for approximating complex density functions, such as those arising in latent variable models, popular in machine learning. It has been applied to approximate the maximum likelihood estimator and to carry out Bayesian inference; however, quantification of uncertainty with variational inference remains challenging from both theoretical and practical perspectives. This paper is concerned with developing uncertainty measures for variational inference by using bootstrap procedures. We first develop two general bootstrap approaches for assessing the uncertainty of a variational estimate and then study the underlying bootstrap theory in both fixed- and increasing-dimension settings. We then use the bootstrap approach and our theoretical results in the context of mixed membership modeling with multivariate binary data on functional disability from the National Long Term Care Survey. We carry out a two-sample approach to test for changes in the repeated measures of functional disability for the subset of individuals present in both the 1989 and 1994 waves.


62P10 Applications of statistics to biology and medical sciences; meta analysis
62F40 Bootstrap, jackknife and other resampling methods
62H12 Estimation in multivariate analysis
62F15 Bayesian inference



