zbMATH — the first resource for mathematics

A general method for robust Bayesian modeling. (English) Zbl 1407.62088
Summary: Robust Bayesian models are appealing alternatives to standard models, providing protection from data that contains outliers or other departures from the model assumptions. Historically, robust models were mostly developed on a case-by-case basis; examples include robust linear regression, robust mixture models, and bursty topic models. In this paper we develop a general approach to robust Bayesian modeling. We show how to turn an existing Bayesian model into a robust model, and then develop a generic computational strategy for it. We use our method to study robust variants of several models, including linear regression, Poisson regression, logistic regression, and probabilistic topic models. We discuss the connections between our methods and existing approaches, especially empirical Bayes and James-Stein estimation.

62F15 Bayesian inference
62F35 Robustness and adaptive procedures (parametric inference)
62J12 Generalized linear models (logistic models)
[1] Ahn, S., Korattikara, A., and Welling, M. (2012). “Bayesian posterior sampling via stochastic gradient Fisher scoring.” arXiv preprint arXiv:1206.6380.
[2] Airoldi, E., Blei, D., Fienberg, S., and Xing, E. (2007). “Combining Stochastic Block Models and Mixed Membership for Statistical Network Analysis.” In Statistical Network Analysis: Models, Issues and New Directions, Lecture Notes in Computer Science, 57–74. Springer-Verlag.
[3] Airoldi, E., Blei, D., Fienberg, S., and Xing, E. (2009). “Mixed Membership Stochastic Blockmodels.” In Neural Information Processing Systems. · Zbl 1225.68143
[4] Antoniak, C. (1974). “Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems.” The Annals of Statistics, 2(6): 1152–1174. · Zbl 0335.60034
[5] Asuncion, A., Welling, M., Smyth, P., and Teh, Y. (2009). “On Smoothing and Inference for Topic Models.” In Uncertainty in Artificial Intelligence.
[6] Attias, H. (2000). “A variational Bayesian framework for graphical models.” In Advances in Neural Information Processing Systems.
[7] Berger, J. O., Moreno, E., Pericchi, L. R., Bayarri, M. J., Bernardo, J. M., Cano, J. A., De la Horra, J., Martín, J., Ríos-Insúa, D., Betrò, B., et al. (1994). “An overview of robust Bayesian analysis.” Test, 3(1): 5–124.
[8] Bernardo, J. and Smith, A. (1994). Bayesian theory. Chichester: John Wiley & Sons Ltd. · Zbl 0796.62002
[9] Bickel, P. and Doksum, K. (2007). Mathematical Statistics: Basic Ideas and Selected Topics, volume 1. Upper Saddle River, NJ: Pearson Prentice Hall, 2nd edition. · Zbl 0403.62001
[10] Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer New York. · Zbl 1107.68072
[11] Blei, D. (2012). “Probabilistic Topic Models.” Communications of the ACM, 55(4): 77–84.
[12] Blei, D. and Lafferty, J. (2007). “A Correlated Topic Model of Science.” Annals of Applied Statistics, 1(1): 17–35. · Zbl 1129.62122
[13] Blei, D., Ng, A., and Jordan, M. (2003). “Latent Dirichlet Allocation.” Journal of Machine Learning Research, 3: 993–1022. · Zbl 1112.68379
[14] Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association, 112(518): 859–877. URL http://dx.doi.org/10.1080/01621459.2017.1285773
[15] Box, G. (1976). “Science and Statistics.” Journal of the American Statistical Association, 71(356): 791–799. · Zbl 0335.62002
[16] Box, G. (1980). “Sampling and Bayes’ Inference in Scientific Modeling and Robustness.” Journal of the Royal Statistical Society, Series A, 143(4): 383–430. · Zbl 0471.62036
[17] Cameron, A. C. and Trivedi, P. K. (2013). Regression analysis of count data, volume 53. Cambridge University Press. · Zbl 1301.62003
[18] Carlin, B. and Louis, T. (2000a). Bayes and Empirical Bayes Methods for Data Analysis, 2nd Edition. Chapman & Hall/CRC. · Zbl 1017.62005
[19] Carlin, B. and Louis, T. (2000b). “Empirical Bayes: Past, present and future.” Journal of the American Statistical Association, 95(452): 1286–1289. · Zbl 1072.62511
[20] Copas, J. B. (1969). “Compound Decisions and Empirical Bayes.” Journal of the Royal Statistical Society. Series B (Methodological), 31(3): 397–425. URL http://www.jstor.org/stable/2984345 · Zbl 0186.51905
[21] Corduneanu, A. and Bishop, C. (2001). “Variational Bayesian Model Selection for Mixture Distributions.” In International Conference on Artificial Intelligence and Statistics.
[22] Dempster, A., Laird, N., and Rubin, D. (1977). “Maximum likelihood from incomplete data via the EM algorithm.” Journal of the Royal Statistical Society, Series B, 39: 1–38. · Zbl 0364.62022
[23] Diaconis, P. and Ylvisaker, D. (1979). “Conjugate Priors for Exponential Families.” The Annals of Statistics, 7(2): 269–281. · Zbl 0405.62011
[24] Doyle, G. and Elkan, C. (2009). “Accounting for burstiness in topic models.” In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, 281–288. New York, NY, USA: ACM.
[25] Efron, B. (1996). “Empirical Bayes methods for Combining Likelihoods.” Journal of the American Statistical Association, 91(434): 538–550. · Zbl 0868.62018
[26] Efron, B. (2010). Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, volume 1. Cambridge University Press. · Zbl 1277.62016
[27] Efron, B. and Morris, C. (1973). “Combining Possibly Related Estimation Problems.” Journal of the Royal Statistical Society, Series B, 35(3): 379–421. · Zbl 0281.62030
[28] Efron, B. and Morris, C. (1975). “Data analysis using Stein’s estimator and its generalizations.” Journal of the American Statistical Association, 70(350): 311–319. · Zbl 0319.62018
[29] Erosheva, E., Fienberg, S., and Joutard, C. (2007). “Describing Disability Through Individual-Level Mixture Models for Multivariate Binary Data.” Annals of Applied Statistics. · Zbl 1126.62101
[30] Fei-Fei, L. and Perona, P. (2005). “A Bayesian Hierarchical Model for Learning Natural Scene Categories.” IEEE Computer Vision and Pattern Recognition, 524–531.
[31] Feng, J., Xu, H., Mannor, S., and Yan, S. (2014). “Robust Logistic Regression and Classification.” In Advances in Neural Information Processing Systems, 253–261.
[32] Fernández, C. and Steel, M. F. (1999). “Multivariate Student-t regression models: Pitfalls and inference.” Biometrika, 86(1): 153–167. · Zbl 0917.62020
[33] Fine, S., Singer, Y., and Tishby, N. (1998). “The Hierarchical Hidden Markov Model: Analysis and Applications.” Machine Learning, 32: 41–62. · Zbl 0901.68178
[34] Fox, E., Sudderth, E., Jordan, M., and Willsky, A. (2011). “A Sticky HDP-HMM with Application to Speaker Diarization.” Annals of Applied Statistics, 5(2A): 1020–1056. · Zbl 1232.62077
[35] Geisser, S. and Eddy, W. F. (1979). “A predictive approach to model selection.” Journal of the American Statistical Association, 74(365): 153–160. · Zbl 0401.62036
[36] Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2014). Bayesian data analysis, volume 2. Chapman & Hall/CRC Boca Raton, FL, USA. · Zbl 1279.62004
[37] Gelman, A., Meng, X., and Stern, H. (1996). “Posterior Predictive Assessment of Model Fitness Via Realized Discrepancies.” Statistica Sinica, 6: 733–807. · Zbl 0859.62028
[38] Ghahramani, Z. and Beal, M. J. (2000). “Variational Inference for Bayesian Mixtures of Factor Analysers.” In NIPS.
[39] Hoffman, M., Blei, D., Wang, C., and Paisley, J. (2013). “Stochastic Variational Inference.” Journal of Machine Learning Research, 14: 1303–1347. · Zbl 1317.68163
[40] Hoffman, M. D. and Gelman, A. (2014). “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research, 15(Apr): 1593–1623. · Zbl 1319.60150
[41] Huber, P. and Ronchetti, E. (2009). Robust Statistics. Wiley, 2nd edition. · Zbl 1276.62022
[42] Huber, P. J. (1964). “Robust Estimation of a Location Parameter.” The Annals of Mathematical Statistics, 35(1): 73–101. · Zbl 0136.39805
[43] Jordan, M., Ghahramani, Z., Jaakkola, T., and Saul, L. (1999). “Introduction to Variational Methods for Graphical Models.” Machine Learning, 37: 183–233. · Zbl 0945.68164
[44] Jorgensen, B. (1987). “Exponential dispersion models.” Journal of the Royal Statistical Society. Series B (Methodological), 127–162. · Zbl 0662.62078
[45] Kalman, R. (1960). “A New Approach to Linear Filtering and Prediction Problems.” Transactions of the ASME: Journal of Basic Engineering, 82: 35–45.
[46] Kass, R. and Steffey, D. (1989). “Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models).” Journal of the American Statistical Association, 84(407): 717–726.
[47] Lange, K., Little, R., and Taylor, J. (1989). “Robust Statistical Modeling Using the t Distribution.” Journal of the American Statistical Association, 84(408): 881.
[48] Madsen, R. E., Kauchak, D., and Elkan, C. (2005). “Modeling word burstiness using the Dirichlet distribution.” In Proceedings of the 22nd international conference on Machine learning, 545–552. ACM.
[49] Maritz, J. and Lwin, T. (1989). Empirical Bayes methods. Monographs on Statistics and Applied Probability. London: Chapman & Hall. · Zbl 0731.62040
[50] McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. London: Chapman and Hall. · Zbl 0744.62098
[51] McCulloch, C. E. and Neuhaus, J. M. (2001). Generalized linear mixed models. Wiley Online Library.
[52] McLachlan, G. and Peel, D. (2000). Finite mixture models. Wiley-Interscience. · Zbl 0963.62061
[53] Morris, C. (1983). “Parametric empirical Bayes inference: Theory and applications.” Journal of the American Statistical Association, 78(381): 47–65. · Zbl 0506.62005
[54] Murphy, K. (2013). Machine Learning: A Probabilistic Perspective. MIT Press.
[55] Paisley, J. and Carin, L. (2009). “Nonparametric Factor Analysis with Beta Process Priors.” In International Conference on Machine Learning. · Zbl 1391.94357
[56] Peel, D. and McLachlan, G. J. (2000). “Robust mixture modelling using the t distribution.” Statistics and Computing, 10(4): 339–348.
[57] Polson, N. G. and Scott, J. G. (2010). “Shrink globally, act locally: Sparse Bayesian regularization and prediction.” Bayesian Statistics, 9: 501–538.
[58] Pregibon, D. (1982). “Resistant fits for some commonly used logistic models with medical applications.” Biometrics, 485–498.
[59] Pritchard, J. K., Stephens, M., and Donnelly, P. (2000). “Inference of population structure using multilocus genotype data.” Genetics, 155(2): 945–959.
[60] Rabe-Hesketh, S. and Skrondal, A. (2008). “Generalized linear mixed-effects models.” Longitudinal Data Analysis, 79–106. · Zbl 1274.62031
[61] Rabiner, L. R. (1989). “A tutorial on hidden Markov models and selected applications in speech recognition.” Proceedings of the IEEE, 77: 257–286.
[62] Ranganath, R., Gerrish, S., and Blei, D. (2014). “Black box variational inference.” In Artificial Intelligence and Statistics.
[63] Robbins, H. (1964). “The empirical Bayes approach to statistical decision problems.” The Annals of Mathematical Statistics, 1–20. · Zbl 0138.12304
[64] Robbins, H. (1980). “An empirical Bayes estimation problem.” Proceedings of the National Academy of Sciences, 77(12): 6988. · Zbl 0456.62029
[65] Rubin, D. (1984). “Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician.” The Annals of Statistics, 12(4): 1151–1172. · Zbl 0555.62010
[66] Salakhutdinov, R. and Mnih, A. (2008). “Probabilistic matrix factorization.” In Neural Information Processing Systems.
[67] She, Y. and Owen, A. (2011). “Outlier detection using nonconvex penalized regression.” Journal of the American Statistical Association, 106(494). · Zbl 1232.62068
[68] Stefanski, L. A., Carroll, R. J., and Ruppert, D. (1986). “Optimally bounded score functions for generalized linear models with applications to logistic regression.” Biometrika, 73(2): 413–424. · Zbl 0616.62043
[69] Svensén, M. and Bishop, C. M. (2005). “Robust Bayesian mixture modelling.” Neurocomputing, 64: 235–252.
[70] Teh, Y., Jordan, M., Beal, M., and Blei, D. (2006). “Hierarchical Dirichlet processes.” Journal of the American Statistical Association, 101(476): 1566–1581. · Zbl 1171.62349
[71] Teh, Y. W. (2006). “A Hierarchical Bayesian Language Model based on Pitman-Yor Processes.” In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, 985–992. URL http://www.aclweb.org/anthology/P/P06/P06-1124
[72] Tibshirani, J. and Manning, C. D. (2013). “Robust Logistic Regression using Shift Parameters.” CoRR, abs/1305.4987.
[73] Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. New York: Springer, fourth edition. ISBN 0-387-95457-0. URL http://www.stats.ox.ac.uk/pub/MASS4 · Zbl 1006.62003
[74] Wainwright, M. and Jordan, M. (2008). “Graphical models, exponential families, and variational inference.” Foundations and Trends in Machine Learning, 1(1–2): 1–305. · Zbl 1193.62107
[75] Wang, C. and Blei, D. M. (2013). “Variational inference in nonconjugate models.” The Journal of Machine Learning Research, 14(1): 1005–1031. · Zbl 1320.62057
[76] Wang, C., Paisley, J., and Blei, D. (2011). “Online Variational Inference for the Hierarchical Dirichlet Process.” In International Conference on Artificial Intelligence and Statistics.
[77] Welling, M. and Teh, Y. W. (2011). “Bayesian learning via stochastic gradient Langevin dynamics.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 681–688.
[78] Wood, F., van de Meent, J. W., and Mansinghka, V. (2014). “A New Approach to Probabilistic Programming Inference.” In Artificial Intelligence and Statistics, 1024–1032.
[79] Xing, E. P., Ho, Q., Dai, W., Kim, J. K., Wei, J., Lee, S., Zheng, X., Xie, P., Kumar, A., and Yu, Y. (2013). “Petuum: A New Platform for Distributed Machine Learning on Big Data.”
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.