Incorporating marginal prior information in latent class models. (English) Zbl 1357.62130

Summary: We present an approach to incorporating informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data. We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.
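The augmentation step described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a single margin of interest, integer-coded category levels, and `None` as the marker for a missing value. The names `augmented_records` and `concatenate` are hypothetical, and the MCMC sampler that would handle the missing entries is not shown.

```python
from typing import List, Optional, Sequence

Record = List[Optional[int]]


def augmented_records(prior: Sequence[float], n_aug: int,
                      var_index: int, n_vars: int) -> List[Record]:
    """Build n_aug synthetic records for one margin (illustrative helper).

    The variable at var_index takes levels in proportions as close to
    `prior` as integer counts allow; every other variable is left missing.
    """
    if abs(sum(prior) - 1.0) > 1e-9:
        raise ValueError("prior probabilities must sum to 1")

    # Largest-remainder rounding so the counts sum exactly to n_aug.
    exact = [p * n_aug for p in prior]
    counts = [int(x) for x in exact]
    leftover = n_aug - sum(counts)
    by_remainder = sorted(range(len(prior)),
                          key=lambda k: exact[k] - counts[k],
                          reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1

    records: List[Record] = []
    for level, count in enumerate(counts):
        for _ in range(count):
            rec: Record = [None] * n_vars   # (ii) remaining variables missing
            rec[var_index] = level          # (i) margin follows the prior
            records.append(rec)
    return records


def concatenate(data: List[Record], synthetic: List[Record]) -> List[Record]:
    """Append the synthetic records to the original data."""
    return list(data) + list(synthetic)


# Example: prior belief (0.5, 0.25, 0.25) on the second of three variables.
observed = [[0, 2, 1], [1, 0, 0]]
synthetic = augmented_records([0.5, 0.25, 0.25], n_aug=8,
                              var_index=1, n_vars=3)
combined = concatenate(observed, synthetic)
```

In this sketch `n_aug` plays the role the summary assigns to the number of augmented records: a larger value makes the synthetic margin weigh more heavily against the observed data, which corresponds to a less uncertain prior.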


62F15 Bayesian inference
60G57 Random measures
62G05 Nonparametric estimation
62H17 Contingency tables

