Simulating conditionally specified models. (English) Zbl 1490.62018

Summary: Expert systems routinely use conditional reasoning. Conditionally specified statistical models offer several advantages over joint models; one is that Gibbs sampling can be used to generate realizations of the model. As a result, full conditional specification for multiple imputation is gaining popularity because it is flexible and computationally straightforward. However, requiring every regression or classification to involve all of the variables would be restrictive. Feature selection often removes some variables from the set of predictors, thus making the regression local. A Gibbs sampler built from a mixture of full and local conditionals is referred to as a partially collapsed Gibbs sampler, which often achieves faster convergence due to reduced conditioning. However, its implementation requires choosing a correct scan order: an invalid scan order yields an incorrect transition kernel and hence the wrong stationary distribution. We prove a necessary and sufficient condition for Gibbs sampling to correctly sample the joint distribution. We propose an algorithm that identifies all of the valid scan orders for a given conditional model. A forward search algorithm and the checking of compatibility among conditionals of different localities are also discussed.
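
The dependence on scan order described in the summary can be illustrated with a toy case. The sketch below is not the paper's algorithm. It assumes a hypothetical joint distribution `JOINT` over two binary variables, one full conditional p(x | y), and one local conditional p(y) whose predictor set is empty. It pushes a starting distribution through each scan order exactly, with no random sampling, and compares the limit with the joint.

```python
from itertools import product

# Hypothetical joint distribution over two binary variables (x, y),
# chosen only for illustration; it does not come from the paper.
JOINT = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
STATES = list(product((0, 1), repeat=2))


def marginal_y(y):
    """Local conditional for y: its predictor set is empty, so it is p(y)."""
    return sum(p for (_, yy), p in JOINT.items() if yy == y)


def cond_x_given_y(x, y):
    """Full conditional p(x | y) derived from the joint."""
    return JOINT[(x, y)] / marginal_y(y)


def update_x(dist):
    """Push a distribution over (x, y) through the step x ~ p(x | y)."""
    out = dict.fromkeys(STATES, 0.0)
    for (_, y), mass in dist.items():
        for x in (0, 1):
            out[(x, y)] += mass * cond_x_given_y(x, y)
    return out


def update_y(dist):
    """Push a distribution over (x, y) through the step y ~ p(y)."""
    out = dict.fromkeys(STATES, 0.0)
    for (x, _), mass in dist.items():
        for y in (0, 1):
            out[(x, y)] += mass * marginal_y(y)
    return out


def stationary(scan_order, sweeps=50):
    """Apply the sweep repeatedly, starting from a point mass at (0, 0)."""
    dist = {s: (1.0 if s == (0, 0) else 0.0) for s in STATES}
    for _ in range(sweeps):
        for step in scan_order:
            dist = step(dist)
    return dist


def max_abs_error(dist):
    """Largest absolute deviation of a distribution from the target joint."""
    return max(abs(dist[s] - JOINT[s]) for s in STATES)


valid = stationary([update_y, update_x])    # draw y first, then x given y
invalid = stationary([update_x, update_y])  # draw x given the old y, then y

print("valid order, max error:  ", max_abs_error(valid))
print("invalid order, max error:", max_abs_error(invalid))
```

In this toy case the order (y, then x) reproduces the joint up to rounding, while the order (x, then y) converges to the product of the marginals, p(x)p(y), so its largest error is about 0.15. Both orders use the same two conditionals; only the scan order differs.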

MSC:

62-08 Computational methods for problems pertaining to statistics
60J22 Computational methods in Markov chains
62H30 Classification and discrimination; cluster analysis (statistical aspects)
65C05 Monte Carlo methods
68T05 Learning and adaptive systems in artificial intelligence

Software:

MICE

References:

[1] Besag, J. E., Spatial interaction and the statistical analysis of lattice systems (with discussion), J. R. Stat. Soc. Ser. B Stat. Methodol., 36, 192-236, (1974) · Zbl 0327.60067
[2] Brand, J. P.L., Development, implementation and evaluation of multiple imputation strategies for the statistical analysis of incomplete data sets, dissertation, (1999), Erasmus University Rotterdam, The Netherlands
[3] Cowles, M. K.; Carlin, B. P., Markov chain Monte Carlo convergence diagnostics: A comparative review, J. Amer. Statist. Assoc., 91, 883-904, (1996) · Zbl 0869.62066
[4] Gelman, A., Parameterization and Bayesian modeling, J. Amer. Statist. Assoc., 99, 537-545, (2004) · Zbl 1117.62343
[5] Gelman, A.; Speed, T. P., Characterizing a joint probability distribution by conditionals, J. R. Stat. Soc. Ser. B Stat. Methodol., 55, 185-188, (1993) · Zbl 0780.62013
[6] Gelman, A.; Speed, T. P., Corrigendum: characterizing a joint probability distribution by conditionals, J. R. Stat. Soc. Ser. B Stat. Methodol., 61, 483, (1999)
[7] Heckerman, D.; Chickering, D. M.; Meek, C.; Rounthwaite, R.; Kadie, C., Dependency networks for inference, collaborative filtering, and data visualization, J. Mach. Learn. Res., 1, 49-75, (2000) · Zbl 1008.68132
[8] Kennickell, A. B., Imputation of the 1989 survey of consumer finances: Stochastic relaxation and multiple imputation, in: ASA 1991 Proceedings of the Section on Survey Research Methods, 1-10, (1991)
[9] McCullagh, P.; Nelder, J. A., Generalized Linear Models, second ed., Chapman & Hall, London · Zbl 0588.62104
[10] Park, T.; Min, S., Partially collapsed Gibbs sampling for linear mixed-effects models, Comm. Statist. Simulation Comput., 45, 165-180, (2016) · Zbl 1384.62276
[11] Park, T.; van Dyk, D. A., Partially collapsed Gibbs samplers: illustrations and applications, J. Comput. Graph. Statist., 18, 283-305, (2009)
[12] Raghunathan, T. E.; Lepkowski, J. M.; van Hoewyk, J.; Solenberger, P., Multivariate technique for multiply imputing missing values using a sequence of regression models, Wiley Ser. Surv. Methodol., 27, 85-95, (2001)
[13] Spiegelhalter, D.; Dawid, A. P.; Lauritzen, S.; Cowell, R., Bayesian analysis in expert systems, Statist. Sci., 8, 219-282, (1993)
[14] van Buuren, S.; Boshuizen, H. C.; Knook, D. L., Multiple imputation of missing blood pressure covariates in survival analysis, Stat. Med., 18, 681-694, (1999)
[15] van Buuren, S.; Oudshoorn, C. G. M., Multivariate Imputation by Chained Equations: MICE v1.0 User’s Manual, TNO Report PG/VGZ/00.038, TNO Preventie en Gezondheid, Leiden, The Netherlands, (2000)
[16] van Dyk, D. A.; Jiao, X., Metropolis-Hastings within partially collapsed Gibbs samplers, J. Comput. Graph. Statist., 24, 301-327, (2015)
[17] van Dyk, D. A.; Park, T., Partially collapsed Gibbs samplers: theory and methods, J. Amer. Statist. Assoc., 103, 790-796, (2008) · Zbl 1471.62198
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.