A semiparametric approach to mixed outcome latent variable models: estimating the association between cognition and regional brain volumes. (English) Zbl 1283.62218

Summary: Multivariate data that combine binary, categorical, count and continuous outcomes are common in the social and health sciences. We propose a semiparametric Bayesian latent variable model for multivariate data of arbitrary type that does not require specification of conditional distributions. Drawing on the extended rank likelihood method of P. D. Hoff [Ann. Appl. Stat. 1, No. 1, 265–283 (2007; Zbl 1129.62050)], we develop a semiparametric approach for latent variable modeling with mixed outcomes and propose associated Markov chain Monte Carlo estimation methods. Motivated by cognitive testing data, we focus on bifactor models, a special case of factor analysis. We employ our semiparametric Bayesian latent variable model to investigate the association between cognitive outcomes and MRI-measured regional brain volumes.


62P10 Applications of statistics to biology and medical sciences; meta analysis
62H12 Estimation in multivariate analysis
92C20 Neural biology
62G05 Nonparametric estimation
62F15 Bayesian inference
62H25 Factor analysis and principal components; correspondence analysis
65C40 Numerical analysis or methods applied to Markov chains


Zbl 1129.62050




[1] Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley, Hoboken, NJ. · Zbl 1039.62044
[2] Bartholomew, D., Knott, M. and Moustaki, I. (2011). Latent Variable Models and Factor Analysis: A Unified Approach, 3rd ed. Wiley, Chichester. · Zbl 1266.62040
[3] Bollen, K. A. (1989). Structural Equations with Latent Variables. Wiley, New York. · Zbl 0731.62159
[4] Cardenas, V. A., Ezekiel, F., Di Sclafani, V., Gomberg, B. and Fein, G. (2001). Reliability of tissue volumes and their spatial distribution for segmented magnetic resonance images. Psychiatry Research: Neuroimaging 106 193-205.
[5] Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research 1 245-276.
[6] Chui, H. C. (2007). Subcortical ischemic vascular dementia. Neurol. Clin. 25 717-740, vi.
[7] Chui, H. C., Zarow, C., Mack, W. J., Ellis, W. G., Zheng, L., Jagust, W. J., Mungas, D., Reed, B. R., Kramer, J. H., DeCarli, C. C. et al. (2006). Cognitive impact of subcortical vascular and Alzheimer’s disease pathology. Annals of Neurology 60 677.
[8] Congdon, P. (2003). Applied Bayesian Modelling. Wiley, Chichester. · Zbl 1023.62026
[9] Congdon, P. (2006). Bayesian Statistical Modelling, 2nd ed. Wiley, Chichester. · Zbl 1193.62034
[10] Dobra, A. and Lenkoski, A. (2011). Copula Gaussian graphical models and their application to modeling functional disability data. Ann. Appl. Stat. 5 969-993. · Zbl 1232.62046 · doi:10.1214/10-AOAS397
[11] Dunn, J. E. (1973). A note on a sufficiency condition for uniqueness of restricted factor matrix. Psychometrika 38 141-143. · Zbl 0249.62061 · doi:10.1007/BF02291181
[12] Dunson, D. B. (2003). Dynamic latent trait models for multidimensional longitudinal data. J. Amer. Statist. Assoc. 98 555-563. · Zbl 1040.62100 · doi:10.1198/016214503000000387
[13] Dunson, D. B. et al. (2006). Efficient Bayesian model averaging in factor analysis. Technical report, Duke Univ., Durham, NC.
[14] Erosheva, E. and Curtis, S. M. (2011). Specification of rotational constraints in Bayesian confirmatory factor analysis. Technical Report No. 589, Univ. Washington, Seattle, WA.
[15] Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In Bayesian Statistics, 4 (Peñíscola, 1991) (J. M. Bernardo, J. Berger, A. P. Dawid and J. F. M. Smith, eds.) 169-193. Oxford Univ. Press, New York.
[16] Geweke, J. and Zhou, G. (1996). Measuring the pricing error of the arbitrage pricing theory. Review of Financial Studies 9 557-587.
[17] Ghosh, J. and Dunson, D. B. (2008). Bayesian model selection in factor analytic models. In Random effect and latent variable model selection 151-163. Springer, New York.
[18] Ghosh, J. and Dunson, D. B. (2009). Default prior distributions and efficient posterior computation in Bayesian factor analysis. J. Comput. Graph. Statist. 18 306-320. · doi:10.1198/jcgs.2009.07145
[19] Gruhl, J., Erosheva, E. and Crane, P. (2010). Analyzing cognitive testing data with extensions of item response theory models. Presented at the Joint Statistical Meetings, Vancouver, Canada, August 3, 2010.
[20] Gruhl, J., Erosheva, E. and Crane, P. (2011). A semiparametric Bayesian latent trait model for multivariate mixed type data. In International Meeting of the Psychometric Society.
[21] Guttman, L. (1954). Some necessary conditions for common-factor analysis. Psychometrika 19 149-161. · Zbl 0058.13004 · doi:10.1007/BF02289162
[22] Hachinski, V., Iadecola, C., Petersen, R. C., Breteler, M. M., Nyenhuis, D. L., Black, S. E., Powers, W. J., DeCarli, C., Merino, J. G., Kalaria, R. N. et al. (2006). National Institute of Neurological Disorders and Stroke-Canadian Stroke Network vascular cognitive impairment harmonization standards. Stroke 37 2220-2241.
[23] Hoff, P. D. (2007). Extending the rank likelihood for semiparametric copula estimation. Ann. Appl. Stat. 1 265-283. · Zbl 1129.62050 · doi:10.1214/07-AOAS107
[24] Hoff, P. D. (2009). A First Course in Bayesian Statistical Methods. Springer, New York. · Zbl 1213.62044
[25] Holzinger, K. J. and Swineford, F. (1937). The bi-factor method. Psychometrika 2 41-54.
[26] Jennrich, R. I. (1978). Rotational equivalence of factor loading matrices with specified values. Psychometrika 43 421-426. · Zbl 0383.62031 · doi:10.1007/BF02293650
[27] Jennrich, R. I. and Bentler, P. M. (2011). Exploratory bi-factor analysis. Psychometrika 76 537-549. · Zbl 1284.62715 · doi:10.1007/s11336-011-9218-4
[28] Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika 34 183-202.
[29] Klüppelberg, C. and Kuhn, G. (2009). Copula structure analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 737-753. · Zbl 1250.62031 · doi:10.1111/j.1467-9868.2009.00707.x
[30] Knowles, D. and Ghahramani, Z. (2011). Nonparametric Bayesian sparse factor models with application to gene expression modeling. Ann. Appl. Stat. 5 1534-1552. · Zbl 1223.62013 · doi:10.1214/10-AOAS435
[31] Kuczynski, B., Targan, E., Madison, C., Weiner, M., Zhang, Y., Reed, B., Chui, H. C. and Jagust, W. (2010). White matter integrity and cortical metabolic associations in aging and dementia. Alzheimer’s and Dementia 6 54-62.
[32] Liu, C., Rubin, D. B. and Wu, Y. N. (1998). Parameter expansion to accelerate EM: The PX-EM algorithm. Biometrika 85 755-770. · Zbl 0921.62071 · doi:10.1093/biomet/85.4.755
[33] Liu, J. S. and Wu, Y. N. (1999). Parameter expansion for data augmentation. J. Amer. Statist. Assoc. 94 1264-1274. · Zbl 1069.62514 · doi:10.2307/2669940
[34] Loken, E. (2005). Identification constraints and inference in factor models. Struct. Equ. Model. 12 232-244. · doi:10.1207/s15328007sem1202_3
[35] Lopes, H. F. and West, M. (2004). Bayesian model assessment in factor analysis. Statist. Sinica 14 41-67. · Zbl 1035.62060
[36] Millsap, R. E. (2001). When trivial constraints are not trivial: The choice of uniqueness constraints in confirmatory factor analysis. Struct. Equ. Model. 8 1-17.
[37] Morris, J. C. (1993). The Clinical Dementia Rating (CDR): Current version and scoring rules. Neurology 43 2412-2414.
[38] Morris, J. C. (1997). Clinical dementia rating: A reliable and valid diagnostic and staging measure for dementia of the Alzheimer type. Int. Psychogeriatr. 9 Suppl 1 173-176; discussion 177-178.
[39] Moustaki, I. and Knott, M. (2000). Generalized latent trait models. Psychometrika 65 391-411. · Zbl 1291.62236 · doi:10.1007/BF02296153
[40] Mungas, D., Harvey, D., Reed, B. R., Jagust, W. J., DeCarli, C., Beckett, L., Mack, W. J., Kramer, J. H., Weiner, M. W., Schuff, N. et al. (2005). Longitudinal volumetric MRI change and rate of cognitive decline. Neurology 65 565-571.
[41] Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Appl. Psychol. Meas. 16 159.
[42] Murray, J. S., Dunson, D. B., Carin, L. and Lucas, J. E. (2013). Bayesian Gaussian copula factor models for mixed data. J. Amer. Statist. Assoc. 108 656-665. · Zbl 06195968
[43] Pettitt, A. N. (1982). Inference for the linear model using a likelihood based on ranks. J. R. Stat. Soc. Ser. B Stat. Methodol. 44 234-243. · Zbl 0493.62044
[44] Raftery, A. E. and Lewis, S. M. (1995). The number of iterations, convergence diagnostics and generic Metropolis algorithms. In Practical Markov Chain Monte Carlo (W. R. Gilks, D. J. Spiegelhalter and S. Richardson, eds.). Chapman & Hall, London, UK.
[45] Rai, P. and Daumé III, H. (2009). The infinite hierarchical factor regression model. Available at arXiv:0908.0570.
[46] Reise, S. P., Morizot, J. and Hays, R. D. (2007). The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Qual. Life Res. 16 Suppl 1 19-31.
[47] Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph Supplement 34 1-100.
[48] Sammel, M. D., Ryan, L. M. and Legler, J. M. (1997). Latent variable models for mixed discrete and continuous outcomes. J. R. Stat. Soc. Ser. B Stat. Methodol. 59 667-678. · Zbl 0889.62043
[49] Shi, J. Q. and Lee, S. Y. (1998). Bayesian sampling-based approach for factor analysis models with continuous and polytomous data. British J. Math. Statist. Psych. 51 233-252.
[50] Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Chapman & Hall/CRC, Boca Raton, FL. · Zbl 1097.62001
[51] Stephens, M. (2000). Dealing with label switching in mixture models. J. R. Stat. Soc. Ser. B Stat. Methodol. 62 795-809. · Zbl 0957.62020 · doi:10.1111/1467-9868.00265
[52] van der Linden, W. J. and Hambleton, R. K., eds. (1997). Handbook of Modern Item Response Theory. Springer, New York. · Zbl 0872.62099
[53] West, M. (1987). On scale mixtures of normal distributions. Biometrika 74 646-648. · Zbl 0648.62015 · doi:10.1093/biomet/74.3.646