Partial identification of latent correlations with binary data. (English) Zbl 1477.62335

Summary: The tetrachoric correlation is a popular measure of association for binary data and estimates the correlation of an underlying normal latent vector. However, when the underlying vector is not normal, the tetrachoric correlation will be different from the underlying correlation. Since assuming underlying normality is often done on pragmatic rather than substantive grounds, the estimated tetrachoric correlation may therefore be quite different from the true underlying correlation that is modeled in structural equation modeling. This motivates studying the range of latent correlations that are compatible with given binary data, when the distribution of the latent vector is partly or completely unknown. We show that nothing can be said about the latent correlations unless we know more than what can be derived from the data. We identify an interval constituting all latent correlations compatible with observed data when the marginals of the latent variables are known. Also, we quantify how partial knowledge of the dependence structure of the latent variables affects the range of compatible latent correlations. Implications for tests of underlying normality are briefly discussed.
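For orientation, the tetrachoric correlation mentioned in the summary can be sketched as follows. This is a minimal illustration of the textbook definition under the normality assumption, not code from the paper: the function names `bvn_cdf` and `tetrachoric`, the Simpson-rule integration, and the bisection search are all assumptions made for this sketch. Each binary variable is taken to be 1 when a standard normal latent variable exceeds a threshold, and the correlation is chosen so that the bivariate normal probability of the (0, 0) cell matches the observed proportion.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def bvn_cdf(t1, t2, rho, n=2000):
    """Approximate P(Z1 <= t1, Z2 <= t2) for a standard bivariate normal.

    Uses the identity
        P = integral_{-inf}^{t1} phi(z) * Phi((t2 - rho*z) / sqrt(1 - rho^2)) dz
    with Simpson's rule on [-8, t1] (n must be even). Illustrative only.
    """
    lower = -8.0  # the normal density is negligible below this point
    if t1 <= lower:
        return 0.0
    h = (t1 - lower) / n
    s = sqrt(1.0 - rho * rho)

    def integrand(z):
        return STD_NORMAL.pdf(z) * STD_NORMAL.cdf((t2 - rho * z) / s)

    total = integrand(lower) + integrand(t1)
    for i in range(1, n):
        weight = 4 if i % 2 else 2
        total += weight * integrand(lower + i * h)
    return total * h / 3.0


def tetrachoric(table, tol=1e-8):
    """Tetrachoric correlation of a 2x2 table [[n00, n01], [n10, n11]].

    Assumes every marginal proportion lies strictly between 0 and 1.
    """
    (n00, n01), (n10, n11) = table
    n = n00 + n01 + n10 + n11
    p00 = n00 / n
    # Thresholds implied by the marginals: P(X_i = 0) = Phi(t_i).
    t1 = STD_NORMAL.inv_cdf((n00 + n01) / n)
    t2 = STD_NORMAL.inv_cdf((n00 + n10) / n)
    # The bivariate normal CDF is increasing in rho, so bisection applies.
    lo, hi = -0.999, 0.999
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if bvn_cdf(t1, t2, mid) < p00:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Symmetric marginals with a (0, 0) cell proportion of 1/3.
    print(round(tetrachoric([[2, 1], [1, 2]]), 4))
```

Under normality the table pins down a single value. The summary's point is that this value need not equal the latent correlation once normality is dropped.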


62P15 Applications of statistics to psychology
62H20 Measures of association (correlation, canonical correlation, etc.)
62H25 Factor analysis and principal components; correspondence analysis

