
Partial distance correlation with methods for dissimilarities. (English) Zbl 1309.62105

Summary: Distance covariance and distance correlation are scalar coefficients that characterize independence of random vectors in arbitrary dimension. Properties, extensions and applications of distance correlation have been discussed in the recent literature, but the problem of defining the partial distance correlation has remained an open question of considerable interest. The problem of partial distance correlation is more complex than partial correlation partly because the squared distance covariance is not an inner product in the usual linear space. For the definition of partial distance correlation, we introduce a new Hilbert space where the squared distance covariance is the inner product. We define the partial distance correlation statistics with the help of this Hilbert space, and develop and implement a test for zero partial distance correlation. Our intermediate results provide an unbiased estimator of squared distance covariance, and a neat solution to the problem of distance correlation for dissimilarities rather than distances.
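The construction described in the summary can be illustrated with a minimal sketch. It assumes the U-centering formulation usually given for this approach: each distance matrix is U-centered, the inner product of two U-centered matrices serves as the unbiased estimator of squared distance covariance, and the partial distance correlation is obtained from the bias-corrected distance correlations by the same formula as ordinary partial correlation. The function names below are hypothetical and are not the API of the pdcor or energy packages; the code uses only the Python standard library and requires at least four observations.

```python
import math


def distance_matrix(points):
    """Pairwise Euclidean distances for a list of equal-length tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def u_center(a):
    """U-center a symmetric dissimilarity matrix with zero diagonal (n >= 4)."""
    n = len(a)
    if n < 4:
        raise ValueError("U-centered inner product needs n >= 4")
    row = [sum(r) for r in a]  # row sums equal column sums by symmetry
    total = sum(row)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal stays zero
                out[i][j] = (a[i][j]
                             - row[i] / (n - 2)
                             - row[j] / (n - 2)
                             + total / ((n - 1) * (n - 2)))
    return out


def inner(a, b):
    """Inner product of two U-centered matrices: sum over i != j, / (n(n-3))."""
    n = len(a)
    s = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 3))


def bias_corrected_dcor(a, b):
    """Bias-corrected distance correlation of two U-centered matrices."""
    denom = math.sqrt(inner(a, a) * inner(b, b))
    return inner(a, b) / denom if denom > 0 else 0.0


def partial_dcor(x, y, z):
    """Sample partial distance correlation of x and y with z removed."""
    a, b, c = (u_center(distance_matrix(v)) for v in (x, y, z))
    rxy = bias_corrected_dcor(a, b)
    rxz = bias_corrected_dcor(a, c)
    ryz = bias_corrected_dcor(b, c)
    fx, fy = 1.0 - rxz ** 2, 1.0 - ryz ** 2
    if fx <= 0 or fy <= 0:  # degenerate case: x or y fully explained by z
        return 0.0
    return (rxy - rxz * ryz) / math.sqrt(fx * fy)


if __name__ == "__main__":
    # Made-up illustrative data: six observations of x (1-D), y (1-D), z (2-D).
    x = [(0.0,), (1.0,), (3.0,), (7.0,), (12.0,), (13.5,)]
    y = [(2.0,), (0.5,), (4.0,), (9.0,), (8.0,), (15.0,)]
    z = [(2.0, 1.0), (0.5, 4.0), (3.0, 3.0), (8.0, 0.0), (1.0, 9.0), (6.0, 2.5)]
    print(partial_dcor(x, y, z))
```

Because `u_center` accepts any symmetric matrix with a zero diagonal, the same functions can be applied to dissimilarities that are not Euclidean distances by passing such a matrix directly instead of calling `distance_matrix`.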

MSC:

62H20 Measures of association (correlation, canonical correlation, etc.)
62H15 Hypothesis testing in multivariate analysis
62G05 Nonparametric estimation

Software:

ecodist; R; ppcor; vegan; pdcor; energy
