
On quantifying dependence: a framework for developing interpretable measures. (English) Zbl 1332.62189

Summary: We present a framework for selecting and developing measures of dependence when the goal is the quantification of a relationship between two variables, not simply the establishment of its existence. Much of the literature on dependence measures is focused, at least implicitly, on detection or revolves around the inclusion/exclusion of particular axioms and discussing which measures satisfy said axioms. In contrast, we start with only a few nonrestrictive guidelines focused on existence, range and interpretability, which provide a very open and flexible framework. For quantification, the most crucial is the notion of interpretability, whose foundation can be found in the work of L. A. Goodman and W. H. Kruskal [Measures of association for cross classifications. New York, Heidelberg, Berlin: Springer-Verlag (1979; Zbl 0426.62034)], and whose importance can be seen in the popularity of tools such as the \(R^{2}\) in linear regression. While Goodman and Kruskal focused on probabilistic interpretations for their measures, we demonstrate how more general measures of information can be used to achieve the same goal. To that end, we present a strategy for building dependence measures that is designed to allow practitioners to tailor measures to their needs. We demonstrate how many well-known measures fit in with our framework and conclude the paper by presenting two real data examples. Our first example explores U.S. income and education where we demonstrate how this methodology can help guide the selection and development of a dependence measure. Our second example examines measures of dependence for functional data, and illustrates them using data on geomagnetic storms.

MSC:

62H20 Measures of association (correlation, canonical correlation, etc.)
62P25 Applications of statistics to social sciences
62P35 Applications of statistics to physics
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Citations:

Zbl 0426.62034

Software:

fda (R)

References:

[1] Ash, R. B. (1990). Information Theory. Dover, New York. · Zbl 0768.94005
[2] Bell, C. B. (1962). Mutual information and maximal correlation as measures of dependence. Ann. Math. Statist. 33 587-595. · Zbl 0212.51001 · doi:10.1214/aoms/1177704583
[3] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge. · Zbl 1058.90049
[4] Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory, 2nd ed. Wiley, Hoboken, NJ. · Zbl 1140.94001 · doi:10.1002/047174882X
[5] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Stat. Methodol. 39 1-38. · Zbl 0364.62022
[6] Doksum, K. and Samarov, A. (1995). Nonparametric estimation of global functionals and a measure of the explanatory power of covariates in regression. Ann. Statist. 23 1443-1473. · Zbl 0843.62045 · doi:10.1214/aos/1176324307
[7] Ebrahimi, N., Soofi, E. S. and Soyer, R. (2010). Information measures in perspective. International Statistical Review 78 383-412.
[8] Efron, B. (1978). Regression and ANOVA with zero-one data: Measures of residual variation. J. Amer. Statist. Assoc. 73 113-121. · Zbl 0381.62058 · doi:10.2307/2286531
[9] Gelfand, I. M. and Fomin, S. V. (1963). Calculus of Variations. Prentice Hall International, Englewood Cliffs, NJ. · Zbl 0127.05402
[10] Goodman, L. A. and Kruskal, W. H. (1979). Measures of Association for Cross Classifications. Springer Series in Statistics 1. Springer, New York. · Zbl 0426.62034
[11] Gray, R. M. (2011). Entropy and Information Theory. Springer, New York.
[12] Hall, W. J. (1970). On characterizing dependence in joint distributions. In Essays in Probability and Statistics 339-376. Univ. North Carolina Press, Chapel Hill, NC. · Zbl 0265.62013
[13] Horváth, L., Kokoszka, P. and Reimherr, M. (2009). Two sample inference in functional linear models. Canad. J. Statist. 37 571-591. · Zbl 1191.62088 · doi:10.1002/cjs.10035
[14] Lehmann, E. L. (1966). Some concepts of dependence. Ann. Math. Statist. 37 1137-1153. · Zbl 0146.40601 · doi:10.1214/aoms/1177699260
[15] Liang, K.-Y., Zeger, S. L. and Qaqish, B. (1992). Multivariate regression analyses for categorical data. J. R. Stat. Soc. Ser. B Stat. Methodol. 54 3-40. · Zbl 0775.62172
[16] Lipsitz, S. R., Laird, N. M. and Harrington, D. P. (1991). Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika 78 153-160. · doi:10.1093/biomet/78.1.153
[17] McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman & Hall, Boca Raton, FL. · Zbl 0744.62098
[18] Moskowitz, C. (2011). U.S. must take space storm threat seriously, experts warn. Available at .
[19] Nelsen, R. B. (2010). An Introduction to Copulas. Springer, New York. · Zbl 1152.62030
[20] Nicolae, D. L. (2006). Quantifying the amount of missing information in genetic association studies. Genet. Epidemiol. 30 703-717.
[21] Nicolae, D. L., Meng, X.-L. and Kong, A. (2008). Quantifying the fraction of missing information for hypothesis testing in statistical and genetic studies. Statist. Sci. 23 287-312. · Zbl 1329.62092 · doi:10.1214/07-STS244
[22] Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd ed. Springer, New York. · Zbl 1079.62006 · doi:10.1007/b98888
[23] Reimherr, M. and Nicolae, D. L. (2011). You’ve gotta be lucky: Coverage and the elusive gene-gene interaction. Ann. Hum. Genet. 75 105-111.
[24] Rényi, A. (1959). On measures of dependence. Acta Math. Acad. Sci. Hungar. 10 441-451 (unbound insert). · Zbl 0091.14403 · doi:10.1007/BF02024507
[25] Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., Lander, E. S., Mitzenmacher, M. and Sabeti, P. C. (2011). Detecting novel associations in large data sets. Science 334 1518-1524. · Zbl 1359.62216
[26] Schweizer, B. and Wolff, E. F. (1981). On nonparametric measures of dependence for random variables. Ann. Statist. 9 879-885. · Zbl 0468.62012 · doi:10.1214/aos/1176345528
[27] Siburg, K. F. and Stoimenov, P. A. (2010). A measure of mutual complete dependence. Metrika 71 239-251. · Zbl 1182.62124 · doi:10.1007/s00184-008-0229-9
[28] Székely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Ann. Statist. 35 2769-2794. · Zbl 1129.62059 · doi:10.1214/009053607000000505
[29] Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance. Ann. Appl. Stat. 3 1236-1265. · Zbl 1196.62077 · doi:10.1214/09-AOAS312
[30] U.S. Census Bureau (2010). Educational attainment-people 25 years old and over. Available at .