Correlation analysis for compositional data. (English) Zbl 1178.86019

Summary: Compositional data need special treatment prior to correlation analysis. In this paper we argue why standard transformations for compositional data are not suitable for computing correlations, and why using raw or log-transformed data is not meaningful either. As a solution, a procedure based on balances is outlined, leading to sensible correlation measures. The construction of the balances is demonstrated using a real data example from geochemistry. It is shown that the considered correlation measures are invariant with respect to the choice of the binary partitions forming the balances. Robust counterparts to the classical, non-robust correlation measures are introduced and applied. Appropriate graphical representations show how the resulting correlation coefficients can be interpreted.
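
To make the notion of a balance concrete, the following is a minimal sketch, not the procedure of the paper itself. It assumes the standard definition of a balance between two groups of parts (a normalized log-ratio of their geometric means) and uses made-up four-part compositions. The function names `balance`, `pearson` and `two_balances`, and the choice of which parts to contrast, are illustrative assumptions. The sketch only shows that a correlation computed from balances does not change when each sample is rescaled by an arbitrary factor, which is the property that raw data lack.

```python
import math

def balance(x, num, den):
    """Balance between the parts indexed by `num` and those indexed by `den`:
    sqrt(r*s/(r+s)) * ln(gmean(num parts) / gmean(den parts))."""
    r, s = len(num), len(den)
    mean_log_num = sum(math.log(x[i]) for i in num) / r
    mean_log_den = sum(math.log(x[i]) for i in den) / s
    return math.sqrt(r * s / (r + s)) * (mean_log_num - mean_log_den)

def pearson(u, v):
    """Classical (non-robust) Pearson correlation coefficient."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def two_balances(rows):
    """Illustrative choice: part 0 vs part 1, and part 2 vs part 3."""
    b1 = [balance(x, [0], [1]) for x in rows]
    b2 = [balance(x, [2], [3]) for x in rows]
    return b1, b2

# Hypothetical 4-part compositions (each row sums to 1); not data from the paper.
samples = [
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.25, 0.20, 0.30],
    [0.40, 0.10, 0.15, 0.35],
    [0.05, 0.45, 0.30, 0.20],
    [0.30, 0.30, 0.10, 0.30],
]

r_closed = pearson(*two_balances(samples))

# Rescale every sample by a different arbitrary factor (e.g. a change of units).
factors = [1.0, 10.0, 100.0, 3.5, 0.2]
rescaled = [[k * v for v in x] for k, x in zip(factors, samples)]
r_rescaled = pearson(*two_balances(rescaled))

# The two coefficients agree up to floating-point error.
print(r_closed, r_rescaled)
```

A robust counterpart would replace `pearson` with a correlation derived from a robust covariance estimate; that step is not sketched here.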


MSC: 86A32 Geostatistics


Software: R; robustbase

