zbMATH — the first resource for mathematics

Bayesian-multiplicative treatment of count zeros in compositional data sets. (English) Zbl 07258982
Summary: Compositional count data are discrete vectors representing the numbers of outcomes falling into any of several mutually exclusive categories. Compositional techniques based on the log-ratio methodology are appropriate in those cases where the total sum of the vector elements is not of interest. Such compositional count data sets can contain zero values which are often the result of insufficiently large samples. That is, they refer to unobserved positive values that may have been observed with a larger number of trials or with a different sampling design. Because the log-ratio transformations require data with positive values, any statistical analysis of count compositions must be preceded by a proper replacement of the zeros. A Bayesian-multiplicative treatment has been proposed for addressing this count zero problem in several case studies. This treatment involves the Dirichlet prior distribution as the conjugate distribution of the multinomial distribution and a multiplicative modification of the non-zero values. Different parameterizations of the prior distribution provide different zero replacement results, whose coherence with the vector space structure of the simplex is stated. Their performance is evaluated from both the theoretical and the computational point of view.
Reviewer: Reviewer (Berlin)
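
The replacement described in the summary can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `bayes_mult_replace` and the parameters `strength` (total prior weight) and `prior` (prior expected proportions) are hypothetical, and the default choice of a uniform Dirichlet prior (strength equal to the number of categories) is one possible parameterization that the summary does not single out. With a Dirichlet prior of parameters `strength * prior[j]`, the posterior mean of category j is `(c_j + strength * prior[j]) / (n + strength)`; zero counts are replaced by that value, and the observed non-zero proportions are rescaled by a common factor so that the result still sums to 1.

```python
def bayes_mult_replace(counts, strength=None, prior=None):
    """Replace count zeros by Dirichlet posterior means and rescale the
    non-zero proportions multiplicatively (illustrative sketch only)."""
    n = sum(counts)
    d = len(counts)
    if n <= 0:
        raise ValueError("the count vector needs a positive total")
    if prior is None:
        # Assumption: equal prior expectation for every category.
        prior = [1.0 / d] * d
    if strength is None:
        # Assumption: uniform Dirichlet(1, ..., 1) prior, total weight d.
        strength = float(d)

    # Posterior mean of a category whose observed count is zero.
    zero_est = [strength * t / (n + strength) for t in prior]
    # Total mass assigned to the zero categories.
    zero_mass = sum(z for c, z in zip(counts, zero_est) if c == 0)

    # Zeros take their posterior mean; non-zero proportions are shrunk by
    # the common factor (1 - zero_mass), which leaves their ratios intact.
    return [
        z if c == 0 else (c / n) * (1.0 - zero_mass)
        for c, z in zip(counts, zero_est)
    ]


replaced = bayes_mult_replace([6, 3, 0, 1])
```

Because every non-zero part is multiplied by the same factor, the ratios between observed parts (here 6 : 3 : 1) are unchanged, which is what makes the subsequent log-ratio analysis of those parts insensitive to the replacement. Other values of `strength` and `prior` give other replacement values for the zeros.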

MSC:
62 Statistics
Software:
R; robCompositions
References:
[1] Aebischer, NJ, Robertson, PA, Kenward, RE (1993) Compositional analysis of habitat use from animal radio-tracking data. Ecology, 74(5), 1313-25.
[2] Agresti, A (2003) Categorical data analysis. Wiley Series in Probability and Statistics, p. 710. 2nd edn, Hoboken: John Wiley & Sons. · Zbl 1018.62002
[3] Aitchison, J (1986) The statistical analysis of compositional data. Monographs on Statistics and Applied Probability (Reprinted 2003 with additional material by The Blackburn Press). London: Chapman and Hall Ltd., p. 416.
[4] Bernard, JM (2005) An introduction to the imprecise Dirichlet model for multinomial data. International Journal of Approximate Reasoning, 39(2-3), 123-50. · Zbl 1066.62003
[5] Butler, A, Glasbey, C (2008) A latent Gaussian model for compositional data with zeros. Journal of the Royal Statistical Society Series C-Applied Statistics, 57, 505-20.
[6] Davis, CS (1993) The computer generation of the multinomial random variates. Computational Statistics & Data Analysis, 16, 205-17. · Zbl 0937.62543
[7] Eaton, ML (1983) Multivariate statistics. A vector space approach. New York: John Wiley & Sons, p. 512. · Zbl 1160.62326
[8] Egozcue, JJ (2009) Reply to ‘On the Harker variation diagrams; ...’ by J.A. Cortés. Mathematical Geosciences, 41, 829-34. · Zbl 1178.86018
[9] Egozcue, JJ, Pawlowsky-Glahn, V (2006) Simplicial geometry for compositional data. In Buccianti, A, Mateu-Figueras, G, Pawlowsky-Glahn, V (eds), Compositional data analysis in the geosciences: From theory to practice. London: Geological Society, pp. 145-160. · Zbl 1156.86307
[10] Egozcue, JJ, Pawlowsky-Glahn, V, Mateu-Figueras, G, Barceló-Vidal, C (2003) Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35(3), 279-300. · Zbl 1302.86024
[11] Egozcue, JJ, Tolosana-Delgado, R, Ortego, MI (eds) (2011) Proceedings of CODAWORK’11: The 4th Compositional Data Analysis Workshop. Sant Feliu de Guíxols, May 10-13. ISBN 978-84-87867-76-7 (electronic publication).
[12] Elston, DA, Illius, AW, Gordon, IJ (1996) Assessment of preference among a range of options using log ratio analysis. Ecology, 77, 2538-48.
[13] Filzmoser, P, Hron, K, Templ, M (2012) Discriminant analysis for compositional data and robust parameter estimation. Computational Statistics, 27(4), 585-604. · Zbl 1304.65033
[14] Friedman, J, Alm, EJ (2012) Inferring correlation networks from genomic survey data. PLoS Computational Biology, 8(9), e1002687. doi:10.1371/journal.pcbi.1002687.
[15] Graffelman, J (2011) Statistical inference for Hardy-Weinberg equilibrium using logratio coordinates. In Egozcue, J.J., Tolosana-Delgado, R., Ortego, M.I. (Eds), Proceedings of the 4th International Workshop on Compositional Data Analysis, p. 5.
[16] Graffelman, J, Egozcue, JJ (2011) Hardy-Weinberg equilibrium: A nonparametric compositional approach, Ch. 15. In Pawlowsky-Glahn, V., Buccianti, A. (Eds), Compositional Data Analysis: Theory and Applications, pp. 208-17. Chichester, UK: John Wiley & Sons, Ltd.
[17] Hron, K, Templ, M, Filzmoser, P (2010) Imputation of missing values for compositional data using classical and robust methods. Computational Statistics & Data Analysis, 54(12), 3095-107. · Zbl 1284.62049
[18] Martín-Fernández, JA, Barceló-Vidal, C, Pawlowsky-Glahn, V (2003) Dealing with zeros and missing values in compositional data sets using nonparametric imputation. Mathematical Geology, 35(3), 253-78. · Zbl 1302.86027
[19] Martín-Fernández, JA, Palarea-Albaladejo, J, Olea, RA (2011) Dealing with zeros, Ch. 4. In Pawlowsky-Glahn, V., Buccianti, A. (Eds), Compositional Data Analysis: Theory and Applications, pp. 47-62. Chichester, UK: John Wiley & Sons, Ltd.
[20] Martín-Fernández, JA, Hron, K, Templ, M, Filzmoser, P, Palarea-Albaladejo, J (2012) Model-based replacement of rounded zeros in compositional data: Classical and robust approach. Computational Statistics & Data Analysis, 56(3), 2688-704. · Zbl 1255.62116
[21] Mateu-Figueras, G, Pawlowsky-Glahn, V (2008) A critical approach to probability laws in geochemistry. Mathematical Geosciences, 40(5), 489-502. · Zbl 1153.86338
[22] Monti, GS, Mateu-Figueras, G, Pawlowsky-Glahn, V (2011) Notes on the scaled Dirichlet distribution. In Pawlowsky-Glahn, V., Buccianti, A. (Eds), Compositional Data Analysis: Theory and Applications, pp. 128-38. Chichester, UK: John Wiley & Sons, Ltd.
[23] Palarea-Albaladejo, J, Martín-Fernández, JA, Gómez-García, J (2007) A parametric approach for dealing with compositional rounded zeros. Mathematical Geology, 39, 625-45. · Zbl 1130.86001
[24] Palarea-Albaladejo, J, Martín-Fernández, JA (2008) A modified EM alr-algorithm for replacing rounded zeros in compositional data sets. Computers & Geosciences, 34(8), 902-17.
[25] Palarea-Albaladejo, J, Martín-Fernández, JA, Soto, JA (2012) Dealing with distances and transformations for fuzzy c-Means clustering of compositional data. Journal of Classification, 29(2), 144-69. · Zbl 1360.62347
[26] Palarea-Albaladejo, J, Martín-Fernández, JA (2013) Values below detection limit in compositional chemical data. Analytica Chimica Acta, 764, 32-43.
[27] Pawlowsky-Glahn, V, Buccianti, A, eds (2011) Compositional data analysis: Theory and applications. Chichester: John Wiley & Sons, p. 378.
[28] Pawlowsky-Glahn, V, Egozcue, JJ (2002) BLU estimators and compositional data. Mathematical Geology, 34(3), 259-74. · Zbl 1031.86007
[29] Pearson, K (1897) Mathematical contributions to the theory of evolution. On a form of spurious correlation which may arise when indices are used in the measurement of organs. Proceedings of the Royal Society of London, 60, 489-502. · JFM 28.0209.02
[30] Pierotti, MER, Martín-Fernández, JA, Seehausen, O (2009) Mapping individual variation in male mating preference space: Multiple choice in a colour polymorphic cichlid fish. Evolution, 63(9), 2372-88.
[31] R development core team (2012) R: A language and environment for statistical computing, Vienna, Austria: R Foundation for Statistical Computing. http://www.r-project.org.
[32] Richardson, D (1997) How to recognize zero. Journal of Symbolic Computation, 24(6), 627-45. · Zbl 0917.11062
[33] Rodrigues, PC, Lima, AT (2009) Analysis of an European union election using principal component analysis. Statistical Papers, 50, 895-904. · Zbl 1247.91053
[34] Stewart, C, Field, C (2010) Managing the essential zeros in quantitative fatty acid signature analysis. Journal of Agricultural, Biological, and Environmental Statistics, 16(1), 45-69. · Zbl 1306.62237
[35] Templ, M, Hron, K, Filzmoser, P (2011) robCompositions: An R-package for robust statistical analysis of compositional data, Ch. 25. In Pawlowsky-Glahn, V., Buccianti, A. (Eds), Compositional Data Analysis: Theory and Applications, pp. 341-55. Chichester, UK: John Wiley & Sons, Ltd.
[36] Walley, P (1996) Inferences from multinomial data: Learning about a bag of marbles. Journal of the Royal Statistical Society Series B (Methodological), 58(1), 3-57. · Zbl 0834.62004
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.