
Model-based replacement of rounded zeros in compositional data: classical and robust approaches. (English) Zbl 1255.62116

Summary: The log-ratio methodology represents a powerful set of methods and techniques for statistical analysis of compositional data. These techniques may be used for the estimation of rounded zeros or values below the detection limit in cases when the underlying data are compositional in nature. An algorithm based on iterative log-ratio regressions is developed by combining a particular family of isometric log-ratio transformations with censored regression. In the context of classical regression methods, the equivalence of the method based on additive and isometric log-ratio transformations is proved. This equivalence does not hold for robust regression. Based on Monte Carlo methods, simulations are performed to assess the performance of classical and robust methods. To illustrate the method, a case study involving geochemical data is conducted.
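The isometric log-ratio idea mentioned in the summary can be illustrated with a minimal sketch. It assumes, as an illustration only, that the "particular family" of transformations is of the pivot-coordinate type, where each part is compared with the geometric mean of the parts that follow it; the summary itself does not spell out the formula. The function names `clr` and `pivot_coordinates` are hypothetical and do not come from the paper or from robustbase. The sketch also shows why rounded zeros must be replaced before any log-ratio analysis: the logarithm of a zero part is undefined.

```python
import math


def clr(x):
    """Centred log-ratio coefficients of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]


def pivot_coordinates(x):
    """Map a strictly positive composition of D parts to D - 1 ilr coordinates.

    Assumed form (illustrative, not taken from the summary):
    z_i = sqrt(k / (k + 1)) * ln(x_i / geometric mean of the k parts after x_i).
    """
    D = len(x)
    if D < 2 or any(v <= 0 for v in x):
        # Zeros (e.g. values below a detection limit) cannot be log-transformed.
        raise ValueError("need at least two strictly positive parts")
    logs = [math.log(v) for v in x]
    z = []
    for i in range(D - 1):
        rest = logs[i + 1:]            # logs of the parts after position i
        k = len(rest)                  # number of remaining parts
        mean_rest = sum(rest) / k      # log of their geometric mean
        z.append(math.sqrt(k / (k + 1)) * (logs[i] - mean_rest))
    return z


if __name__ == "__main__":
    composition = [0.2, 0.3, 0.5]
    print(pivot_coordinates(composition))
```

Because the coordinates depend only on ratios between parts, rescaling the composition (for example from proportions to percentages) leaves them unchanged, and the squared length of the coordinate vector equals that of the centred log-ratio vector.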

MSC:

62G08 Nonparametric regression and quantile regression
62G35 Nonparametric robustness
65C05 Monte Carlo methods
65G50 Roundoff error

Software:

robustbase; R

References:

[1] Aitchison, J., The Statistical Analysis of Compositional Data. Monographs on Statistics and Applied Probability (1986), Chapman and Hall Ltd.: London, UK, p. 416 (Reprinted 2003 with additional material by The Blackburn Press) · Zbl 0688.62004
[2] Aitchison, J.; Barceló-Vidal, C.; Martín-Fernández, J. A.; Pawlowsky-Glahn, V., Logratio analysis and compositional distance, Mathematical Geology, 32, 3, 271-275 (2000) · Zbl 1101.86309
[3] Amemiya, T., Tobit models: a survey, Journal of Econometrics, 24, 3-61 (1984) · Zbl 0539.62121
[4] Barceló-Vidal, C., Aguilar, L., Martín-Fernández, J.A., 2011. Compositional VARIMA Time Series (Chapter 7). In: Pawlowsky-Glahn and Buccianti (2011)
[5] Buccianti, A.; Mateu-Figueras, G.; Pawlowsky-Glahn, V. (Eds.), Compositional Data Analysis in the Geosciences: From Theory to Practice, Special Publications, vol. 264 (2006), Geological Society: London · Zbl 1155.86002
[6] Daunis-i-Estadella, J., Martín-Fernández, J.A. (Eds.), 2008. Proceedings of CODAWORK’08, the 3rd Compositional Data Analysis Workshop. Universitat de Girona. ISBN: 84-8458-272-4. http://ima.udg.es/Activitats/CoDaWork08/
[7] Egozcue, J. J., Reply to "On the Harker variation diagrams;" by J.A. Cortés, Mathematical Geosciences, 41, 7, 829-834 (2009) · Zbl 1178.86018
[8] Egozcue, J. J.; Pawlowsky-Glahn, V., Groups of parts and their balances in compositional data analysis, Mathematical Geology, 37, 7, 795-828 (2005) · Zbl 1177.86018
[9] Egozcue, J. J.; Pawlowsky-Glahn, V., Simplicial geometry for compositional data, in: Buccianti, A.; Mateu-Figueras, G.; Pawlowsky-Glahn, V. (Eds.), Compositional Data Analysis in the Geosciences: From Theory to Practice, Special Publications, vol. 264 (2006), Geological Society: London, 145-160 · Zbl 1156.86307
[10] Egozcue, J.J., Pawlowsky-Glahn, V., 2011. Basic concepts and procedures (Chapter 2). In: Pawlowsky-Glahn and Buccianti (2011)
[11] Egozcue, J. J.; Pawlowsky-Glahn, V.; Mateu-Figueras, G.; Barceló-Vidal, C., Isometric logratio transformations for compositional data analysis, Mathematical Geology, 35, 3, 279-300 (2003) · Zbl 1302.86024
[12] Egozcue, J.J., Tolosana-Delgado, R., Ortego, M.I. (Eds.), 2011. Proceedings of CODAWORK’11, the 4th Compositional Data Analysis Workshop. Sant Feliu de Guíxols. ISBN: 978-84-87867-76-7 (electronic publication). May 10-13.
[13] Filzmoser, P.; Hron, K., Outlier detection for compositional data using robust methods, Mathematical Geosciences, 40, 3, 233-248 (2008) · Zbl 1135.62040
[14] Filzmoser, P., Hron, K., 2011. Robust statistical analysis (Chapter 5). In: Pawlowsky-Glahn and Buccianti (2011)
[15] Filzmoser, P.; Hron, K.; Reimann, C., Principal component analysis for compositional data with outliers, Environmetrics, 20, 6, 621-632 (2009)
[16] Fišerová, E.; Hron, K., On interpretation of orthonormal coordinates for compositional data, Mathematical Geosciences, 43, 4, 455-468 (2011)
[17] Hron, K.; Templ, M.; Filzmoser, P., Imputation of missing values for compositional data using classical and robust methods, Computational Statistics and Data Analysis, 54, 12, 3095-3107 (2010) · Zbl 1284.62049
[18] Huber, P. J., Robust Statistics (1981), John Wiley: New York · Zbl 0536.62025
[19] Johnson, R. A.; Wichern, D. W., Applied Multivariate Statistical Analysis (2002), Prentice Hall: London
[20] Little, R. J.A.; Rubin, D. B., Statistical Analysis with Missing Data (1987), Wiley: New Jersey · Zbl 0665.62004
[21] Maronna, R.; Martin, R. D.; Yohai, V. J., Robust Statistics: Theory and Methods (2006), John Wiley: New York, USA, p. 436 · Zbl 1094.62040
[22] Martín-Fernández, J. A.; Barceló-Vidal, C.; Pawlowsky-Glahn, V., Dealing with zeros and missing values in compositional data sets using nonparametric imputation, Mathematical Geology, 35, 3, 253-278 (2003) · Zbl 1302.86027
[23] Martín-Fernández, J.A., Palarea-Albaladejo, J., Olea, R.A., 2011. Dealing with zeros (Chapter 4). In: Pawlowsky-Glahn and Buccianti (2011)
[24] Martín-Fernández, J. A.; Thió-Henestrosa, S., Rounded zeros: some practical aspects for compositional data, in: Buccianti, A.; Mateu-Figueras, G.; Pawlowsky-Glahn, V. (Eds.), Compositional Data Analysis in the Geosciences: From Theory to Practice, Special Publications, vol. 264 (2006), Geological Society: London, 191-201
[25] Mateu-Figueras, G., Barceló-Vidal, C. (Eds.), 2005. Proceedings of CODAWORK’05, the 2nd Compositional Data Analysis Workshop. Universitat de Girona. ISBN: 84-8458-222-1. http://ima.udg.es/Activitats/CoDaWork05/
[26] Mateu-Figueras, G.; Pawlowsky-Glahn, V., A critical approach to probability laws in geochemistry, Mathematical Geosciences, 40, 5, 489-502 (2008) · Zbl 1153.86338
[27] Palarea-Albaladejo, J.; Martín-Fernández, J. A., A modified EM alr-algorithm for replacing rounded zeros in compositional data sets, Computers & Geosciences, 34, 8, 902-917 (2008)
[28] Palarea-Albaladejo, J.; Martín-Fernández, J. A.; Gómez-García, J., A parametric approach for dealing with compositional rounded zeros, Mathematical Geology, 39, 7, 625-645 (2007) · Zbl 1130.86001
[29] Pawlowsky-Glahn, V.; Buccianti, A. (Eds.), Compositional Data Analysis: Theory and Applications (2011), John Wiley & Sons, Ltd.: Chichester, UK, p. 378 · Zbl 1103.62111
[30] Pearson, K., Mathematical contributions to the theory of evolution. On a form of spurious correlation which may arise when indices are used in the measurement of organs, Proceedings of the Royal Society of London, 60, 489-502 (1897) · JFM 28.0209.02
[31] R Development Core Team, 2008. R: a language and environment for statistical computing. Vienna. http://www.r-project.org
[32] Reimann, C.; Äyräs, M.; Chekushin, V. A.; Bogatyrev, I.; Boyd, R.; de Caritat, P.; Dutter, R.; Finne, T. E.; Halleraker, J. H.; Jæger, Ø.; Kashulina, G.; Niskavaara, H.; Lehto, O.; Pavlov, V.; Räisänen, M. L.; Strand, T.; Volden, T., Environmental Geochemical Atlas of the Central Barents Region (1998), NGU-GTK-CKE Special Publication: Trondheim, Norway, p. 745
[33] Reimann, C.; Filzmoser, P.; Garrett, R. G.; Dutter, R., Statistical Data Analysis Explained: Applied Environmental Statistics with R (2008), Wiley: Chichester
[34] Seber, G. A.F., A Matrix Handbook for Statisticians (2008), John Wiley & Sons, Inc.: Hoboken, New Jersey, USA, p. 559 · Zbl 1143.15001
[35] Thió-Henestrosa, S., Martín-Fernández, J.A. (Eds.), 2003. Proceedings of CODAWORK’03, the 1st Compositional Data Analysis Workshop. Universitat de Girona. ISBN: 84-8458-111-X. http://ima.udg.es/Activitats/CoDaWork03/
[36] Tolosana-Delgado, R., van den Boogaart, K.G., Pawlowsky-Glahn, V., 2011. Geostatistics for compositions (Chapter 6). In: Pawlowsky-Glahn and Buccianti (2011)