zbMATH — the first resource for mathematics

Regression with compositional response having unobserved components or below detection limit values. (English) Zbl 07258985
Summary: The typical way to deal with zeros and missing values in compositional data sets is to impute them with a reasonable value and then estimate the desired statistical model, e.g., a regression model, from the imputed data set. This contribution presents alternative approaches to this problem within the framework of Bayesian regression with a compositional response. In the first step, a compositional data set with missing data is assumed to follow a normal distribution on the simplex, whose mean value is given as an Aitchison affine linear combination of some fully observed explanatory variables. Both the coefficients of this linear combination and the missing values can be estimated with standard Gibbs sampling techniques. In the second step, a normally distributed additive error is considered superimposed on the compositional response, and values are taken as ‘below the detection limit’ (BDL) if they are ‘too small’ in comparison with the additive standard deviation of each variable. Within this framework, the regression parameters and all missing values (including BDL) can be estimated with a Metropolis-Hastings algorithm. Both methods estimate the regression coefficients without the need for any preliminary imputation step, and they adequately propagate the uncertainty arising from the fact that the missing values and BDL values are not actually observed, something imputation methods cannot achieve.
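As an illustration of the phrase "Aitchison affine linear combination" used in the summary, the following minimal Python sketch builds the mean composition a ⊕ (z ⊙ b) from an intercept composition a, a slope composition b and a scalar covariate z, using the standard Aitchison operations of perturbation and powering. The function names and the numeric values are hypothetical and chosen only for illustration; the sketch does not implement the Gibbs or Metropolis-Hastings samplers of the paper, and it assumes strictly positive parts.

```python
import math

def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]

def perturb(x, y):
    """Aitchison perturbation: closed component-wise product."""
    return closure([a * b for a, b in zip(x, y)])

def power(x, t):
    """Aitchison powering: closed component-wise power by the scalar t."""
    return closure([a ** t for a in x])

def clr(x):
    """Centred log-ratio transform: logs minus their arithmetic mean."""
    logs = [math.log(a) for a in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def aitchison_mean(intercept, slopes, covariates):
    """Mean composition: intercept perturbed by each slope powered by its covariate."""
    mean = closure(intercept)
    for b, z in zip(slopes, covariates):
        mean = perturb(mean, power(b, z))
    return mean

# Hypothetical values, for illustration only.
a = [0.5, 0.3, 0.2]   # intercept composition
b = [0.2, 0.3, 0.5]   # slope composition
z = 1.5               # one fully observed explanatory variable

mu = aitchison_mean(a, [b], [z])

# In clr coordinates the same model is an ordinary affine function of z.
lhs = clr(mu)
rhs = [ca + z * cb for ca, cb in zip(clr(a), clr(b))]
```

The comparison of `lhs` and `rhs` shows why such models can be handled with ordinary multivariate normal machinery after a log-ratio transform: in clr coordinates the Aitchison affine combination reduces to clr(a) + z · clr(b).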
MSC:
62 Statistics
References:
[1] Aitchison, J (1982) The statistical analysis of compositional data (with discussion). Journal of the Royal Statistical Society Series B (Methodological), 44, 139-77. · Zbl 0491.62017
[2] Aitchison, J (1986) The statistical analysis of compositional data. London: Chapman & Hall Ltd (Reprinted in 2003 with additional material by The Blackburn Press). · Zbl 0688.62004
[3] Barceló-Vidal, C, Martín-Fernández, JA, Pawlowsky-Glahn, V (2001) Mathematical foundations of compositional data analysis. In Ross, G (ed.), Proceedings of IAMG’01—The VII Annual Conference of the International Association for Mathematical Geology, Cancun: IAMG. · Zbl 1052.62531
[4] Blatt, H, Middleton, G, Murray, R (1972) Origin of sedimentary rocks. Englewood Cliffs, NJ: Prentice-Hall.
[5] Casella, G, George, EI (1992) Explaining the Gibbs sampler. American Statistician, 46, 167-74.
[6] Egozcue, JJ, Pawlowsky-Glahn, V (2005) Groups of parts and their balances in compositional data analysis. Mathematical Geology, 37, 795-828. · Zbl 1177.86018
[7] Egozcue, JJ, Pawlowsky-Glahn, V, Mateu-Figueras, G, Barceló-Vidal, C (2003) Isometric logratio transformations for compositional data analysis. Mathematical Geology, 35, 279-300. · Zbl 1302.86024
[8] Ferreira, JTAS, Steel, FJ (2007) A new class of skewed multivariate distributions with applications to regression analysis. Statistica Sinica, 17, 505-29.
[9] Gelman, A, Carlin, JB, Stern, HS, Rubin, DB (1995) Bayesian data analysis. New York: Chapman & Hall.
[10] Gross, AL (2000) Bayesian interval estimation of multiple correlations with missing data: A Gibbs sampling approach. Multivariate Behavioral Research, 35, 201-27.
[11] Gross, AL, Torres-Quevedo, R (1995) Estimating correlations with missing data, a Bayesian approach. Psychometrika, 60, 341-54. · Zbl 0863.62026
[12] Hastings, WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57, 97-109. · Zbl 0219.65008
[13] Little, RJA, Rubin, DB (2002) Statistical analysis with missing data. New York: Wiley.
[14] Liu, C (1996) Bayesian robust multivariate linear regression with incomplete data. Journal of the American Statistical Association, 91, 1219-27. · Zbl 0880.62028
[15] Martín-Fernández, JA, Hron, K, Templ, M, Filzmoser, P, Palarea-Albaladejo, J (2012) Model-based replacement of rounded zeros in compositional data: Classical and robust approaches. Computational Statistics and Data Analysis, 56, 2688-704. · Zbl 1255.62116
[16] Mateu-Figueras, G, Pawlowsky-Glahn, V, Barceló-Vidal, C (2003) Distributions on the simplex. In Thió-Henestrosa, S, Martín-Fernández, J-A (eds), Proceedings of the 1st International Workshop on Compositional Data Analysis. Girona: Universitat de Girona.
[17] Palarea-Albaladejo, J, Martín-Fernández, JA, Gómez-García, JA (2007) Parametric approach for dealing with compositional rounded zeros. Mathematical Geology, 39, 625-45. · Zbl 1130.86001
[18] Pawlowsky-Glahn, V, Egozcue, JJ (2001) Geometric approach to statistical analysis on the simplex. Stochastic Environmental Research and Risk Assessment, 15, 384-98. · Zbl 0987.62001
[19] R Core Team (2013) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.
[20] Templ, M, Hron, K, Filzmoser, P (2011) robCompositions: An R-package for robust statistical analysis of compositional data. In Pawlowsky-Glahn, V, Buccianti, A (eds), Compositional data analysis: Theory and applications. Chichester, UK: John Wiley and Sons.
[21] Tierney, L (1994) Markov chains for exploring posterior distributions. Annals of Statistics, 22, 1701-62. · Zbl 0829.62080
[22] Tolosana-Delgado, R, von Eynatten, H (2009) Grain-size control on petrographic composition of sediments: Compositional regression and rounded zeroes. Mathematical Geosciences, 41, 869-86. · Zbl 1178.86025
[23] van den Boogaart, KG, Tolosana-Delgado, R, Bren, M (2011) The compositional meaning of a detection limit. In Egozcue, JJ, Tolosana-Delgado, R, Ortego, MI (eds), Proceedings of the 4th International Workshop on Compositional Data Analysis. Barcelona: CIMNE.
[24] van den Boogaart, KG, Tolosana-Delgado, R (2013) Analyzing compositional data with R. Heidelberg: Springer. · Zbl 1276.62011
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.