zbMATH — the first resource for mathematics

A Dirichlet regression model for compositional data with zeros. (English) Zbl 1407.62093
Summary: Compositional data are met in many different fields, such as economics, archaeometry, ecology, geology and political sciences. Regression where the dependent variable is a composition is usually carried out via a log-ratio transformation of the composition or via the Dirichlet distribution. However, when there are zero values in the data these two ways are not readily applicable. Suggestions for this problem exist, but most of them rely on substituting the zero values. In this paper we adjust the Dirichlet distribution when covariates are present, in order to allow for zero values to be present in the data, without modifying any values. To do so, we modify the log-likelihood of the Dirichlet distribution to account for zero values. Examples and simulation studies exhibit the performance of the zero adjusted Dirichlet regression.
Reviewer: Reviewer (Berlin)

MSC:
62F30 Parametric inference under constraints
62J12 Generalized linear models (logistic models)
References:
[1] Aitchison, J., The statistical analysis of compositional data, J. R. Stat. Soc., Ser. B, 44, 139-177, (1982) · Zbl 0491.62017
[2] J. Aitchison, The Statistical Analysis of Compositional Data (Chapman and Hall, London, 2003). · Zbl 0688.62004
[3] I. J. Bear and D. Billheimer, “A logistic normal mixture model allowing essential zeros,” in Proceedings of the 6th Compositional Data Analysis Workshop, Girona, Spain, 2015.
[4] Butler, A.; Glasbey, C., A latent Gaussian model for compositional data with zeros, J. R. Stat. Soc., Ser. C, 57, 505-520, (2008)
[5] Campbell, G.; Mosimann, J. E., Multivariate analysis of size and shape: modelling with the Dirichlet distribution, 93-101, (1987)
[6] Davis, P. J., Leonhard Euler’s integral: a historical profile of the gamma function: in memoriam: Milton Abramowitz, Am. Math. Mon., 66, 849-869, (1959) · Zbl 0091.00506
[7] Endres, D. M.; Schindelin, J. E., A new metric for probability distributions, IEEE Trans. Inform. Theory, 49, 1858-1860, (2003) · Zbl 1294.62003
[8] Gueorguieva, R.; Rosenheck, R.; Zelterman, D., Dirichlet component regression and its applications to psychiatric data, Comput. Stat. Data Anal., 52, 5344-5355, (2008) · Zbl 1452.62066
[9] Gourieroux, C.; Monfort, A.; Trognon, A., Pseudo maximum likelihood methods: theory, Econometrica, 52, 681-700, (1984) · Zbl 0575.62031
[10] R. H. Hijazi, “An EM-algorithm based method to deal with rounded zeros in compositional data under Dirichlet models,” in Proceedings of the 1st Compositional Data Analysis Workshop, Girona, Spain, 2011.
[11] Hijazi, R. H.; Jernigan, R. W., Modelling compositional data using Dirichlet regression models, J. Appl. Probab. Stat., 4, 77-91, (2009) · Zbl 1166.62053
[12] S. Kullback, Information Theory and Statistics (Dover, New York, 1997). · Zbl 0897.62003
[13] Leininger, T. J.; Gelfand, A. E.; Allen, J. M.; Silander, J. A., Spatial regression modeling for compositional data with many zeros, J. Agric. Biol. Environ. Stat., 18, 314-334, (2013) · Zbl 1303.62085
[14] J. M. Maier, DirichletReg: Dirichlet Regression in R (2014). http://dirichletreg.r-forge.r-project.org/.
[15] Martín-Fernández, J. A.; Hron, K.; Templ, M.; Filzmoser, P.; Palarea-Albaladejo, J., Model-based replacement of rounded zeros in compositional data: classical and robust approaches, Comput. Stat. Data Anal., 56, 2688-2704, (2012) · Zbl 1255.62116
[16] I. T. Jolliffe, Principal Component Analysis (Springer, New York, 2005). · Zbl 1011.62064
[17] Lin, W.; Shi, P.; Feng, R.; Li, H., Variable selection in regression with compositional covariates, Biometrika, 101, 785-797, (2014) · Zbl 1306.62164
[18] Murteira, J. M. R.; Ramalho, J. J. S., Regression analysis of multivariate fractional data, Econometric Rev., 35, 515-552, (2016)
[19] K. W. Ng, G. L. Tian, and M. L. Tang, Dirichlet and Related Distributions: Theory, Methods and Applications (Wiley, Chichester, 2011). · Zbl 1234.60006
[20] Ospina, R.; Ferrari, S. L. P., Inflated beta distributions, Stat. Papers, 51, 111-126, (2010) · Zbl 1247.62043
[21] Österreicher, F.; Vajda, I., A new class of metric divergences on probability spaces and its applicability in statistics, Ann. Inst. Stat. Math., 55, 639-653, (2003) · Zbl 1052.62002
[22] Palarea-Albaladejo, J.; Martín-Fernández, J. A., A modified EM alr-algorithm for replacing rounded zeros in compositional data sets, Comput. Geosci., 34, 902-917, (2008)
[23] Scealy, J. L.; Welsh, A. H., Regression for compositional data by using distributions defined on the hypersphere, J. R. Stat. Soc., Ser. B, 73, 351-375, (2011)
[24] Smith, R. L., A statistical assessment of Buchanan’s vote in Palm Beach County, Stat. Sci., 17, 441-457, (2002) · Zbl 1062.91536
[25] Stephens, M. A., Use of the von Mises distribution to analyse continuous proportions, Biometrika, 69, 197-203, (1982)
[26] Stewart, C.; Field, C., Managing the essential zeros in quantitative fatty acid signature analysis, J. Agric. Biol. Environ. Stat., 16, 45-69, (2011) · Zbl 1306.62237
[27] M. Templ, K. Hron, and P. Filzmoser, robCompositions: Robust Estimation for Compositional Data, R Package Version 0.8-4. · Zbl 0491.62017
[28] H. Theil, Economics and Information Theory (North-Holland, Amsterdam, 1967).
[29] T. W. Yee, VGAM: Vector Generalized Linear and Additive Models. R Package Version 0.8-4 (2011). http://CRAN.R-project.org/package=VGAM.
[30] Zadora, G.; Neocleous, T.; Aitken, C., A two-level model for evidence evaluation in the presence of zeros, J. Forensic Sci., 55, 371-384, (2010)
[31] M. Tsagris and G. Athineou, Compositional: Compositional Data Analysis. R package version 2.8 (2017). https://CRAN.R-project.org/package=Compositional.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.