Discriminant analysis for compositional data incorporating cell-wise uncertainties.

*(English)* Zbl 1458.86014

Summary: In the geosciences it is still uncommon to include measurement uncertainties in statistical methods such as discriminant analysis, even though these uncertainties are frequently of relevant size, especially for trace elements. Uncertainties can be reported per measured variable, per observation, or per individual cell (i.e., each observation has an individual uncertainty for each variable). Most methods that incorporate uncertainties use them as weights for the variables or observations of the data set. The method proposed in this contribution instead uses variance additivity properties and generalised least squares to obtain better estimates of group variances and group means, which then influence the decision rules of linear and quadratic discrimination algorithms. This methodological framework allows the incorporation of cell-wise uncertainties, and would remain largely valid if information about the co-dependency between variable errors within each observation were reported. The method is also appropriate for incorporating uncertainties into compositional data sets (for example, those formed by concentrations, proportions, percentages or any other form of information about the relative abundance of a set of components forming a whole), even though such uncertainties are almost never reported in a way that takes this compositional nature into account. The methods are illustrated by means of case studies with simulated data.
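The idea of combining variance additivity with generalised least squares can be illustrated with a minimal Python sketch. It is not the authors' implementation (their R package is not reproduced here), and it rests on assumptions not stated in the summary: each observation is modelled as a true value with group covariance `sigma` plus an independent measurement error with known covariance `S[i]`, so that the observed covariance is `sigma + S[i]`. Compositional data are assumed to have been mapped to log-ratio coordinates beforehand, with uncertainties expressed in the same coordinates. The function names `gls_group_estimates` and `quadratic_score`, the simple fixed-point iteration, and the eigenvalue clipping are all hypothetical choices made for illustration.

```python
import numpy as np

def gls_group_estimates(X, S, n_iter=20):
    """Estimate one group's mean and covariance from noisy observations.

    X: (n, p) observations of one group.
    S: (n, p, p) known measurement-error covariance of each observation
       (diagonal matrices if only cell-wise variances are reported).
    Assumed model (illustrative): Cov(X[i]) = sigma + S[i].
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False).reshape(p, p)
    for _ in range(n_iter):
        # GLS weights: inverse of the total covariance of each observation.
        W = np.linalg.inv(sigma + S)
        # GLS mean: solve (sum_i W_i) mu = sum_i W_i x_i.
        mu = np.linalg.solve(W.sum(axis=0), np.einsum("nij,nj->i", W, X))
        # Variance additivity: subtract the mean error covariance from the
        # observed scatter to approximate the group covariance.
        R = X - mu
        sigma = np.einsum("ni,nj->ij", R, R) / n - S.mean(axis=0)
        # Clip eigenvalues so the estimate stays positive semi-definite.
        vals, vecs = np.linalg.eigh(sigma)
        sigma = (vecs * np.clip(vals, 1e-12, None)) @ vecs.T
    return mu, sigma

def quadratic_score(x, mu, sigma, S_new=None, prior=1.0):
    """Quadratic discriminant score of x for one group.

    Adding S_new (the error covariance of the new observation) to the
    group covariance is an assumption of this sketch.
    """
    cov = sigma if S_new is None else sigma + S_new
    r = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * logdet - 0.5 * r @ np.linalg.solve(cov, r) + np.log(prior)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 300, 2
    true_values = rng.normal(size=(n, p))            # group covariance = identity
    err_var = rng.uniform(0.1, 2.0, size=(n, p))     # cell-wise error variances
    X = true_values + rng.normal(size=(n, p)) * np.sqrt(err_var)
    S = np.stack([np.diag(v) for v in err_var])
    mu, sigma = gls_group_estimates(X, S)
    print("GLS mean:", mu)
    print("corrected covariance:\n", sigma)
    print("naive covariance:\n", np.cov(X, rowvar=False))
```

A new observation would be assigned to the group with the highest score. Using one pooled covariance for all groups instead of group-specific ones gives the linear variant of the rule.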

##### Keywords:

weighted discriminant analysis; linear discriminant analysis; quadratic discriminant analysis; CoDa; cell-wise uncertainty; measurement uncertainty; geochemical data

*S. Pospiech* et al., Math. Geosci. 53, No. 1, 1–20 (2021; Zbl 1458.86014)

