Descriptive statistics for symbolic data. (English) Zbl 0978.62005

Bock, Hans-Hermann (ed.) et al., Analysis of symbolic data. Exploratory methods for extracting statistical information from complex data. Berlin: Springer. Studies in Classification, Data Analysis, and Knowledge Organization. 106-124 (2000).
The main task of this chapter is to extend the concept of frequency distributions and the standard definitions of descriptive statistics for real-valued data, such as the empirical mean, empirical standard deviation and median, to the general framework of symbolic variables.
Let \(Y\) be a single-valued quantitative (numerical) variable which is determined by a numerical value \(Y(k)\) for each element \(k\) of the finite set \(E=\{1,\dots,n\}\). Denote by \(y=(Y(1),\dots,Y(n))\) the finite sequence of values taken by the variable \(Y\) on the objects \(k\in E\), and by \(\xi_1,\dots,\xi_l\) the distinct values occurring in \(y\). Define the observed frequency \({\mathcal O}_y\) of \(y\) as the function \({\mathcal O}_y:\mathbf R\to\mathbf N\) such that \({\mathcal O}_y(\xi):=\#\{k\in E \mid Y(k)=\xi\}\). Then the number of instances \(n_i\) of the value \(\xi_i\), the distribution function \(F_y\), the empirical mean \(\overline{y}\), and the empirical standard deviation \(s_y\) are defined from the knowledge of the observed frequency \({\mathcal O}_y\) as follows: \(n_i={\mathcal O}_y(\xi_i)\); \[ F_y(\xi)=n^{-1} \sum\limits_{\xi_j\leq \xi}{\mathcal O}_y(\xi_j);\quad\overline{y}=n^{-1}\sum\limits_{j=1}^l{\mathcal O}_y(\xi_j)\xi_j;\quad s_y^2=n^{-1}\sum\limits_{j=1}^l{\mathcal O}_y(\xi_j)(\xi_j-\overline{y})^2. \] The authors first provide a framework that allows the extension of the definition of the observed frequency to a symbolic variable and then generalize the above-mentioned formulas in order to obtain generalized definitions of the empirical distribution function and the classical summary measures for multi-valued and interval-valued variables.
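As an illustration of the classical (single-valued) case only, the following minimal Python sketch computes \({\mathcal O}_y\), \(F_y\), \(\overline{y}\) and \(s_y\) from the observed frequencies, following the formulas above. The function names and the sample data are hypothetical and chosen for this example; the sketch does not reproduce the chapter's extension to symbolic variables.

```python
from collections import Counter
from math import sqrt


def observed_frequency(y):
    """O_y: maps each distinct value xi to the number of k with Y(k) == xi."""
    return Counter(y)


def distribution_function(freq, xi):
    """F_y(xi) = n^{-1} * sum of O_y(xi_j) over all xi_j <= xi."""
    n = sum(freq.values())
    return sum(count for value, count in freq.items() if value <= xi) / n


def empirical_mean(freq):
    """y_bar = n^{-1} * sum_j O_y(xi_j) * xi_j."""
    n = sum(freq.values())
    return sum(count * value for value, count in freq.items()) / n


def empirical_std(freq):
    """s_y, with s_y^2 = n^{-1} * sum_j O_y(xi_j) * (xi_j - y_bar)^2."""
    n = sum(freq.values())
    mean = empirical_mean(freq)
    variance = sum(count * (value - mean) ** 2 for value, count in freq.items()) / n
    return sqrt(variance)


# Hypothetical sample, not taken from the chapter.
y = [2, 4, 4, 4, 5, 5, 7, 9]
freq = observed_frequency(y)

n_4 = freq[4]                          # number of instances of the value 4
F_4 = distribution_function(freq, 4)   # share of observations <= 4
y_bar = empirical_mean(freq)
s_y = empirical_std(freq)
```

Note that \(s_y\) uses the divisor \(n\), as in the formula above, rather than \(n-1\).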
For the entire collection see [Zbl 1039.62501].


62A01 Foundations and philosophical topics in statistics
62G30 Order statistics; empirical distribution functions