## Descriptive statistics for symbolic data. (English) Zbl 0978.62005

Bock, Hans-Hermann (ed.) et al., Analysis of symbolic data. Exploratory methods for extracting statistical information from complex data. Berlin: Springer. Studies in Classification, Data Analysis, and Knowledge Organization. 106-124 (2000).
The main task of this chapter is to extend the concept of frequency distributions and the standard definitions of descriptive statistics for real-valued data, such as the empirical mean, empirical standard deviation and median, to the general framework of symbolic variables.
Let $$Y$$ be a single-valued quantitative (numerical) variable that takes a numerical value $$Y(k)$$ for each element $$k$$ of the finite set $$E=\{1,\dots,n\}$$. Denote by $$y=(Y(1),\dots,Y(n))$$ the finite sequence of values taken by $$Y$$ on the objects $$k\in E$$, and by $$\xi_1,\dots,\xi_l$$ the distinct values occurring in $$y$$. Define the observed frequency $${\mathcal O}_y$$ of $$y$$ as the function $${\mathcal O}_y:\mathbf R\to\mathbf N$$ given by $${\mathcal O}_y(\xi):=\#\{k\in E \mid Y(k)=\xi\}$$. Then the number of instances $$n_i$$ of the value $$\xi_i$$, the distribution function $$F_y$$, the empirical mean $$\overline{y}$$, and the empirical standard deviation $$s_y$$ are all determined by the observed frequency $${\mathcal O}_y$$ as follows: $$n_i={\mathcal O}_y(\xi_i)$$; $F_y(\xi)=n^{-1} \sum\limits_{\xi_j\leq \xi}{\mathcal O}_y(\xi_j);\quad\overline{y}=n^{-1}\sum\limits_{j=1}^l{\mathcal O}_y(\xi_j)\xi_j;\quad s_y^2=n^{-1}\sum\limits_{j=1}^l{\mathcal O}_y(\xi_j)(\xi_j-\overline{y})^2.$ The authors first provide a framework that allows the definition of the observed frequency to be extended to a symbolic variable, and then generalize the above-mentioned formulas to obtain definitions of the empirical distribution function and the classical summary measures for multi-valued and interval-valued variables.
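The classical (single-valued) formulas above can be illustrated with a minimal Python sketch. It covers only the real-valued case, not the symbolic generalization developed in the chapter. The function names and the sample `y` are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from math import sqrt


def observed_frequency(y):
    """O_y: map each value xi to #{k in E : Y(k) = xi}.

    A Counter returns 0 for values that never occur, which matches
    O_y being defined on all of R.
    """
    return Counter(y)


def distribution_function(freq, n, xi):
    """F_y(xi) = n^{-1} * sum of O_y(xi_j) over all distinct xi_j <= xi."""
    return sum(count for value, count in freq.items() if value <= xi) / n


def empirical_mean(freq, n):
    """y_bar = n^{-1} * sum_j O_y(xi_j) * xi_j."""
    return sum(count * value for value, count in freq.items()) / n


def empirical_std(freq, n):
    """s_y, with s_y^2 = n^{-1} * sum_j O_y(xi_j) * (xi_j - y_bar)^2."""
    mean = empirical_mean(freq, n)
    variance = sum(count * (value - mean) ** 2 for value, count in freq.items()) / n
    return sqrt(variance)


# Hypothetical sample: the values Y(1), ..., Y(n) with n = 8.
y = [1, 2, 2, 3, 3, 3, 4, 6]
n = len(y)
freq = observed_frequency(y)

print(freq[3])                            # n_i for xi_i = 3, here 3
print(distribution_function(freq, n, 3))  # 0.75
print(empirical_mean(freq, n))            # 3.0
print(empirical_std(freq, n))             # sqrt(2), about 1.414
```

Every summary measure here is computed from `freq` and `n` alone, without going back to the raw sequence `y`. That is the property that makes the observed frequency a natural starting point for the extension to symbolic variables.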
For the entire collection see [Zbl 1039.62501].

### MSC:

62A01 Foundations and philosophical topics in statistics

62G30 Order statistics; empirical distribution functions