An optimal variable cell histogram based on the sample spacings. (English) Zbl 0745.62034

Summary: Suppose we wish to construct a variable \(k\)-cell histogram based on an independent, identically distributed sample of size \(n-1\) from an unknown density \(f\) on an interval of finite length. A variable cell histogram requires the cutpoints and the heights of all of its cells to be specified. We propose the following procedure:
(i) from the order statistics of the sample, choose a set of \(k+1\) cutpoints that maximizes a criterion defined as a function of the sample spacings; (ii) compute the heights of the \(k\) cells from an explicit formula.
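The two steps can be sketched in Python. The summary does not state the paper's criterion or height formula, so this sketch substitutes a plug-in rule of our own (an assumption, not the authors' method). For fixed cutpoints, the Hellinger-optimal piecewise-constant density has height proportional to \((\int_{I_j}\sqrt f/L_j)^2\) on a cell \(I_j\) of length \(L_j\), and the best cutpoints maximize \(\sum_j (\int_{I_j}\sqrt f)^2/L_j\). Assuming the interval \([a,b]\) is known and its endpoints are appended to the \(n-1\) order statistics, the \(n\) spacings \(D_i\) give the local estimate \(f \approx 1/(nD_i)\), hence \(\int\sqrt f \approx \sqrt{D_i/n}\) over spacing \(i\). A cell then scores \((\sum_{i\in I_j}\sqrt{D_i})^2/L_j\) and gets height proportional to \((\sum_{i\in I_j}\sqrt{D_i}/L_j)^2\). Because the criterion is additive over cells, dynamic programming finds the best \(k+1\) cutpoints in \(O(kn^2)\) time. The function name `spacing_histogram` is hypothetical.

```python
import math
import random


def spacing_histogram(sample, a, b, k):
    """Variable k-cell histogram with cutpoints taken from the order statistics.

    Sketch under assumptions not taken from the paper: the support [a, b] is
    known and its endpoints serve as the outer cutpoints; the criterion and
    height formula are spacing plug-ins for the Hellinger-optimal histogram.
    Returns (edges, heights) with len(edges) == k + 1.
    """
    pts = [a] + sorted(sample) + [b]  # X_(0) = a, X_(1..n-1), X_(n) = b
    n = len(pts) - 1                  # number of spacings D_1..D_n
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of spacings")

    # root[m] = sqrt(D_1) + ... + sqrt(D_m), so any cell's sum is a difference
    root = [0.0] * (n + 1)
    for i in range(1, n + 1):
        root[i] = root[i - 1] + math.sqrt(pts[i] - pts[i - 1])

    def gain(lo, hi):
        # criterion term for the cell [pts[lo], pts[hi]]: (sum sqrt D)^2 / length
        length = pts[hi] - pts[lo]
        return (root[hi] - root[lo]) ** 2 / length if length > 0 else 0.0

    # best[j][m]: largest criterion when [pts[0], pts[m]] is split into j cells
    neg = float("-inf")
    best = [[neg] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for m in range(j, n + 1):
            for lo in range(j - 1, m):
                if best[j - 1][lo] == neg:
                    continue
                value = best[j - 1][lo] + gain(lo, m)
                if value > best[j][m]:
                    best[j][m], back[j][m] = value, lo

    # Walk back from the right endpoint to recover the chosen cutpoint indices.
    cuts = [n]
    for j in range(k, 0, -1):
        cuts.append(back[j][cuts[-1]])
    cuts.reverse()

    edges = [pts[i] for i in cuts]
    # Height of each cell proportional to ((sum sqrt D) / length)^2, then
    # normalized so that the histogram integrates to one.
    raw = []
    for lo, hi in zip(cuts, cuts[1:]):
        length = pts[hi] - pts[lo]
        raw.append(((root[hi] - root[lo]) / length) ** 2 if length > 0 else 0.0)
    mass = sum(h * (r - l) for h, l, r in zip(raw, edges, edges[1:]))
    return edges, [h / mass for h in raw]


if __name__ == "__main__":
    random.seed(0)
    xs = [random.betavariate(2, 5) for _ in range(199)]  # n - 1 = 199 points
    edges, heights = spacing_histogram(xs, 0.0, 1.0, k=6)
    for l, r, h in zip(edges, edges[1:], heights):
        print(f"[{l:.3f}, {r:.3f})  height {h:.3f}")
```

With \(k=n\) every spacing is its own cell and the height reduces to \(1/(nD_i)\), the classical spacing estimate. Merging two cells never increases this criterion (by the Cauchy-Schwarz inequality), so \(k\) has to be fixed in advance rather than chosen by the same maximization.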
The resulting histogram estimates the \(k\)-cell theoretical histogram, constant within each cell, that minimizes the Hellinger distance to the density \(f\). The estimator tends to estimate low-density regions accurately and is easy to compute. We show that the number of cells minimizing the mean Hellinger distance between the density \(f\) and a class of histograms whose cutpoints are chosen from the order statistics is of order \(n^{1/3}\).
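The Hellinger-optimal theoretical histogram can be written explicitly. The following is a standard Lagrange-multiplier calculation of our own, not a quotation from the paper; it uses the convention \(H^2(f,h)=\int(\sqrt f-\sqrt h)^2\), cells \(I_1,\dots,I_k\) with lengths \(L_j\), and a histogram \(h\) equal to \(h_j\) on \(I_j\) with \(\sum_j h_j L_j = 1\). Since both \(f\) and \(h\) integrate to one,

\[
H^2(f,h) = \int \bigl(\sqrt{f} - \sqrt{h}\bigr)^2 = 2 - 2\sum_{j=1}^{k} \sqrt{h_j}\, a_j, \qquad a_j = \int_{I_j} \sqrt{f(x)}\,dx .
\]

Maximizing \(\sum_j \sqrt{h_j}\,a_j\) subject to \(\sum_j h_j L_j = 1\), the stationarity condition \(a_j/(2\sqrt{h_j}) = \lambda L_j\) gives

\[
h_j = \frac{(a_j/L_j)^2}{\sum_{i=1}^{k} a_i^2/L_i}, \qquad \min_{h_1,\dots,h_k} H^2(f,h) = 2 - 2\Bigl(\sum_{j=1}^{k} \frac{a_j^2}{L_j}\Bigr)^{1/2}.
\]

The objective is concave in the heights and the constraint is linear, so this stationary point is the maximum; the optimal cutpoints are therefore those that maximize \(\sum_j a_j^2/L_j\).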


62G07 Density estimation
62G30 Order statistics; empirical distribution functions