
Bump hunting with non-Gaussian kernels. (English) Zbl 1056.62049

Summary: It is well known that the number of modes of a kernel density estimator is monotone nonincreasing in the bandwidth if the kernel is a Gaussian density. There is numerical evidence of nonmonotonicity for some non-Gaussian kernels, but little additional information is available. The present paper provides theoretical and numerical descriptions of the extent to which the number of modes is a nonmonotone function of bandwidth when the kernel is a general compactly supported density.
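
The contrast between the Gaussian kernel and a compactly supported one can be checked numerically with a small sketch that evaluates the estimate on a fine grid and counts its local maxima at several bandwidths. The grid-based mode count, the function names and the two-point sample are illustrative assumptions, not the paper's procedure. For observations at 0 and 1 the Gaussian count falls monotonically from 2 to 1, whereas the Epanechnikov count goes 2, 3, 1: when the bandwidth lies strictly between half the spacing and the full spacing, the two kernel supports overlap without either containing the other observation, and the concave overlap adds a third mode at the midpoint.

```python
import numpy as np


def gaussian(u):
    """Standard normal density, the classical kernel with unbounded support."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def epanechnikov(u):
    """Epanechnikov kernel 0.75 * (1 - u^2) on [-1, 1], zero outside."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kde(grid, data, h, kernel):
    """Estimate (1 / (n h)) * sum_i K((x - X_i) / h) at every grid point x."""
    u = (grid[:, None] - data[None, :]) / h
    return kernel(u).sum(axis=1) / (len(data) * h)


def count_modes(values):
    """Count local maxima of a sampled curve as + to - changes in slope sign.

    Zero slopes (e.g. stretches where a compactly supported estimate vanishes)
    are dropped first, so separated bumps are still counted individually."""
    slope = np.sign(np.diff(values))
    slope = slope[slope != 0]
    return int(np.sum((slope[:-1] > 0) & (slope[1:] < 0)))


def mode_counts(data, bandwidths, kernel, grid):
    """Number of modes of the estimate at each bandwidth in `bandwidths`."""
    return [count_modes(kde(grid, data, h, kernel)) for h in bandwidths]


data = np.array([0.0, 1.0])           # two observations, spacing 1
grid = np.linspace(-3.0, 4.0, 7001)   # step 0.001, covers every support used
bandwidths = [0.3, 0.7, 1.5]
print(mode_counts(data, bandwidths, gaussian, grid))      # [2, 1, 1]
print(mode_counts(data, bandwidths, epanechnikov, grid))  # [2, 3, 1]
```

Counting maxima on a grid misses modes that sit closer together than the grid step, so such counts are only trustworthy when the bandwidth is large compared with that step.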
Our results address kernels popular in practice, for example the Epanechnikov, biweight and triweight kernels, and show that in such cases nonmonotonicity is present with strictly positive probability for all sample sizes \(n\geq 3\). In the Epanechnikov and biweight cases the probability of nonmonotonicity equals 1 for all \(n\geq 2\). Despite the prevalence of nonmonotonicity revealed by these results, the notion of a critical bandwidth (the smallest bandwidth above which the number of modes is guaranteed to be monotone) is shown to be still well defined. Moreover, just as in the Gaussian case, the critical bandwidth is of the same order of magnitude as the bandwidth that minimises the mean squared error of the density estimator.
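
The parenthetical definition of the critical bandwidth can be mirrored on a finite grid of bandwidths. The helper below is hypothetical: it assumes the mode counts have already been computed at strictly increasing bandwidths and returns the smallest grid bandwidth from which the counts never increase again. It only approximates the quantity studied in the paper, since the true critical bandwidth may lie between grid points.

```python
from typing import Sequence


def critical_bandwidth(bandwidths: Sequence[float], counts: Sequence[int]) -> float:
    """Smallest grid bandwidth h_j such that counts[j:] is nonincreasing.

    `counts[k]` is assumed to be the number of modes of the density estimate
    at `bandwidths[k]`; both inputs are hypothetical, precomputed values."""
    if len(bandwidths) == 0 or len(bandwidths) != len(counts):
        raise ValueError("need exactly one mode count per bandwidth")
    if any(a >= b for a, b in zip(bandwidths, bandwidths[1:])):
        raise ValueError("bandwidths must be strictly increasing")
    j = len(counts) - 1
    # Extend the nonincreasing tail leftwards for as long as it stays monotone.
    while j > 0 and counts[j - 1] >= counts[j]:
        j -= 1
    return bandwidths[j]


# Mode counts for a two-point sample at 0 and 1 (Epanechnikov, then Gaussian).
print(critical_bandwidth([0.3, 0.7, 1.5], [2, 3, 1]))  # 0.7
print(critical_bandwidth([0.3, 0.7, 1.5], [2, 1, 1]))  # 0.3
```

Scanning from the largest bandwidth downwards makes the "above which" part of the definition explicit: only the tail of the sequence matters, and a single increase to the left of it fixes the answer.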
These theoretical results, together with new numerical evidence, show that the main effects of nonmonotonicity occur at relatively small bandwidths and have negligible impact on many aspects of bump hunting.

MSC:

62G07 Density estimation
62G20 Asymptotic properties of nonparametric inference

Software:

SiZer

References:

[1] Chaudhuri, P. and Marron, J. S. (1999). SiZer for exploration of structures in curves. J. Amer. Statist. Assoc. 94 807–823. · Zbl 1072.62556 · doi:10.2307/2669996
[2] Chaudhuri, P. and Marron, J. S. (2000). Scale space view of curve estimation. Ann. Statist. 28 408–428. · Zbl 1106.62318 · doi:10.1214/aos/1016218224
[3] Cheng, M.-Y. and Hall, P. (1999). Mode testing in difficult cases. Ann. Statist. 27 1294–1315. · Zbl 0957.62028 · doi:10.1214/aos/1017938927
[4] Cuevas, A. and González-Manteiga, W. (1991). Data-driven smoothing based on convexity properties. In Nonparametric Functional Estimation and Related Topics (G. Roussas, ed.) 225–240. Kluwer, Dordrecht. · Zbl 0806.62024
[5] Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc. 90 577–588. · Zbl 0826.62021 · doi:10.2307/2291069
[6] Fisher, N. I., Mammen, E. and Marron, J. S. (1994). Testing for multimodality. Comput. Statist. Data Anal. 18 499–512. · Zbl 0900.62227 · doi:10.1016/0167-9473(94)90080-9
[7] Fisher, N. I. and Marron, J. S. (2001). Mode testing via the excess mass estimate. Biometrika 88 499–517. · Zbl 0985.62034 · doi:10.1093/biomet/88.2.499
[8] Good, I. J. and Gaskins, R. A. (1980). Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data (with discussion). J. Amer. Statist. Assoc. 75 42–73. · Zbl 0432.62024 · doi:10.2307/2287377
[9] Hall, P. and York, M. (2001). On the calibration of Silverman’s test for multimodality. Statist. Sinica 11 515–536. · Zbl 1026.62047
[10] Hartigan, J. A. and Hartigan, P. M. (1985). The DIP test of unimodality. Ann. Statist. 13 70–84. · Zbl 0575.62045 · doi:10.1214/aos/1176346577
[11] Izenman, A. J. and Sommer, C. (1988). Philatelic mixtures and multimodal densities. J. Amer. Statist. Assoc. 83 941–953.
[12] Komlós, J., Major, P. and Tusnády, G. (1976). An approximation of partial sums of independent rv’s, and the sample df. II. Z. Wahrsch. Verw. Gebiete 34 33–58. · Zbl 0307.60045 · doi:10.1007/BF00532688
[13] Mammen, E., Marron, J. S. and Fisher, N. I. (1992). Some asymptotics for multimodality tests based on kernel density estimates. Probab. Theory Related Fields 91 115–132. · Zbl 0745.62048 · doi:10.1007/BF01194493
[14] Minnotte, M. C. (1997). Nonparametric testing of the existence of modes. Ann. Statist. 25 1646–1660. · Zbl 0936.62056 · doi:10.1214/aos/1031594735
[15] Minnotte, M. C. and Scott, D. W. (1993). The mode tree: A tool for visualization of nonparametric density estimates. J. Comput. Graph. Statist. 2 51–68.
[16] Müller, D. W. and Sawitzki, G. (1991). Excess mass estimates and tests for multimodality. J. Amer. Statist. Assoc. 86 738–746. · Zbl 0733.62040 · doi:10.2307/2290406
[17] Polonik, W. (1995a). Measuring mass concentrations and estimating density contour clusters—an excess mass approach. Ann. Statist. 23 855–881. · Zbl 0841.62045 · doi:10.1214/aos/1176324626
[18] Polonik, W. (1995b). Density estimation under qualitative assumptions in higher dimensions. J. Multivariate Anal. 55 61–81. · Zbl 0847.62027 · doi:10.1006/jmva.1995.1067
[19] Roeder, K. (1990). Density estimation with confidence sets exemplified by superclusters and voids in the galaxies. J. Amer. Statist. Assoc. 85 617–624. · Zbl 0704.62103 · doi:10.2307/2289993
[20] Roeder, K. (1994). A graphical technique for determining the number of components in a mixture of normals. J. Amer. Statist. Assoc. 89 487–495. · Zbl 0798.62004 · doi:10.2307/2290850
[21] Schoenberg, I. J. (1950). On Pólya frequency functions. II. Variation-diminishing integral operators of the convolution type. Acta Sci. Math. (Szeged) 12 97–106. · Zbl 0035.35201
[22] Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. Ser. B 53 683–690. · Zbl 0800.62219
[23] Silverman, B. W. (1981). Using kernel density estimates to investigate multimodality. J. Roy. Statist. Soc. Ser. B 43 97–99.
[24] Silverman, B. W. (1983). Some properties of a test for multimodality based on kernel density estimates. In Probability, Statistics and Analysis (J. F. C. Kingman and G. E. H. Reuter, eds.) 248–259. Cambridge Univ. Press. · Zbl 0504.62036