Smooth discrimination analysis. (English) Zbl 0961.62058

Ann. Stat. 27, No. 6, 1808-1829 (1999); correction ibid. 32, No. 5, 2340-2341 (2004).
Summary: Discriminant analysis for two data sets in \(\mathbb{R}^d\) with probability densities \(f\) and \(g\) can be based on the estimation of the set \(G=\{x:f(x)\geq g(x)\}\). We consider applications where it is appropriate to assume that the region \(G\) has a smooth boundary or belongs to another nonparametric class of sets. In particular, this assumption makes sense if discrimination is used as a data analytic tool.
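A short derivation may help show why estimating \(G\) is the right target. It assumes equal prior weights for the two populations, which the summary does not state, and writes \(C\) for an arbitrary candidate set and \(R\) for the misclassification risk; both symbols are introduced here for illustration only. A rule that assigns \(x\) to the population with density \(f\) exactly when \(x\in C\) has risk
\[
R(C)=\tfrac12\int_{C^{c}} f(x)\,dx+\tfrac12\int_{C} g(x)\,dx ,
\]
and comparing it with the rule based on \(G\) gives
\[
R(C)-R(G)=\tfrac12\int_{C\,\triangle\,G}\lvert f(x)-g(x)\rvert\,dx\;\ge 0 ,
\]
where \(C\,\triangle\,G\) is the symmetric difference. Under this assumption \(G\) minimizes the risk, and the excess risk of any other set is governed by how much \(\lvert f-g\rvert\) mass lies where the two sets disagree.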
Decision rules based on minimization of empirical risk over the whole class of sets and over sieves are considered. Their rates of convergence are obtained. We show that these rules achieve optimal rates for estimation of \(G\) and optimal rates of convergence for Bayes risks. An interesting conclusion is that the optimal rates for Bayes risks can be very fast, in particular, faster than the “parametric” root-\(n\) rate. These fast rates cannot be guaranteed for plug-in rules.
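The idea of minimizing empirical risk over a class of sets can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the class of candidate sets is the family of half-lines \(\{x\ge t\}\) in one dimension (a deliberately simple stand-in for the nonparametric classes the summary mentions), the two samples are drawn from hypothetical Gaussian densities, equal weights are given to both samples, and the names `empirical_risk` and `erm_halfline` are made up here.

```python
import numpy as np


def empirical_risk(in_set, x_f, x_g):
    """Equal-weight empirical misclassification risk of a candidate set.

    Points from the f-sample are misclassified when they fall outside the set;
    points from the g-sample are misclassified when they fall inside it.
    """
    return 0.5 * np.mean(~in_set(x_f)) + 0.5 * np.mean(in_set(x_g))


def erm_halfline(x_f, x_g):
    """Minimize the empirical risk over half-lines {x >= t} (illustrative class).

    The empirical risk only changes at observed points, so it is enough to
    try each pooled observation as the threshold t.
    """
    candidates = np.sort(np.concatenate([x_f, x_g]))
    risks = np.array(
        [empirical_risk(lambda x, t=t: x >= t, x_f, x_g) for t in candidates]
    )
    best = int(np.argmin(risks))
    return float(candidates[best]), float(risks[best])


# Hypothetical data: f = N(1, 1) and g = N(-1, 1), so that G = {x >= 0}.
rng = np.random.default_rng(0)
x_f = rng.normal(loc=1.0, scale=1.0, size=1000)
x_g = rng.normal(loc=-1.0, scale=1.0, size=1000)

t_hat, risk_hat = erm_halfline(x_f, x_g)
print(f"estimated set: x >= {t_hat:.3f}, empirical risk {risk_hat:.3f}")
```

In this toy setting the estimated threshold should land near 0, the boundary of \(G\). A sieve version would restrict `candidates` to a coarser grid whose resolution grows with the sample size; the sketch does not attempt to reproduce the rates discussed in the summary.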

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G05 Nonparametric estimation
62G20 Asymptotic properties of nonparametric inference
62C10 Bayesian problems; characterization of Bayes procedures
