×

Optimal weighted nearest neighbour classifiers. (English) Zbl 1373.62317

Summary: We derive an asymptotic expansion for the excess risk (regret) of a weighted nearest-neighbour classifier. This allows us to find the asymptotically optimal vector of nonnegative weights, which has a rather simple form. We show that the ratio of the regret of this classifier to that of an unweighted \(k\)-nearest neighbour classifier depends asymptotically only on the dimension \(d\) of the feature vectors, and not on the underlying populations. The improvement is greatest when \(d=4\), but thereafter decreases as \(d\to\infty\). The popular bagged nearest neighbour classifier can also be regarded as a weighted nearest neighbour classifier, and we show that its corresponding weights are somewhat suboptimal when \(d\) is small (in particular, worse than those of the unweighted \(k\)-nearest neighbour classifier when \(d=1\)), but are close to optimal when \(d\) is large. Finally, we argue that improvements in the rate of convergence are possible under stronger smoothness assumptions, provided we allow negative weights. Our findings are supported by an empirical performance comparison on both simulated and real data sets.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G20 Asymptotic properties of nonparametric inference

Software:

UCI-ml
PDF BibTeX XML Cite
Full Text: DOI arXiv Euclid

References:

[1] Athitsos, V. and Sclaroff, S. (2005). Boosting nearest neighbour classifiers for multiclass recognition. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 45-55. IEEE Computer Society, Washington, DC.
[2] Audibert, J.-Y. and Tsybakov, A. B. (2007). Fast learning rates for plug-in classifiers. Ann. Statist. 35 608-633. · Zbl 1118.62041
[3] Bailey, T. and Jain, A. (1978). A note on distance-weighted \(k\)-nearest neighbour rules. Transactions on Systems , Man , and Cybernetics 8 311-313. · Zbl 0383.68067
[4] Biau, G., Cérou, F. and Guyader, A. (2010). On the rate of convergence of the bagged nearest neighbour estimate. J. Mach. Learn. Res. 11 687-712. · Zbl 1242.62025
[5] Biau, G. and Devroye, L. (2010). On the layered nearest neighbour estimate, the bagged nearest neighbour estimate and the random forest method in regression and classification. J. Multivariate Anal. 101 2499-2518. · Zbl 1198.62048
[6] Blanchard, G., Bousquet, O. and Massart, P. (2008). Statistical performance of support vector machines. Ann. Statist. 36 489-531. · Zbl 1133.62044
[7] Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: A survey of some recent advances. ESAIM Probab. Stat. 9 323-375. · Zbl 1136.62355
[8] Breiman, L. (1996). Heuristics of instability and stabilization in model selection. Ann. Statist. 24 2350-2383. · Zbl 0867.62055
[9] Breiman, L. (1999). Using adaptive bagging to debias regressions. Technical report, Dept. Statistics, Univ. California, Berkeley. · Zbl 1052.68109
[10] Breiman, L. (2001). Random forests. Mach. Learn. 45 5-32. · Zbl 1007.68152
[11] Chanda, K. C. and Ruymgaart, F. H. (1989). Asymptotic estimate of probability of misclassification for discriminant rules based on density estimates. Statist. Probab. Lett. 8 81-88. · Zbl 0668.62040
[12] Cortes, C. and Vapnik, V. (1995). Support-vector networks. Mach. Learn. 20 273-297. · Zbl 0831.68098
[13] Cover, T. M. and Hart, P. E. (1967). Nearest neighbour pattern classification. IEEE Trans. Inform. Theory 13 21-27. · Zbl 0154.44505
[14] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Applications of Mathematics ( New York ) 31 . Springer, New York. · Zbl 0853.68150
[15] Fix, E. and Hodges, J. L. (1951). Discriminatory analysis-Nonparametric discrimination: Consistency properties. Technical Report 4, Project no. 21-29-004, USAF School of Aviation Medicine, Randolph Field, Texas. · Zbl 0715.62080
[16] Fix, E. and Hodges, J. L. (1989). Discriminatory analysis-Nonparametric discrimination: Consistency properties. Internat. Statist. Rev. 57 238-247. · Zbl 0715.62080
[17] Frank, A. and Asuncion, A. (2010). UCI machine learning repository. School of Information and Computer Sciences, Univ. California, Irvine, CA. Available at .
[18] Gordon, A. D. (1999). Classification , 2nd ed. Chapman & Hall, London. · Zbl 0929.62068
[19] Gray, A. (2004). Tubes , 2nd ed. Progress in Mathematics 221 . Birkhäuser, Basel. · Zbl 0692.53001
[20] Guillemin, V. and Pollack, A. (1974). Differential Topology . Prentice-Hall, Englewood Cliffs, NJ. · Zbl 0361.57001
[21] Hall, P. and Kang, K.-H. (2005). Bandwidth choice for nonparametric classification. Ann. Statist. 33 284-306. · Zbl 1064.62075
[22] Hall, P., Park, B. U. and Samworth, R. J. (2008). Choice of neighbour order in nearest-neighbour classification. Ann. Statist. 36 2135-2152. · Zbl 1274.62421
[23] Hall, P. and Samworth, R. J. (2005). Properties of bagged nearest neighbour classifiers. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 363-379. · Zbl 1069.62051
[24] Hand, D. J. (1981). Discrimination and Classification . Wiley, Chichester. · Zbl 0587.62119
[25] Ibragimov, I. A. and Khasminskiĭ, R. Z. (1980). Nonparametric regression estimation. Dokl. Akad. Nauk SSSR 252 780-784. · Zbl 0507.62064
[26] Ibragimov, I. A. and Khasminskiĭ, R. Z. (1981). Statistical Estimation. Applications of Mathematics 16 . Springer, New York.
[27] Ibragimov, I. A. and Khasminskiĭ, R. Z. (1982). On the bounds for quality of nonparametric regression function estimation. Theory Probab. Appl. 27 81-94. · Zbl 0494.62042
[28] Lepskiĭ, O. V. (1991). Asymptotically minimax adaptive estimation. I. Upper bounds. Optimally adaptive estimates. Theory Probab. Appl. 36 682-697. · Zbl 0776.62039
[29] Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808-1829. · Zbl 0961.62058
[30] Martínez-Muñoz, G. and Suárez, A. (2010). Out-of-bag estimation of the optimal sample size in bagging. Pattern Recognition 43 143-152. · Zbl 0431.94001
[31] Moore, J. D. (1992). Book review: Tubes. Bull. Amer. Math. Soc. ( N.S. ) 27 311-313.
[32] Polonik, W. (1995). Measuring mass concentrations and estimating density contour clusters-An excess mass approach. Ann. Statist. 23 855-881. · Zbl 0841.62045
[33] Raudys, Š. and Young, D. M. (2004). Results in statistical discriminant analysis: A review of the former Soviet Union literature. J. Multivariate Anal. 89 1-35. · Zbl 1036.62053
[34] Rigollet, P. and Vert, R. (2009). Optimal rates for plug-in estimators of density level sets. Bernoulli 15 1154-1178. · Zbl 1200.62034
[35] Royall, R. (1966). A class of nonparametric estimators of a smooth regression function. Ph.D. thesis, Stanford Univ., Stanford, CA.
[36] Samworth, R. J. (2012). Supplement to “Optimal weighted nearest neighbour classifiers.” . · Zbl 1373.62317
[37] Samworth, R. J. and Wand, M. P. (2010). Asymptotics and optimal bandwidth selection for highest density region estimation. Ann. Statist. 38 1767-1792. · Zbl 1189.62061
[38] Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics . Wiley, New York. · Zbl 1170.62365
[39] Steele, B. M. (2009). Exact bootstrap \(k\)-nearest neighbour learners. Mach. Learn. 74 235-255.
[40] Steinwart, I. and Christmann, A. (2008). Support Vector Machines . Springer, New York. · Zbl 1203.68171
[41] Stone, C. J. (1977). Consistent nonparametric regression. Ann. Statist. 5 595-645. · Zbl 0366.62051
[42] Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135-166. · Zbl 1105.62353
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.