Some theory for Fisher’s linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. (English) Zbl 1064.62073

Summary: We show that the ‘naive Bayes’ classifier, which assumes independent covariates, greatly outperforms the Fisher linear discriminant rule under broad conditions when the number of variables grows faster than the number of observations, in the classical problem of discriminating between two normal populations. We also introduce a class of rules spanning the range between independence and arbitrary dependence. These rules are shown to achieve Bayes consistency for the Gaussian ‘coloured noise’ model and to adapt to a spectrum of convergence rates, which we conjecture to be minimax.
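The contrast between the two rules can be illustrated with a small simulation. The following is a minimal sketch, not the paper's own experiment: it assumes two Gaussian populations with identity covariance and a constant mean shift in every coordinate, and it takes the Fisher rule to use the Moore-Penrose inverse of the pooled sample covariance, which is singular when there are more variables than observations. All function names and parameter values (`fit_rules`, `error_rate`, `simulate`, `p=400`, `n_train=25`, `shift=0.3`) are hypothetical choices made for illustration.

```python
import numpy as np

def fit_rules(X0, X1, tol=1e-10):
    """Return (midpoint, w_independence, w_fisher) from two training samples."""
    n0, n1 = len(X0), len(X1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    d = m1 - m0                                  # estimated mean difference
    R = np.vstack([X0 - m0, X1 - m1])            # within-class residuals
    dof = n0 + n1 - 2
    # Independence ('naive Bayes') rule: use only the diagonal of the
    # pooled covariance S = R^T R / dof.
    w_ind = d / ((R ** 2).sum(axis=0) / dof)
    # Fisher rule with the pseudo-inverse of S, computed from the SVD of R
    # so that numerically zero singular values are discarded explicitly.
    _, s, Vt = np.linalg.svd(R, full_matrices=False)
    keep = s > tol * s.max()
    V = Vt[keep]
    w_fisher = V.T @ ((V @ d) * dof / s[keep] ** 2)
    return (m0 + m1) / 2, w_ind, w_fisher

def error_rate(mid, w, X0, X1):
    """Misclassification rate of the rule 'class 1 if w . (x - mid) > 0'."""
    wrong0 = np.sum((X0 - mid) @ w > 0)
    wrong1 = np.sum((X1 - mid) @ w <= 0)
    return (wrong0 + wrong1) / (len(X0) + len(X1))

def simulate(p=400, n_train=25, n_test=1000, shift=0.3, seed=0):
    """Return (independence-rule error, Fisher-rule error) on fresh test data."""
    rng = np.random.default_rng(seed)
    mu1 = np.full(p, shift)                      # assumed mean of class 1; class 0 has mean 0
    X0 = rng.standard_normal((n_train, p))
    X1 = rng.standard_normal((n_train, p)) + mu1
    T0 = rng.standard_normal((n_test, p))
    T1 = rng.standard_normal((n_test, p)) + mu1
    mid, w_ind, w_fisher = fit_rules(X0, X1)
    return error_rate(mid, w_ind, T0, T1), error_rate(mid, w_fisher, T0, T1)

nb_err, fisher_err = simulate()
print(f"independence rule error: {nb_err:.3f}")
print(f"Fisher (pseudo-inverse) error: {fisher_err:.3f}")
```

With 400 variables and 50 training observations, the pooled covariance has rank at most 48, so the pseudo-inverse Fisher rule only sees the part of the mean difference that falls in a low-dimensional subspace. In a setting like this one the independence rule would typically be expected to show the smaller test error, consistent with the summary's claim, though a single simulated configuration is only an illustration.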


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F15 Bayesian inference
62M15 Inference from stochastic processes and spectral analysis

