
Boosting conditional probability estimators. (English) Zbl 1407.62118

Summary: In the standard agnostic multiclass model, \(\langle\)instance, label\(\rangle\) pairs are sampled independently from some underlying distribution. This distribution induces a conditional probability over the labels given an instance, and our goal in this paper is to learn this conditional distribution. Since even unconditional densities are quite challenging to learn, we give our learner access to \(\langle\)instance, conditional distribution\(\rangle\) pairs. Assuming a base learner oracle in this model, we might seek a boosting algorithm for constructing a strong learner. Unfortunately, without further assumptions, this is provably impossible. However, we give a new boosting algorithm that succeeds in the following sense: given a base learner guaranteed to achieve some average accuracy (i.e., risk), we efficiently construct a learner that achieves the same level of accuracy with arbitrarily high probability. We give generalization guarantees of several different kinds, including distribution-free accuracy and risk bounds. None of our estimates depend on the number of boosting rounds and some of them admit dimension-free formulations.
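The paper's algorithm itself is not reproduced in this record; the following is a minimal Python sketch of the generic "boost the confidence" idea the summary alludes to: a base learner that only guarantees a given risk *on average* is run on several disjoint subsamples, and a held-out validation split selects the candidate whose risk is actually small, so the guarantee holds with high probability. The names `base_learner`, `risk`, and `confidence_boost`, and the use of average \(L_1\) distance between distributions as the risk, are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def base_learner(X, P):
    """Hypothetical stand-in for the paper's base-learner oracle: it
    averages the observed conditional distributions into one constant
    predictor (deliberately crude; any oracle with an average-risk
    guarantee would fit here)."""
    mean = P.mean(axis=0)
    return lambda x: mean

def risk(h, X, P):
    """Average L1 distance between predicted and true conditional
    distributions -- one common way to measure estimation risk."""
    preds = np.array([h(x) for x in X])
    return np.abs(preds - P).sum(axis=1).mean()

def confidence_boost(X, P, k=5):
    """Train the base learner on k disjoint chunks of a training half,
    then keep the candidate with the smallest risk on a held-out
    validation half. Each chunk independently has a constant chance of
    yielding a low-risk hypothesis, so the selected one is good with
    probability approaching 1 as k grows."""
    n = len(X)
    X_tr, P_tr = X[: n // 2], P[: n // 2]
    X_val, P_val = X[n // 2:], P[n // 2:]
    chunks = np.array_split(np.arange(len(X_tr)), k)
    candidates = [base_learner(X_tr[idx], P_tr[idx]) for idx in chunks]
    return min(candidates, key=lambda h: risk(h, X_val, P_val))
```

Note that, as in the paper's guarantees, nothing here improves the *level* of risk beyond what the base learner delivers; the repetition-plus-validation step only converts an in-expectation guarantee into a with-high-probability one.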

MSC:

62G07 Density estimation
62H30 Classification and discrimination; cluster analysis (statistical aspects)
65C50 Other computational problems in probability (MSC2010)

Software:

AdaBoost.MH
