##
**Arcing classifiers. (With discussion).**
*(English)*
Zbl 0934.62064

Summary: Recent work has shown that combining multiple versions of unstable classifiers such as trees or neural nets results in reduced test set error. One of the more effective is bagging. Here, modified training sets are formed by resampling from the original training set, classifiers constructed using these training sets and then combined by voting. Y. Freund and R. Schapire [in L. Saitta (ed.), Machine Learning: Proc. Thirteenth Int. Conf. 148-156 (1996); see also Ann. Stat. 26, No. 5, 1651-1686 (1998; Zbl 0929.62069)] propose an algorithm the basis of which is to adaptively resample and combine (hence the acronym “arcing”) so that the weights in the resampling are increased for those cases most often misclassified and the combining is done by weighted voting.

Arcing is more successful than bagging in test set error reduction. We explore two arcing algorithms, compare them to each other and to bagging, and try to understand how arcing works. We introduce the definitions of bias and variance for a classifier as components of the test set error. Unstable classifiers can have low bias on a large range of data sets. Their problem is high variance. Combining multiple versions either through bagging or arcing reduces variance significantly.

Arcing is more successful than bagging in test set error reduction. We explore two arcing algorithms, compare them to each other and to bagging, and try to understand how arcing works. We introduce the definitions of bias and variance for a classifier as components of the test set error. Unstable classifiers can have low bias on a large range of data sets. Their problem is high variance. Combining multiple versions either through bagging or arcing reduces variance significantly.

### MSC:

62H30 | Classification and discrimination; cluster analysis (statistical aspects) |

### Keywords:

ensemble methods; decision trees; neural networks; boosting; error-correcting; output coding; Markov chain Monte Carlo; bagging### Citations:

Zbl 0929.62069
Full Text:
DOI

### References:

[1] | ALI, K. 1995. Learning probabilistic relational concept descriptions. Ph.D. dissertation, Dept. Computer Science, Univ. California, Irvine. Z. |

[2] | BREIMAN, L. 1996a. Bagging predictors. Machine Learning 26 123 140. Z. · Zbl 0858.68080 |

[3] | BREIMAN, L. 1996b. The heuristics of instability in model selection. Ann. Statist. 24 2350 2383. Z. · Zbl 0867.62055 |

[4] | BREIMAN, L., FRIEDMAN, J., OLSHEN, R. and STONE, C. 1984. Classification and Regression Trees. Chapman and Hall, London. Z. · Zbl 0541.62042 |

[5] | DRUCKER, H. and CORTES, C. 1996. Boosting decision trees. Advances in Neural Information Processing Sy stems 8 479 485. Z. |

[6] | FREUND, Y. and SCHAPIRE, R. 1996. Experiments with a new boosting algorithm. In Machine Z. Learning: Proceedings of the Thirteenth International Conference L. Saitta, ed. 148 156. Morgan Kaufmann, San Francisco. Z. |

[7] | FREUND, Y. and SCHAPIRE, R. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Sy stem Sci. 55 119 139. · Zbl 0880.68103 |

[8] | FRIEDMAN, J. H. 1996. On bias, variance, 0 1-loss, and the curse of dimensionality. Journal of Knowledge Discovery and Data Mining. To appear. Z. |

[9] | GEMAN, S., BIENENSTOCK, E. and DOURSAT, R. 1992. Neural networks and the bias variance dilemma. Neural Computations 4 1 58. Z. |

[10] | HASTIE, T. and TIBSHIRANI, R. 1994. Handwritten digit recognition via deformable prototy pes. Unpublished manuscript. Available at ftp stat.stanford.edu pub hastie/ zip.ps.Z. Z. |

[11] | KEARNS, M. and VALIANT, L. G. 1988. Learning Boolean formulae or finite automata is as hard as factoring. Technical Report TR-14-88, Aiken Computation Laboratory, Harvard Univ. Z. |

[12] | KEARNS, M. and VALIANT, L. G. 1989. Cry ptographic limitations on learning Boolean formulae and finite automata. In Proceedings of the Twenty-First Annual ACM Sy mposium on Theory of Computing. 433 444. ACM Press, New York. Z. |

[13] | KOHAVI, R. and WOLPERT, D. H. 1996. Bias plus variance decomposition for zero-one loss functions. In Machine Learning: Proceedings of the Thirteenth International ConferZ. ence. L. Saitta, ed. 275 283. Morgan Kaufmann, San Francisco. Z. |

[14] | KONG, E. B. and DIETTERICH, T. G. 1995. Error-correcting output coding corrects bias and variance. In Proceedings of the Twelfth International Conference on Machine Learning Z. A. Prieditis and S. Russell, eds. 313 321. Morgan Kaufmann, San Francisco. |

[15] | LE CUN, Y., BOSER, B., DENKER, J., HENDERSON, D., HOWARD, R., HUBBARD, W. and JACKEL, L. Z. 1990. Handwritten digit recognition with a back-propagation network. Advances in Neural Information Processing Sy stems 2 396 404. Z. |

[16] | MICHIE, D., SPIEGELHALTER, D. and TAy LOR, C. 1994. Machine Learning, Neural and Statistical Classification. Ellis Horwood, London. Z. · Zbl 0827.68094 |

[17] | QUINLAN, J. R. 1996. Bagging, Boosting, and C4.5. In Proceedings of AAAI ’96 National Conference on Artificial Intelligence 725 730. Z. |

[18] | SCHAPIRE, R. 1990. The strength of weak learnability. Machine Learning 5 197 227. Z. |

[19] | SIMARD, P., LE CUN, Y. and DENKER, J. 1993. Efficient pattern recognition using a new transformation distance. Advances in Neural Information Processing Sy stems 5 50 58. Z. |

[20] | TIBSHIRANI, R. 1996. Bias, variance, and prediction error for classification rules. Technical Report, Dept. Statistics, Univ. Toronto. Z. |

[21] | VAPNIK, V. 1995. The Nature of Statistical Learning Theory. Springer, New York. · Zbl 0833.62008 |

[22] | BERKELEY, CALIFORNIA 94720-3860 E-MAIL: leo@stat.berkeley.edu |

[23] | BREIMAN, L. 1996. Bagging predictors. Machine Learning 26 123 140. · Zbl 0858.68080 |

[24] | BREIMAN, L. 1996. The heuristics of instability in model selection. Ann. Statist. 24 2350 2383. · Zbl 0867.62055 |

[25] | DRUCKER, H. and CORTES, C. 1996. Boosting decision trees. Advances in Neural Information Processing Sy stems 8 479 485. |

[26] | FLOy D, S. and WARMUTH, M. 1995. Sample compression, learnability, and the Vapnik Chervonenkis dimension. Machine Learning 21 269 304. |

[27] | FREUND, Y. 1995. Boosting a weak learning algorithm by majority. Inform. and Comput. 121 256 285. · Zbl 0833.68109 |

[28] | FREUND, Y. and SCHAPIRE, R. E. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Sy stem Sci. 55 119 139. · Zbl 0880.68103 |

[29] | KEARNS, M. and VALIANT, L. G. 1994. Cry ptographic limitations on learning Boolean formulae and finite automata. J. Assoc. Comput. Mach. 4 67 95. · Zbl 0807.68073 |

[30] | KOHAVI, R. and WOLPERT, D. H. 1996. Bias plus variance decomposition for zero-one loss functions. In Machine Learning: Proceedings of the Thirteenth International Z. ConferenceL. Saitta, ed. 275 283. Morgan Kaufmann, San Francisco. |

[31] | KONG, E. B. and DIETTERICH, T. G. 1995. Error-correcting output coding corrects bias and variance. In Proceedings of the Twelfth International Conference on Machine Learning Z. A. Prieditis and S. Russell, eds. 313 321. Morgan Kaufmann, San Francisco. |

[32] | QUINLAN, J. R. 1996. Bagging, boosting, and C4.5. In Proceedings of the Thirteenth National Conference on Artificial Intelligence 725 730. |

[33] | QUINLAN, J. R. 1993. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco. · Zbl 0900.68112 |

[34] | SCHAPIRE, R. E. 1990. The strength of weak learnability. Machine Learning 5 197 227. |

[35] | SCHAPIRE, R. E., FREUND, Y., BARTLETT, P. and LEE, W. S. 1998. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26. · Zbl 0929.62069 |

[36] | TIBSHIRANI, R. 1996. Bias, variance and prediction error for classification rules. Technical Report, Univ. Toronto. |

[37] | VALIANT, L. G. 1994. A theory of the learnable. Communications of the ACM 27 1134 1142. · Zbl 0587.68077 |

[38] | VAPNIK, V. N. 1982. Estimation of Dependences Based on Empirical Data. Springer, New York. · Zbl 0499.62005 |

[39] | FLORHAM PARK, NEW JERSEY 07932-0971 E-MAIL: yoav@research.att.com schapire@research.att.com Q I, where X is one component of a fixed-length feature vector and m Ä X c4 i i c is a constant. One could of course use multivariate functions or “trans-gen erated features” 7. Suppose we examine some of the queries by constructing a single binary tree TT by the usual data-driven induction method: stepwise entropy reduction estimated from a training set LL. Since we cannot entertain all possible splits at each node, we exploit a natural partial ordering on the set Q and examine only a tiny fraction of them. Basically we incrementally grow the geometric arrangements as we proceed down the tree. The classifier based on Z. Z. TT is then C Q, LL arg max P Y j TT. If the depths of the leaves of TT j Z. are far smaller than M, then evidently C Q, LL is not the Bay es classifier. However, for depths on the order of hundreds or thousands we could expect that P Y j TT P Y j Q, Z. Z. Z. the difference in some appropriate norm being a kind of “approximation error.” Of course, we cannot actually create or store a tree of such depth, and Z the best classification rate we obtained with a single tree of average depth. around ten was about 90 |

[40] | TT, P Y k TT Y j, 1 n, m N. Naturally, small covariances lead to n m small errors. More generally, if the trees are produced with some sampling mechanism from the population of trees, involving either resampling from LL or random restrictions on the queries, then the quantities above can be analyzed by taking expectations relative to the space of trees. In regard to arcing, probably the deterministic reweighting on misclassified examples produces new trees which are quite different from the previous ones. Moreover, the errors induced on data points which were correctly classified by the existing trees are sufficiently randomized to avoid any sy stematic deterioration. |

[41] | AMIT, Y. and GEMAN, D. 1994. Randomized inquiries about shape; an application to handwritten digit recognition. Technical Report 401, Univ. Chicago. |

[42] | AMIT, Y. and GEMAN, D. 1997. Shape quantization and recognition with randomized trees. Neural Computation 9 1545 1588. |

[43] | AMIT, Y., GEMAN, D. and JEDy NAK, B. 1998. Efficient focusing and face detection. In Face Z. Recognition: From Theory to Applications H. Wechsler and J. Phillips, eds. Springer, Berlin. |

[44] | AMIT, Y., GEMAN, D. and WILDER, K. 1997. Joint induction of shape features and tree classifiers. IEEE Trans. PAMI 19 1300 1306. |

[45] | BREIMAN, L., FRIEDMAN, J., OLSHEN, R. and STONE, C. 1984. Classification and Regression Trees. Wadsworth, Belmont, CA. · Zbl 0541.62042 |

[46] | FREUND, Y. and SCHAPIRE, R. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Sy stem Sci. 55 119 139. · Zbl 0880.68103 |

[47] | FRIEDMAN, J. H. 1973. A recursive partitioning decision rule for nonparametric classification. IEEE Trans. Comput. 26 404 408. · Zbl 0403.62036 |

[48] | GEMAN, S., BIENENSTOCK, E. and DOURSAT, R. 1992. Neural networks and the bias variance dilemma. Neural Computation 4 1 58. |

[49] | SHLIEN, S. 1990. Multiple binary decision tree classifiers. Pattern Recognition 23 757 763. |

[50] | WILDER, K. 1998. Decision tree algorithms for handwritten digit recognition. Ph.D. dissertation, Univ. Massachusetts, Amherst. |

[51] | CHICAGO, ILLINOIS 60637 UNIVERSITY OF MASSACHUSETTS E-MAIL: amit@galton.uchicago.edu AMHERST, MASSACHUSETTS 01003 E-MAIL: geman@math.umass.edu ALI, K. M. and PAZZANI, M. J. 1996. Error reduction through learning multiple descriptions. Machine Learning 24 173 202. |

[52] | CHERKAUER, K. J. 1996. Human expert-level performance on a scientific image analysis task by a sy stem using combined artificial neural networks. In Working Notes of the AAAI Z. Workshop on Integrating Multiple Learned Models P. Chan, ed. 15 21. AAAI Press, Menlo Park, CA. Z. |

[53] | DIETTERICH, T. G. and BAKIRI, G. 1995. Solving multiclass learning problems via error-correcting output codes. J. Artificial Intelligence Res. 2 263 286. Z. · Zbl 0900.68358 |

[54] | DIETTERICH, T. G. and KONG, E. B. 1995. Machine learning bias, statistical bias, and statistical variance of decision tree algorithms. Technical Report, Dept. Computer Science, Oregon State Univ., Corvallis, Oregon. Available from ftp: ftp.cs.orst. edu pub tgd papers tr-bias.ps gz. Z. |

[55] | FREUND, Y. and SCHAPIRE, R. E. 1996. Experiments with a new boosting algorithm. In ProceedZ. ings of the Thirteenth International Conference on Machine Learning L. Saitta, ed. 148 156. Morgan Kaufmann, San Francisco. Z. |

[56] | HASHEM, S. 1993. Optimal linear combinations of neural networks. Ph.D. dissertation, School of Industrial Engineering, Purdue Univ., Lafay ette, IN. Z. |

[57] | KONG, E. B. and DIETTERICH, T. G. 1995. Error-correcting output coding corrects bias and Z variance. In Twelfth International Conference on Machine Learning A. Prieditis and. S. Russell, eds. 313 321. Morgan Kaufmann, San Francisco. Z. |

[58] | MACKAY, D. 1992. A practical bay esian framework for backpropagation networks. Neural Computation 4 448 472. Z. |

[59] | NEAL, R. 1993. Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Dept. Computer Science, Univ. Toronto. Z. |

[60] | PERRONE, M. P. and COOPER, L. N. 1993. When networks disagree: ensemble methods for Z hy brid neural networks. In Neural Networks for Speech and Image Processing R. J.. Mammone, ed. 126 142. Chapman and Hall, London. Z. |

[61] | QUINLAN, J. R. 1993. C4.5: Programs for Empirical Learning. Morgan Kaufmann, San Francisco.Z. · Zbl 0900.68112 |

[62] | SCHAPIRE, R. E. 1997. Using output codes to boost multiclass learning problems. In Proceedings of the Fourteenth International Conference on Machine Learning 313 321. Morgan Kaufmann, San Francisco. |

[63] | CORVALLIS, OREGON 97331-3202 E-MAIL: tgd@cs.orst.edu AMIT, Y. and GEMAN, D. 1997. Shape quantization and recognition with randomized trees. Neural Computation 9 1545 1588. Z. |

[64] | BREIMAN, L. 1996c. Out-of-bag estimation. Available at ftp.stat users breimanas OOBestimation. Z. · Zbl 1059.01542 |

[65] | BREIMAN, L. 1997. Prediction games and arcing algorithms. Technical Report 504, Dept. Statistics, Univ. California, Berkeley. Available at www.stat.berkeley.edu. Z. URL: |

[66] | DIETTERICH, T. 1998. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting and randomization. Machine Learning 1 22. Z. |

[67] | FREUND, Y. and SCHAPIRE, R. 1996. Experiments with a new boosting algorithm. In Machine Z. Learning: Proceedings of the Thirteenth International Conference L. Saitta, ed. 148 156. Morgan Kaufmann, San Francisco. Z. |

[68] | JI, C. and MA. S. 1997. Combinations of weak classifiers. IEEE Trans. Neural Networks 8 32 42. Z. |

[69] | KONG, E. B. and DIETTERICH, T. G. 1995. Error-correcting output coding corrects bias and variance. In Proceedings of the Twelfth International Conference on Machine Learning Z. A. Prieditis and S. Russell, eds. 313 321. Morgan Kaufmann, San Francisco. Z. |

[70] | SCHAPIRE, R., FREUND, Y., BARTLETT, P. and LEE, W. S. 1998. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26. To appear. Z. · Zbl 0929.62069 |

[71] | VAPNIK, V. N. 1995. The Nature of Statistical Learning Theory. Springer, New York. · Zbl 0833.62008 |

[72] | BERKELEY, CALIFORNIA 94720-3860 E-MAIL: leo@stat.berkeley.edu |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.