
Cost-sensitive boosting for classification of imbalanced data. (English) Zbl 1122.68505

Summary: Classification of data with an imbalanced class distribution significantly degrades the performance attainable by most standard classifier learning algorithms, which assume a relatively balanced class distribution and equal misclassification costs. The difficulty and frequent occurrence of the class imbalance problem indicate the need for further research. The objective of this paper is to investigate meta-techniques applicable to most classifier learning algorithms, with the aim of advancing the classification of imbalanced data. The AdaBoost algorithm is reported to be a successful meta-technique for improving classification accuracy. The insight gained from a comprehensive analysis of the advantages and shortcomings of AdaBoost in tackling the class imbalance problem leads to the exploration of three cost-sensitive boosting algorithms, developed by introducing cost items into the learning framework of AdaBoost. Further analysis shows that one of the proposed algorithms corresponds to stagewise additive modelling in statistics minimizing the cost-weighted exponential loss. These boosting algorithms are also studied with respect to their weighting strategies towards different types of samples, and their effectiveness in identifying rare cases is demonstrated through experiments on several real-world medical data sets where the class imbalance problem prevails.
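One natural way to introduce a cost item into AdaBoost's weighting scheme, in the spirit of the cost-sensitive variants the summary describes, is to multiply each sample's cost factor into the weight update, so that expensive (minority-class) samples accumulate weight faster when misclassified. The sketch below is a minimal pure-NumPy illustration using decision stumps as the base learner; the function names (`adac2_fit`, `stump_train`) and the stump learner are illustrative choices of this review, not artifacts of the paper:

```python
import numpy as np

def stump_train(X, y, w):
    """Pick (feature, threshold, polarity) minimizing the weighted error."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - t) >= 0, 1, -1)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best[1:]

def stump_predict(X, stump):
    j, t, pol = stump
    return np.where(pol * (X[:, j] - t) >= 0, 1, -1)

def adac2_fit(X, y, cost, T=20):
    """Cost-sensitive boosting: the cost factor C_i multiplies the
    weight update, w_i <- w_i * C_i * exp(-alpha * y_i * h(x_i))."""
    n = len(y)
    w = np.ones(n) / n
    ensemble = []
    for _ in range(T):
        stump = stump_train(X, y, w)
        pred = stump_predict(X, stump)
        # alpha balances the cost-weighted mass of correct vs. wrong samples
        r_correct = np.sum(w * cost * (pred == y))
        r_wrong = np.sum(w * cost * (pred != y))
        alpha = 0.5 * np.log((r_correct + 1e-10) / (r_wrong + 1e-10))
        ensemble.append((alpha, stump))
        w = w * cost * np.exp(-alpha * y * pred)
        w /= w.sum()  # renormalize to a distribution
    return ensemble

def adac2_predict(X, ensemble):
    s = sum(a * stump_predict(X, st) for a, st in ensemble)
    return np.sign(s)
```

Assigning the rare class a larger cost (e.g. `cost = np.where(y == 1, 5.0, 1.0)` for a positive minority class) biases successive rounds toward the rare cases, which is the qualitative effect the paper's experiments examine.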

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68W05 Nonnumerical algorithms
68P15 Database theory
68T10 Pattern recognition, speech recognition

References:

[1] Chawla, N. V.; Japkowicz, N.; Kolcz, A., Editorial: special issue on learning from imbalanced data sets, SIGKDD Explorations Special Issue on Learning from Imbalanced Datasets, 6, 1, 1-6 (2004)
[2] Fawcett, T. E.; Provost, F., Adaptive fraud detection, Data Mining Knowl. Discovery, 1, 3, 291-316 (1997)
[3] Kubat, M.; Holte, R.; Matwin, S., Machine learning for the detection of oil spills in satellite radar images, Mach. Learn., 30, 195-215 (1998)
[4] Riddle, P.; Segal, R.; Etzioni, O., Representation design and brute-force induction in a Boeing manufacturing domain, Appl. Artif. Intell., 8, 125-147 (1991)
[5] Murphy, P. M.; Aha, D. W., UCI repository of machine learning databases (1991), Department of Information and Computer Science: Department of Information and Computer Science University of California, Irvine
[6] K. Ezawa, M. Singh, S.W. Norton, Learning goal oriented Bayesian networks for telecommunications risk management, in: Proceedings of the Thirteenth International Conference on Machine Learning, Bari, Italy, 1996, pp. 139-147.
[7] C. Cardie, N. Howe, Improving minority class prediction using case-specific feature weights, in: Proceedings of the Fourteenth International Conference on Machine Learning, Nashville, TN, July 1997, pp. 57-65.
[8] G.E.A.P.A. Batista, R.C. Prati, M.C. Monard, A study of the behavior of several methods for balancing machine learning training data, SIGKDD Explorations Special Issue on Learning from Imbalanced Datasets, vol. 6(1), 2004, pp. 20-29.
[9] Japkowicz, N.; Stephen, S., The class imbalance problem: a systematic study, Intell. Data Anal. J., 6, 5, 429-450 (2002) · Zbl 1085.68628
[10] G. Weiss, Mining with rarity: a unifying framework, SIGKDD Explorations Special Issue on Learning from Imbalanced Datasets, vol. 6(1), 2004, pp. 7-19.
[11] R. Akbani, S. Kwek, N. Japkowicz, Applying support vector machines to imbalanced datasets, in: Proceedings of European Conference on Machine Learning, Pisa, Italy, September 2004, pp. 39-50. · Zbl 1132.68523
[12] B. Raskutti, A. Kowalczyk, Extreme rebalancing for SVMs: a case study, in: Proceedings of European Conference on Machine Learning, Pisa, Italy, September 2004, pp. 60-69.
[13] Wu, G.; Chang, E. Y., Class-boundary alignment for imbalanced dataset learning, (Proceedings of the ICML’03 Workshop on Learning from Imbalanced Data Sets, Washington, DC, August 2003)
[14] Zhang, J.; Mani, I., KNN approach to unbalanced data distributions: a case study involving information extraction, (Proceedings of the ICML’03 Workshop on Learning from Imbalanced Data Sets, Washington, DC, August 2003)
[15] B. Liu, W. Hsu, Y. Ma, Mining association rules with multiple minimum supports, in: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, August 1999, pp. 337-341.
[16] Wong, A. K.C.; Wang, Y., High order pattern discovery from discrete-valued data, IEEE Trans. Knowl. Data Eng., 9, 6, 877-893 (1997)
[17] N. Japkowicz, in: Proceedings of the AAAI’2000 Workshop on Learning from Imbalanced Data Sets, AAAI Tech Report WS-00-05, AAAI, 2000.
[18] N.V. Chawla, N. Japkowicz, A. Kotcz, in: Proceedings of the ICML’2003 Workshop on Learning from Imbalanced Data Sets, ICML, 2003.
[19] N.V. Chawla, N. Japkowicz, A. Kotcz, SIGKDD Explorations, Special Issue on Class Imbalances, vol. 6(1), ACM, New York, June 2004.
[20] Chawla, N.; Bowyer, K.; Hall, L.; Kegelmeyer, W. P., SMOTE: synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16, 321-357 (2002) · Zbl 0994.68128
[21] A. Estabrooks, A combination scheme for inductive learning from imbalanced data sets, Master’s Thesis, Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada, 2000.
[22] Kubat, M.; Matwin, S., Addressing the curse of imbalanced training sets: one-sided selection, (Proceedings of the 14th International Conference on Machine Learning, Morgan Kaufmann, Los Altos, CA, 1997), 179-186
[23] M. Pazzani, C. Merz, P. Murphy, K. Ali, T. Hume, C. Brunk, Reducing misclassification costs, in: Proceedings of the Eleventh International Conference on Machine Learning, New Brunswick, NJ, July 1994, pp. 217-225.
[24] Japkowicz, N., Supervised versus unsupervised binary-learning by feedforward neural networks, Mach. Learn., 41, 1 (2001) · Zbl 0970.68128
[25] Y. Freund, R.E. Schapire, Experiments with a new boosting algorithm, in: Proceedings of the Thirteenth International Conference on Machine Learning, Morgan Kaufmann, Los Altos, CA, 1996, pp. 148-156.
[26] Freund, Y.; Schapire, R. E., A decision-theoretic generalization of on-line learning and an application to boosting, J. Comput. Syst. Sci., 55, 1, 119-139 (1997) · Zbl 0880.68103
[27] Schapire, R. E.; Singer, Y., Improved boosting algorithms using confidence-rated predictions, Mach. Learn., 37, 3, 297-336 (1999) · Zbl 0945.68194
[28] Schapire, R. E.; Freund, Y.; Bartlett, P.; Lee, W. S., Boosting the margin: a new explanation for the effectiveness of voting methods, Ann. Statist., 26, 5, 1651-1686 (1998)
[29] Friedman, J.; Hastie, T.; Tibshirani, R., Additive logistic regression: a statistical view of boosting, Ann. Statist., 28, 2, 337-374 (2000) · Zbl 1106.62323
[30] Ridgeway, G., The state of boosting, Comput. Sci. Statist., 31, 172-181 (1999)
[31] N. Japkowicz, Concept-learning in the presence of between-class and within-class imbalances, in: Proceedings of the Fourteenth Conference of the Canadian Society for Computational Studies of Intelligence, Ottawa, Canada, June 2001, pp. 67-77. · Zbl 0984.68643
[32] M.V. Joshi, Learning classifier models for predicting rare phenomena, Ph.D. Thesis, University of Minnesota, Twin Cities, MN, USA, 2002.
[33] Weiss, G.; Provost, F., Learning when training data are costly: the effect of class distribution on tree induction, J. Artif. Intell. Res., 19, 315-354 (2003) · Zbl 1046.68094
[34] R.C. Prati, G.E.A.P.A. Batista, Class imbalances versus class overlapping: an analysis of a learning system behavior, in: Proceedings of the Mexican International Conference on Artificial Intelligence (MICAI), Mexico City, Mexico, April 2004, pp. 312-321.
[35] Zhou, Z. H.; Liu, X. Y., Training cost-sensitive neural networks with methods addressing the class imbalance problem, IEEE Trans. Knowl. Data Eng., 18, 1, 63-77 (2006)
[36] Quinlan, J. R., Improved estimates for the accuracy of small disjuncts, Mach. Learn., 6, 93-98 (1991)
[37] Zadrozny, B.; Elkan, C., Learning and making decisions when costs and probabilities are both unknown, (Proceedings of the Seventh International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, August 2001), 204-213
[38] Lin, Y.; Lee, Y.; Wahba, G., Support vector machines for classification in nonstandard situations, Mach. Learn., 46, 191-202 (2002) · Zbl 0998.68103
[39] Liu, B.; Ma, Y.; Wong, C. K., Improving an association rule based classifier, (Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery, Lyon, France, September 2000), 504-509
[40] Manevitz, L. M.; Yousef, M., One-class svms for document classification, J. Mach. Learn. Res., 2, 139-154 (2001) · Zbl 1002.68597
[41] Zadrozny, B.; Langford, J.; Abe, N., Cost-sensitive learning by cost-proportionate example weighting, (Proceedings of the Third IEEE International Conference on Data Mining, Melbourne, FL, November 2003), 435-442
[42] Ling, C. X.; Li, C., Decision trees with minimal costs, (Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, July 2004)
[43] J. Bradford, C. Kunz, R. Kohavi, C. Brunk, C.E. Brodley, Pruning decision trees with misclassification costs, in: Proceedings of the Tenth European Conference on Machine Learning (ECML-98), Chemnitz, Germany, April 1998, pp. 131-136.
[44] P. Domingos, MetaCost: a general method for making classifiers cost-sensitive, in: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, 1999, pp. 155-164.
[45] N. Abe, B. Zadrozny, J. Langford, An iterative method for multi-class cost-sensitive learning, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, August 2004, pp. 3-11.
[46] M.V. Joshi, V. Kumar, R.C. Agarwal, Evaluating boosting algorithms to classify rare classes: comparison and improvements, in: Proceedings of the First IEEE International Conference on Data Mining (ICDM’01), 2001.
[47] D. Lewis, W. Gale, Training text classifiers by uncertainty sampling, in: Proceedings of the Seventeenth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, August 1998, pp. 73-79.
[48] Tan, P.; Steinbach, M.; Kumar, V., Introduction to Data Mining (2006), Addison-Wesley: Addison-Wesley Reading, MA
[49] F. Provost, T. Fawcett, Analysis and visualization of classifier performance: comparison under imprecise class and cost distributions, in: Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97), Newport Beach, CA, August 1997, pp. 43-48.
[50] Hanley, J. A.; McNeil, B. J., The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, 143, 29-36 (1982)
[51] Breiman, L., Bagging predictors, Machine Learning, 24, 2, 123-140 (1996) · Zbl 0858.68080
[52] Breiman, L., Random forests, Machine Learning, 45, 5-32 (2001) · Zbl 1007.68152
[53] R.E. Schapire, The boosting approach to machine learning—an overview, in: MSRI Workshop on Nonlinear Estimation and Classification, Berkeley, CA, March 2002, pp. 149-172.
[54] Kittler, J.; Hatef, M.; Duin, R.; Matas, J., On combining classifiers, IEEE Trans. Pattern Anal. Mach. Intell., 20, 3 (1998)
[55] M.S. Kamel, N. Wanas, Data dependence in combining classifiers, in: Proceedings of the Fourth International Workshop on Multiple Classifiers Systems, Surrey, UK, June 2003. · Zbl 1040.68660
[56] W. Fan, S.J. Stolfo, J. Zhang, P.K. Chan, AdaCost: misclassification cost-sensitive boosting, in: Proceedings of the Sixteenth International Conference on Machine Learning (ICML-99), Bled, Slovenia, 1999, pp. 97-105.
[57] Ting, K. M., A comparative study of cost-sensitive boosting algorithms, (Proceedings of the 17th International Conference on Machine Learning, Stanford University, CA, 2000), 983-990
[58] C. Chen, A. Liaw, L. Breiman, Using random forests to learn unbalanced data, ⟨http://stat-www.berkeley.edu/users/chenchao/666.pdf⟩
[59] Turney, P., Types of cost in inductive concept learning, (Proceedings of the Workshop on Cost-Sensitive Learning at the 17th International Conference on Machine Learning, Stanford University, CA, 2000), 15-21
[60] Viola, P.; Jones, M., Fast and robust classification using asymmetric AdaBoost and a detector cascade, (Proceedings of the Neural Information Processing Systems Conference, Vancouver, BC, Canada, December 2001), 1311-1318
[61] C. Elkan, The foundations of cost-sensitive learning, in: Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, 2001, pp. 973-978.
[62] N.V. Chawla, A. Lazarevic, L.O. Hall, K.W. Bowyer, SMOTEBoost: improving prediction of the minority class in boosting, in: Proceedings of the Seventh European Conference on Principles and Practice of Knowledge Discovery in Databases, Dubrovnik, Croatia, 2003, pp. 107-119.
[63] Guo, H.; Viktor, H. L., Learning from imbalanced data sets with boosting and data generation: the databoost-IM approach, SIGKDD Explorations Special Issue on Learning from Imbalanced Datasets, 6, 1, 30-39 (2004)
[64] Sun, Y.; Wong, A. K.C.; Wang, Y., An overview of associative classifiers, (The 2006 International Conference on Data Mining (DMIN’06), Las Vegas, NV, June 2006), 138-143
[65] Wang, Y.; Wong, A. K.C., From association to classification: inference using weight of evidence, IEEE Trans. Knowl. Data Eng., 15, 3, 764-767 (2003)
[66] Y. Sun, Cost-sensitive boosting for classification of imbalanced data, Ph.D. Thesis, University of Waterloo, Waterloo, Ont., Canada, 2007. · Zbl 1122.68505
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.