SMOTE
swMATH ID: 34239
Software Authors: N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer
Description: SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominantly composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
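The description says only that synthetic minority class examples are created. A minimal sketch of one common way to do this is given here: each synthetic point is placed at a random position on the line segment between a minority example and one of its k nearest minority-class neighbours. The function name `smote_sketch`, its parameters, the default `k=5`, and the random choice of base examples are assumptions made for illustration; this is not the authors' reference implementation.

```python
import numpy as np

def smote_sketch(X_min, n_synthetic, k=5, seed=0):
    """Illustrative SMOTE-style over-sampling (hypothetical helper).

    X_min       : array of shape (n, d) holding minority-class examples only
    n_synthetic : number of synthetic examples to generate
    k           : number of nearest minority neighbours to interpolate towards
    """
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority examples")
    k = min(k, n - 1)  # cannot have more neighbours than other examples
    rng = np.random.default_rng(seed)

    # Pairwise squared Euclidean distances between minority examples.
    diff = X_min[:, None, :] - X_min[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # an example is not its own neighbour

    # Indices of the k nearest minority neighbours of every example.
    neighbours = np.argsort(d2, axis=1)[:, :k]

    # Assumption: base examples are drawn uniformly at random.
    base = rng.integers(0, n, size=n_synthetic)
    chosen = neighbours[base, rng.integers(0, k, size=n_synthetic)]

    # Interpolate: base + gap * (neighbour - base), with gap in [0, 1).
    gap = rng.random((n_synthetic, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

X_demo = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
S_demo = smote_sketch(X_demo, n_synthetic=20, k=2)
```

Because every synthetic point lies between two existing minority examples, the output stays inside the convex hull of the minority class. In practice the synthetic rows would be appended to the training set with the minority label, optionally after under-sampling the majority class as the description suggests.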
Homepage: https://arxiv.org/abs/1106.1813
Related Software: UCI-ml; C4.5; SMOTEBoost; ADASYN; LIBSVM; JStatCom; MWMOTE; KEEL; AdaBoost.MH; AdaCost; R; Imbalanced-learn; Scikit; XGBoost; ElemStatLearn; LIBLINEAR; WEKA; randomForest; HandTill2001; GitHub
Cited in: 128 Publications

Standard Articles
1 Publication describing the Software, including 1 Publication in zbMATH
SMOTE: Synthetic minority over-sampling technique. Chawla, N. V.; Bowyer, K. W.; Hall, L. O.; Kegelmeyer, W. P. (2002). Zbl 0994.68128

Cited by 411 Authors
3 Herrera, Francisco; 3 Kegelmeyer, W. Philip; 2 Adams, Rod; 2 Bellinger, Colin; 2 Bowyer, Kevin W.; 2 Chawla, Nitesh V.; 2 Chou, Kuochen; 2 Davey, Neil; 2 Dorado, Julian; 2 Fernandez-Lozano, Carlos; 2 Fernández, Alberto; 2 González Castellano, Cristina; 2 Hall, Lawrence O.; 2 Koziarski, Michał; 2 Lessmann, Stefan; 2 Mammadov, Musa A.; 2 Munteanu, Cristian Robert; 2 Sun, Yi; 2 Tilakaratne, C. D.; 2 Van den Poel, Dirk; 2 Wozniak, Michal; 2 Xiao, Xuan; 2 Zięba, Maciej; 1 Abdallah, Zahraa S.; 1 Afzal, Hammad; 1 Ahmad, Jamal; 1 Ahmed, Mehreen; 1 Akalin, Altuna; 1 Albizri, Abdullah; 1 Aminian, Ehsan; 1 Ananthakumar, Usha; 1 Anderlucci, Laura; 1 Angulo, Cecilio; 1 Aşkan, Ayşegül; 1 Athanasiou, Vasileios; 1 Bai, Xiang; 1 Ballings, Michel; 1 Ban, Tao; 1 Banfield, Robert E.; 1 Barella, Victor H.; 1 Barnett, Christine L.; 1 Barnett, Ian J.; 1 Bej, Saptarshi; 1 Ben-Itzhak, Yaniv; 1 Bernardo, Alessio; 1 Bhattacharya, Sourangshu; 1 Blagus, Rok; 1 Blomberg, Jeanette; 1 Bodovski, Yosef; 1 Bogaert, Matthias; 1 Bravo, Cristián; 1 Brintrup, A.; 1 Brownstein, John S.; 1 Buckinx, Wouter; 1 Bulavas, Viktoras; 1 Cai, Xingjuan; 1 Cao, Jingjing; 1 Cao, Yanan; 1 Cao, Yi; 1 Carbonell, Jaime G.; 1 Carriquiry, Alicia L.; 1 Carvalho, Mateus Araujo; 1 Castro, Cristiano Leite de; 1 Chaabane, Ikram; 1 Chandrasekara, N. V.; 1 Chang, Liang; 1 Chellappa, Rama; 1 Chen, Baiyun; 1 Chen, Degang; 1 Chen, Gang; 1 Chen, Jiaxu; 1 Chen, Jie; 1 Chen, Yan-Cheng; 1 Chen, Zhi; 1 Chen, Zizhong; 1 Cheng, Fan; 1 Cheng, Xiang; 1 Chi, Guangqing; 1 Chmielnicki, Wiesław; 1 Chow, Sy-Miin; 1 Chung, Fu-Lai; 1 Cieslak, David A.; 1 Cohn, Emily L.; 1 Coleman, Thomas F.; 1 Costa, Yandre M. G.; 1 Crone, Sven F.; 1 Cui, Yuehua; 1 Cuiñas, Rubén F.; 1 Das, Swagatam; 1 Datta, Shounak; 1 Davidson, Padraig; 1 Davtyan, Narek; 1 De Smedt, Johannes; 1 de Souto, Marcílio C. P.; 1 del Jesus, María José; 1 Della Valle, Emanuele; 1 Denton, Brian T.; 1 Desaulniers, Guy; 1 Dong, Aimei; 1 Dong, Zhonghui
...and 311 more Authors

Cited in 51 Serials
12 Information Sciences; 11 Data Mining and Knowledge Discovery; 10 Machine Learning; 8 Annals of Operations Research; 8 European Journal of Operational Research; 7 Pattern Recognition; 6 Journal of Theoretical Biology; 4 International Journal of Approximate Reasoning; 4 Neural Networks; 4 Computational Statistics and Data Analysis; 3 Mathematical Problems in Engineering; 2 Applied Mathematics and Computation; 2 Computers & Operations Research; 2 Journal of Applied Statistics; 2 International Journal of Applied Mathematics and Computer Science; 2 The Annals of Applied Statistics; 2 Algorithms; 1 Physics Reports; 1 Psychometrika; 1 Operations Research; 1 Statistics & Probability Letters; 1 Journal of Classification; 1 Mathematical and Computer Modelling; 1 Computational Statistics; 1 Communications in Statistics. Simulation and Computation; 1 Journal of Statistical Computation and Simulation; 1 Foundations of Computing and Decision Sciences; 1 Journal of Mathematical Imaging and Vision; 1 International Journal of Computer Vision; 1 The Journal of Artificial Intelligence Research (JAIR); 1 Complexity; 1 Abstract and Applied Analysis; 1 Journal of Applied Mathematics and Decision Sciences; 1 PAA. Pattern Analysis and Applications; 1 Informatica (Vilnius); 1 Nonlinear Analysis. Real World Applications; 1 Advances and Applications in Statistics; 1 Journal of Systems Science and Complexity; 1 Statistical Applications in Genetics and Molecular Biology; 1 Advances in Data Analysis and Classification. ADAC; 1 Computational & Mathematical Methods in Medicine; 1 Statistical Analysis and Data Mining; 1 Electronic Journal of Statistics; 1 Journal of Agricultural, Biological, and Environmental Statistics; 1 International Journal of Advances in Engineering Sciences and Applied Mathematics; 1 Frontiers of Computer Science; 1 JSIAM Letters; 1 Computational Methods for Differential Equations; 1 Advances in Data Science and Adaptive Analysis; 1 SN Operations Research Forum; 1 Chapman & Hall/CRC Computational Biology Series

Cited in 9 Fields
82 Computer science (68-XX)
76 Statistics (62-XX)
16 Operations research, mathematical programming (90-XX)
15 Biology and other natural sciences (92-XX)
9 Game theory, economics, finance, and other social and behavioral sciences (91-XX)
1 Algebraic topology (55-XX)
1 Numerical analysis (65-XX)
1 Quantum theory (81-XX)
1 Systems theory; control (93-XX)