swMATH ID: 34239
Software Authors: N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer
Description: SMOTE: Synthetic Minority Over-sampling Technique. An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
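Illustration (not part of the swMATH record): the description says only that the method creates synthetic minority class examples. A minimal Python sketch of one common way to do this is given here, assuming each synthetic point is placed at a random position on the segment between a minority sample and one of its k nearest minority-class neighbours. The function name `smote_sketch` and its parameters are hypothetical, and choosing base samples at random with replacement is a simplification rather than the authors' exact procedure.

```python
import numpy as np


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    n = len(X)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise squared distances; a sample is never its own neighbour.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Assumption: base samples are drawn uniformly with replacement.
    base = rng.integers(0, n, size=n_synthetic)
    chosen = neighbours[base, rng.integers(0, k, size=n_synthetic)]

    # Random position in [0, 1) along the segment base -> neighbour.
    gap = rng.random((n_synthetic, 1))
    return X[base] + gap * (X[chosen] - X[base])


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(smote_sketch(minority, n_synthetic=3, k=2))
```

Because every synthetic point is a convex combination of two existing minority samples, it stays inside the bounding box of the minority class. The sketch handles numeric features only.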
Homepage: https://arxiv.org/abs/1106.1813
Related Software: UCI-ml; C4.5; SMOTEBoost; ADASYN; LIBSVM; JStatCom; MWMOTE; KEEL; AdaBoost.MH; AdaCost; R; Imbalanced-learn; Scikit; XGBoost; ElemStatLearn; LIBLINEAR; WEKA; randomForest; HandTill2001; GitHub
Cited in: 128 Publications

Cited by 411 Authors

3 Herrera, Francisco
3 Kegelmeyer, W. Philip
2 Adams, Rod
2 Bellinger, Colin
2 Bowyer, Kevin W.
2 Chawla, Nitesh V.
2 Chou, Kuochen
2 Davey, Neil
2 Dorado, Julian
2 Fernandez-Lozano, Carlos
2 Fernández, Alberto
2 González Castellano, Cristina
2 Hall, Lawrence O.
2 Koziarski, Michał
2 Lessmann, Stefan
2 Mammadov, Musa A.
2 Munteanu, Cristian Robert
2 Sun, Yi
2 Tilakaratne, C. D.
2 Van den Poel, Dirk
2 Wozniak, Michal
2 Xiao, Xuan
2 Zięba, Maciej
1 Abdallah, Zahraa S.
1 Afzal, Hammad
1 Ahmad, Jamal
1 Ahmed, Mehreen
1 Akalin, Altuna
1 Albizri, Abdullah
1 Aminian, Ehsan
1 Ananthakumar, Usha
1 Anderlucci, Laura
1 Angulo, Cecilio
1 Aşkan, Ayşegül
1 Athanasiou, Vasileios
1 Bai, Xiang
1 Ballings, Michel
1 Ban, Tao
1 Banfield, Robert E.
1 Barella, Victor H.
1 Barnett, Christine L.
1 Barnett, Ian J.
1 Bej, Saptarshi
1 Ben-Itzhak, Yaniv
1 Bernardo, Alessio
1 Bhattacharya, Sourangshu
1 Blagus, Rok
1 Blomberg, Jeanette
1 Bodovski, Yosef
1 Bogaert, Matthias
1 Bravo, Cristián
1 Brintrup, A.
1 Brownstein, John S.
1 Buckinx, Wouter
1 Bulavas, Viktoras
1 Cai, Xingjuan
1 Cao, Jingjing
1 Cao, Yanan
1 Cao, Yi
1 Carbonell, Jaime G.
1 Carriquiry, Alicia L.
1 Carvalho, Mateus Araujo
1 Castro, Cristiano Leite de
1 Chaabane, Ikram
1 Chandrasekara, N. V.
1 Chang, Liang
1 Chellappa, Rama
1 Chen, Baiyun
1 Chen, Degang
1 Chen, Gang
1 Chen, Jiaxu
1 Chen, Jie
1 Chen, Yan-Cheng
1 Chen, Zhi
1 Chen, Zizhong
1 Cheng, Fan
1 Cheng, Xiang
1 Chi, Guangqing
1 Chmielnicki, Wiesław
1 Chow, Sy-Miin
1 Chung, Fu-Lai
1 Cieslak, David A.
1 Cohn, Emily L.
1 Coleman, Thomas F.
1 Costa, Yandre M. G.
1 Crone, Sven F.
1 Cui, Yuehua
1 Cuiñas, Rubén F.
1 Das, Swagatam
1 Datta, Shounak
1 Davidson, Padraig
1 Davtyan, Narek
1 De Smedt, Johannes
1 de Souto, Marcílio C. P.
1 del Jesus, María José
1 Della Valle, Emanuele
1 Denton, Brian T.
1 Desaulniers, Guy
1 Dong, Aimei
1 Dong, Zhonghui
...and 311 more Authors

Cited in 51 Serials

12 Information Sciences
11 Data Mining and Knowledge Discovery
10 Machine Learning
8 Annals of Operations Research
8 European Journal of Operational Research
7 Pattern Recognition
6 Journal of Theoretical Biology
4 International Journal of Approximate Reasoning
4 Neural Networks
4 Computational Statistics and Data Analysis
3 Mathematical Problems in Engineering
2 Applied Mathematics and Computation
2 Computers & Operations Research
2 Journal of Applied Statistics
2 International Journal of Applied Mathematics and Computer Science
2 The Annals of Applied Statistics
2 Algorithms
1 Physics Reports
1 Psychometrika
1 Operations Research
1 Statistics & Probability Letters
1 Journal of Classification
1 Mathematical and Computer Modelling
1 Computational Statistics
1 Communications in Statistics. Simulation and Computation
1 Journal of Statistical Computation and Simulation
1 Foundations of Computing and Decision Sciences
1 Journal of Mathematical Imaging and Vision
1 International Journal of Computer Vision
1 The Journal of Artificial Intelligence Research (JAIR)
1 Complexity
1 Abstract and Applied Analysis
1 Journal of Applied Mathematics and Decision Sciences
1 PAA. Pattern Analysis and Applications
1 Informatica (Vilnius)
1 Nonlinear Analysis. Real World Applications
1 Advances and Applications in Statistics
1 Journal of Systems Science and Complexity
1 Statistical Applications in Genetics and Molecular Biology
1 Advances in Data Analysis and Classification. ADAC
1 Computational & Mathematical Methods in Medicine
1 Statistical Analysis and Data Mining
1 Electronic Journal of Statistics
1 Journal of Agricultural, Biological, and Environmental Statistics
1 International Journal of Advances in Engineering Sciences and Applied Mathematics
1 Frontiers of Computer Science
1 JSIAM Letters
1 Computational Methods for Differential Equations
1 Advances in Data Science and Adaptive Analysis
1 SN Operations Research Forum
1 Chapman & Hall/CRC Computational Biology Series
