A MapReduce-based distributed SVM ensemble for scalable image classification and annotation. (English) Zbl 1381.68240

Summary: Combining classifiers substantially reduces classification errors in a wide range of applications. In particular, support vector machine (SVM) ensembles with bagging have shown better classification performance than a single SVM. However, training an SVM ensemble is computationally intensive, especially when the number of replicated training datasets is large. This paper presents MRESVM, a MapReduce-based distributed SVM ensemble algorithm for scalable image annotation, which re-samples the training dataset by bootstrapping and trains an SVM on each resampled dataset in parallel on a cluster of computers. A balanced sampling strategy for bootstrapping is introduced to increase classification accuracy. MRESVM is evaluated in both experimental and simulation environments; the results show that it significantly reduces training time while achieving a high level of classification accuracy.
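
The pipeline described in the summary (bootstrap re-sampling, one SVM per resampled dataset, combination of the trained members) can be illustrated with a minimal single-machine sketch in Python. It is not the paper's implementation, and several parts are assumptions: "balanced" is read here as the classical balanced bootstrap, in which every training instance appears equally often across all replicas; a plain sub-gradient linear SVM stands in for whatever SVM trainer MRESVM uses; the built-in `map` stands in for the distributed map phase; and majority voting is assumed as the combination rule. All function names are hypothetical.

```python
import random
from collections import Counter


def balanced_bootstrap(n_items, n_replicas, rng):
    """Index lists for n_replicas resampled datasets of size n_items.

    Assumed reading of "balanced": every index occurs exactly
    n_replicas times across all replicas taken together.
    """
    pool = list(range(n_items)) * n_replicas
    rng.shuffle(pool)
    return [pool[r * n_items:(r + 1) * n_items] for r in range(n_replicas)]


def train_linear_svm(samples, eta=0.01, lam=0.01, epochs=100):
    """Map task: fit a linear SVM by sub-gradient descent on the hinge loss.

    Stand-in trainer (an assumption, not the paper's solver).
    samples is a list of (features, label) pairs with label in {-1, +1}.
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # L2 shrinkage
            if margin < 1.0:  # inside the margin or misclassified
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
                b += eta * y
    return w, b


def predict_one(model, x):
    """Sign of the decision function of a single ensemble member."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def ensemble_predict(models, x):
    """Reduce step (assumed rule): majority vote over the members."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]


def train_ensemble(data, n_replicas, seed=0):
    """Resample the data, then train one SVM per replica."""
    rng = random.Random(seed)
    replicas = balanced_bootstrap(len(data), n_replicas, rng)
    datasets = [[data[i] for i in idx] for idx in replicas]
    # Built-in map stands in for the distributed map phase; the
    # trainings are independent, so they could run on separate nodes.
    return list(map(train_linear_svm, datasets))


# Toy, well-separated 2-D data (illustrative only, not image features).
toy_rng = random.Random(42)
data = [((2 + toy_rng.uniform(-0.5, 0.5), 2 + toy_rng.uniform(-0.5, 0.5)), 1)
        for _ in range(20)]
data += [((-2 + toy_rng.uniform(-0.5, 0.5), -2 + toy_rng.uniform(-0.5, 0.5)), -1)
         for _ in range(20)]

models = train_ensemble(data, n_replicas=5)
print(ensemble_predict(models, (3.0, 3.0)), ensemble_predict(models, (-3.0, -3.0)))
```

Because each replica is trained independently of the others, the training step parallelizes without communication between workers, which is the property a MapReduce deployment would exploit. An odd number of replicas avoids tied votes in the two-class case.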


68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68U10 Computing methodologies for image processing

