×

zbMATH — the first resource for mathematics

Scaling up sparse support vector machines by simultaneous feature and sample reduction. (English) Zbl 1434.68415
Summary: Sparse support vector machine (SVM) is a popular classification technique that can simultaneously learn a small set of the most interpretable features and identify the support vectors. It has achieved great successes in many real-world applications. However, for large-scale problems involving a huge number of samples and ultra-high dimensional features, solving sparse SVMs remains challenging. By noting that sparse SVMs induce sparsities in both feature and sample spaces, we propose a novel approach, which is based on accurate estimations of the primal and dual optima of sparse SVMs, to simultaneously identify the inactive features and samples that are guaranteed to be irrelevant to the outputs. Thus, we can remove the identified inactive samples and features from the training phase, leading to substantial savings in the computational cost without sacrificing the accuracy. Moreover, we show that our method can be extended to multi-class sparse support vector machines. To the best of our knowledge, the proposed method is the first static feature and sample reduction method for sparse SVMs and multi-class sparse SVMs. Experiments on both synthetic and real data sets demonstrate that our approach significantly outperforms state-of-the-art methods and the speedup gained by our approach can be orders of magnitude.
MSC:
68T05 Learning and adaptive systems in artificial intelligence
62H25 Factor analysis and principal components; correspondence analysis
62H30 Classification and discrimination; cluster analysis (statistical aspects)
Software:
Celer; LIBLINEAR; LIBSVM
PDF BibTeX XML Cite
Full Text: Link
References:
[1] Heinz H Bauschke, Patrick L Combettes, et al.Convex analysis and monotone operator theory in Hilbert spaces, volume 408. Springer, 2011.
[2] Jinbo Bi, Kristin Bennett, Mark Embrechts, Curt Breneman, and Minghu Song. Dimensionality reduction via sparse support vector machines.The Journal of Machine Learning
[3] Antoine Bonnefoy, Valentin Emiya, Liva Ralaivola, and R´emi Gribonval. A dynamic screening principle for the lasso. InSignal Processing Conference (EUSIPCO), 2014 Proceedings of
[4] Bryan Catanzaro, Narayanan Sundaram, and Kurt Keutzer. Fast support vector machine training and classification on graphics processors. InProceedings of the 25th international
[5] Chih-Chung Chang and Chih-Jen Lin. Libsvm: a library for support vector machines.ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.
[6] Laurent El Ghaoui, Vivian Viallon, and Tarek Rabbani. Safe feature elimination in sparse supervised learning.Pacific Journal of Optimization, 8:667-698, 2012.
[7] Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. Liblinear: A library for large linear classification.The Journal of Machine Learning Research, 9: 1871-1874, 2008.
[8] Trevor Hastie, Saharon Rosset, Robert Tibshirani, and Ji Zhu. The entire regularization path for the support vector machine.The Journal of Machine Learning Research, 5: 1391-1415, 2004.
[9] Trevor Hastie, Robert Tibshirani, and Martin Wainwright.Statistical learning with sparsity: the lasso and generalizations. CRC Press, 2015.
[10] Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S Sathiya Keerthi, and Sellamanickam Sundararajan. A dual coordinate descent method for large-scale linear svm. InProceedings
[11] Thorsten Joachims.Text categorization with support vector machines: Learning with many relevant features. Springer, 1998.
[12] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. InInternational Joint Conference on Artificial Intelligence, pages 1137-1145, 1995.
[13] Irene Kotsia and Ioannis Pitas. Facial expression recognition in image sequences using geometric deformation features and support vector machines.Image Processing, IEEE
[14] Mathurin Massias, Joseph Salmon, and Alexandre Gramfort. Celer: a fast solver for the lasso with dual extrapolation. InInternational Conference on Machine Learning, pages 3321-3330, 2018.
[15] Nicolai Meinshausen and Peter B¨uhlmann. Stability selection.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4):417-473, 2010.
[16] Johannes Mohr and Klaus Obermayer. A topographic support vector machine: Classification using local label configurations. InAdvances in Neural Information Processing Systems, pages 929-936, 2004.
[17] Harikrishna Narasimhan and Shivani Agarwal. Svm pauc tight: a new support vector method for optimizing partial auc based on a tight convex upper bound. InProceedings
[18] Eugene Ndiaye, Olivier Fercoq, Alexandre Gramfort, and Joseph Salmon. Gap safe screening rules for sparse-group lasso. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, and R. Garnett, editors,Advances in Neural Information Processing Systems 29, pages 388-396. Curran Associates, Inc., 2016.
[19] Kohei Ogawa, Yoshiki Suzuki, and Ichiro Takeuchi. Safe screening of non-support vectors in pathwise svm computation. InProceedings of the 30th International Conference on
[20] Ralph Tyrell Rockafellar.Convex analysis. Princeton university press, 1970.
[21] Shai Shalev-Shwartz and Tong Zhang. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization.Mathematical Programming, 155(1-2):105-145, 2016.
[22] Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro, and Andrew Cotter. Pegasos: Primal estimated sub-gradient solver for svm.Mathematical programming, 127(1):3-30, 2011.
[23] Atsushi Shibagaki, Masayuki Karasuyama, Kohei Hatano, and Ichiro Takeuchi. Simultaneous safe screening of features and samples in doubly sparse modeling. InProceedings of The
[24] Robert Tibshirani, Jacob Bien, Jerome Friedman, Trevor Hastie, Noah Simon, Jonathan Taylor, and Ryan J Tibshirani. Strong rules for discarding predictors in lasso-type problems.Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74
[25] Jie Wang, Peter Wonka, and Jieping Ye. Scaling svm and least absolute deviations via exact data reduction. InProceedings of the 31st International Conference on Machine Learning
[26] Jie Wang, Jiayu Zhou, Jun Liu, Peter Wonka, and Jieping Ye. A safe screening rule for sparse logistic regression. InAdvances in Neural Information Processing Systems, pages 1053-1061, 2014b.
[27] Jie Wang, Peter Wonka, and Jieping Ye. Lasso screening rules via dual polytope projection. Journal of Machine Learning Research, 16:1063-1101, 2015.
[28] Li Wang, Ji Zhu, and Hui Zou. The doubly regularized support vector machine.Statistica Sinica, pages 589-615, 2006.
[29] Zhen James Xiang and Peter J Ramadge. Fast lasso screening tests based on correlations. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pages 2137-2140. IEEE, 2012.
[30] Tao Yang, Jie Wang, Qian Sun, Derrek P Hibar, Neda Jahanshad, Li Liu, Yalin Wang, Liang Zhan, Paul M Thompson, and Jieping Ye. Detecting genetic risk factors for alzheimer’s disease in whole genome sequence data via lasso screening. InBiomedical Imaging (ISBI),
[31] Yuya Yoshikawa, Tomoharu Iwata, and Hiroshi Sawada. Latent support measure machines for bag-of-words data classification. InAdvances in Neural Information Processing Systems, pages 1961-1969, 2014.
[32] Weizhong Zhang, Bin Hong, Wei Liu, Jieping Ye, Deng Cai, Xiaofei He, and Jie Wang.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.