
Incremental relevance sample-feature machine: a fast marginal likelihood maximization approach for joint feature selection and classification. (English) Zbl 1414.68067

Summary: The recently proposed Relevance Sample-Feature Machine (RSFM) performs joint feature selection and classification with state-of-the-art accuracy and sparsity. However, it suffers from high computational cost on large training sets. To accelerate its training, we introduce a new variant of this algorithm named the Incremental Relevance Sample-Feature Machine (IRSFM). In IRSFM, the marginal likelihood maximization approach is modified so that model learning follows a constructive procedure: starting from an empty model, it iteratively adds or deletes basis functions to build the learned model. Our extensive experiments on various data sets and comparisons with competing algorithms demonstrate the effectiveness of the proposed IRSFM in terms of accuracy, sparsity and run-time. While IRSFM achieves almost the same classification accuracy as RSFM, it yields a sparser learned model in both the sample and feature domains and requires much less training time than RSFM, especially for large data sets.
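The constructive procedure the summary describes can be illustrated in the simpler regression setting: fast marginal likelihood maximization in the style of Tipping and Faul's fast RVM, which IRSFM adapts to the joint sample-feature setting. The sketch below is an illustrative implementation under that assumption, not the authors' actual algorithm; the function name, tolerances, and single-seed initialization are hypothetical choices.

```python
import numpy as np

def irsfm_style_sparse_fit(Phi, t, beta, n_iter=100, add_tol=3.0):
    """Constructive sparse Bayesian regression sketch (fast-RVM style).

    Starts from an empty model and, per iteration, deletes, re-estimates,
    or adds basis functions (columns of Phi) so as to increase the
    marginal likelihood p(t | alpha, beta). Illustrative only.
    """
    N, M = Phi.shape
    norms = np.sum(Phi ** 2, axis=0)
    alpha = np.full(M, np.inf)            # inf => basis function pruned

    # seed with the single best-aligned basis function
    proj = Phi.T @ t
    m0 = int(np.argmax(proj ** 2 / norms))
    alpha[m0] = norms[m0] / max(proj[m0] ** 2 / norms[m0] - 1.0 / beta, 1e-12)

    for _ in range(n_iter):
        idx = np.where(np.isfinite(alpha))[0]
        Phi_a = Phi[:, idx]
        Sigma = np.linalg.inv(np.diag(alpha[idx]) + beta * Phi_a.T @ Phi_a)
        mu = beta * Sigma @ (Phi_a.T @ t)

        # sparsity (S) and quality (Q) factors for every candidate column
        PtPa = Phi.T @ Phi_a
        S = beta * norms - beta ** 2 * np.sum((PtPa @ Sigma) * PtPa, axis=1)
        Q = beta * Phi.T @ (t - Phi_a @ mu)
        s, q = S.copy(), Q.copy()
        s[idx] = alpha[idx] * S[idx] / (alpha[idx] - S[idx])
        q[idx] = alpha[idx] * Q[idx] / (alpha[idx] - S[idx])
        theta = q ** 2 - s                # theta <= 0 => optimal alpha is inf

        changed = False
        # delete active functions whose optimal alpha is infinite
        for m in idx:
            if theta[m] <= 0 and np.isfinite(alpha).sum() > 1:
                alpha[m] = np.inf
                changed = True
        # re-estimate the surviving active alphas (stale stats: sketch shortcut)
        live = np.where(np.isfinite(alpha))[0]
        pos = live[theta[live] > 0]
        if pos.size:
            new_alpha = s[pos] ** 2 / theta[pos]
            if np.any(np.abs(np.log(new_alpha / alpha[pos])) > 1e-3):
                changed = True
            alpha[pos] = new_alpha
        # add the inactive candidate with the largest likelihood gain
        cand = np.where(~np.isfinite(alpha) & (theta > 0))[0]
        if cand.size:
            r = q[cand] ** 2 / s[cand]    # r > 1 on this candidate set
            gain = 0.5 * (r - 1.0 - np.log(r))
            best = cand[np.argmax(gain)]
            if gain.max() > add_tol:
                alpha[best] = s[best] ** 2 / theta[best]
                changed = True
        if not changed:
            break

    # final posterior mean over the surviving basis functions
    active = np.isfinite(alpha)
    idx = np.where(active)[0]
    Phi_a = Phi[:, idx]
    Sigma = np.linalg.inv(np.diag(alpha[idx]) + beta * Phi_a.T @ Phi_a)
    w = np.zeros(M)
    w[idx] = beta * Sigma @ (Phi_a.T @ t)
    return w, active
```

Because the model grows one basis function at a time and most candidates are never activated, each iteration only inverts a small matrix over the active set; this is the source of the run-time advantage the summary reports over the full RSFM update, which works with all basis functions at once.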

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62F15 Bayesian inference

Software:

SHOGUN

References:

[1] Williams, O.; Blake, A.; Cipolla, R., Sparse Bayesian learning for efficient visual tracking, IEEE Trans. Pattern Anal. Mach. Intell., 27, 1292-1304 (2005)
[2] Thayananthan, A.; Navaratnam, R.; Stenger, B.; Torr, P. H.S.; Cipolla, R., Pose estimation and tracking using multivariate regression, Pattern Recognit. Lett., 29, 9, 1302-1310 (2008)
[3] Tzikas, D.; Likas, A.; Galatsanos, N., Large scale multi-kernel relevance vector machine for object detection, Int. J. Artif. Intell. Tools, 16, 6, 967-979 (2007)
[4] Zhao, F.; Strother, S., Bayesian kernel methods for analysis of functional neuroimages, IEEE Trans. Med. Imag., 26, 1613-1624 (2007)
[5] Li, Y.; Campbell, C.; Tipping, M., Bayesian automatic relevance determination algorithms for classifying gene expression data, Bioinformatics, 18, 10, 1332-1339 (2002)
[6] Wei, L.; Yang, Y.; Nishikawa, R. M.; Wernick, M. N.; Edwards, A., Relevance vector machine for automatic detection of clustered microcalcifications, IEEE Trans. Med. Imag., 24, 1278-1285 (2005)
[7] Mohsenzadeh, Y.; Dash, S.; Crawford, J. D., A state space model for spatial updating of remembered visual targets during eye movements, Front. Syst. Neurosci., 10, 39 (2016)
[8] Kiaee, F.; Sheikhzadeh, H.; Mahabadi, S. E., Sparse Bayesian mixed-effects extreme learning machine: an approach for unobserved clustered heterogeneity, Neurocomputing, 175, Part A, 411-420 (2016)
[9] Mohsenzadeh, Y.; Sheikhzadeh, H., Gaussian Kernel width optimization for sparse Bayesian learning, IEEE Trans. Neural Netw. Learn. Syst., 26, 4, 709-719 (2015)
[12] Tipping, M., Sparse Bayesian learning and the relevance vector machine, J. Mach. Learn. Res., 1, 211-244 (2001) · Zbl 0997.68109
[14] Nicolaou, M. A.; Gunes, H.; Pantic, M., Output associative RVM regression for dimensional and continuous emotion prediction, Image Vis. Comput., 30, 3, 186-196 (2012)
[17] Kiaee, F.; Sheikhzadeh, H.; Mahabadi, S. E., Relevance vector machine for survival analysis, IEEE Trans. Neural Netw. Learn. Syst., 27, 3, 648-660 (2015)
[20] Psorakis, I.; Damoulas, T.; Girolami, M., Multiclass relevance vector machines: sparsity and accuracy, IEEE Trans. Neural Netw., 21, 1588-1598 (2010)
[21] Tzikas, D.; Likas, A.; Galatsanos, N., Sparse Bayesian modeling with adaptive kernel learning, IEEE Trans. Neural Netw., 20, 926-937 (2009)
[29] Guyon, I.; Elisseeff, A., An introduction to variable and feature selection, J. Mach. Learn. Res., 3, 1157-1182 (2003) · Zbl 1102.68556
[30] Krishnapuram, B.; Hartemink, A. J.; Figueiredo, M. A.T., A Bayesian approach to joint feature selection and classifier design, IEEE Trans. Pattern Anal. Mach. Intell., 26, 1105-1111 (2004)
[31] Lapedriza, A.; Segu, S.; Masip, D.; Vitria, J., A sparse Bayesian approach for joint feature selection and classifier learning, Pattern Anal. Appl., 11, 3, 299-308 (2008)
[32] Zhao, P.; Yu, B., Stagewise lasso, J. Mach. Learn. Res., 8, 2701-2726 (2007) · Zbl 1222.68345
[33] Nguyen, M. H.; De la Torre, F., Optimal feature selection for support vector machines, Pattern Recognit., 43, 3, 584-591 (2010) · Zbl 1187.68411
[35] Lanckriet, G. R.G.; Cristianini, N.; Bartlett, P.; Ghaoui, L. E.; Jordan, M. I., Learning the kernel matrix with semidefinite programming, J. Mach. Learn. Res., 5, 27-72 (2004) · Zbl 1222.68241
[36] Sonnenburg, S.; Rätsch, G.; Schäfer, C., A general and efficient multiple kernel learning algorithm, (Weiss, Y.; Schölkopf, B.; Platt, J., Advances in Neural Information Processing Systems, vol. 18 (2006), MIT Press: MIT Press Cambridge, MA), 1273-1280
[38] Mohsenzadeh, Y.; Sheikhzadeh, H.; Reza, A. M.; Bathaee, N.; Kalayeh, M. M., The relevance sample-feature machine: a sparse Bayesian learning approach to joint feature-sample selection, IEEE Trans. Cybern., 43, 2241-2254 (2013)
[39] Sonnenburg, S.; Rätsch, G.; Schäfer, C.; Schölkopf, B., Large scale multiple kernel learning, J. Mach. Learn. Res., 7, 1531-1565 (2006) · Zbl 1222.90072
[40] Damoulas, T.; Girolami, M. A., Combining feature spaces for classification, Pattern Recognit., 42, 11, 2671-2683 (2009) · Zbl 1175.68319
[41] Gönen, M.; Alpaydin, E., Multiple kernel learning algorithms, J. Mach. Learn. Res., 12, 2211-2268 (2011) · Zbl 1280.68167
[42] MacKay, D., The evidence framework applied to classification networks, Neural Comput., 4, 5, 720-736 (1992)
[44] Ben-Dor, A.; Bruhn, L.; Friedman, N.; Nachman, I.; Schummer, M.; Yakhini, Z., Tissue classification with gene expression profiles, J. Comput. Biol., 7, 3-4, 559-583 (2000)
[46] Figueiredo, M. A.T., Adaptive sparseness for supervised learning, IEEE Trans. Pattern Anal. Mach. Intell., 25, 1150-1159 (2003)
[48] Williams, C. K.I.; Barber, D., Bayesian classification with Gaussian processes, IEEE Trans. Pattern Anal. Mach. Intell., 20, 1342-1351 (1998)
[50] Manocha, S.; Girolami, M. A., An empirical analysis of the probabilistic k-nearest neighbour classifier, Pattern Recognit. Lett., 28, 13, 1818-1824 (2007)