Online multiple kernel classification. (English) Zbl 1260.68332

Summary: Although both online learning and kernel learning have been studied extensively in machine learning, there is limited effort in addressing the intersecting research problems of these two important topics. As an attempt to fill the gap, we address a new research problem, termed online multiple kernel classification (OMKC), which learns a kernel-based prediction function by selecting a subset of predefined kernel functions in an online learning fashion. OMKC is in general more challenging than typical online learning because both the kernel classifiers and the subset of selected kernels are unknown, and more importantly the solutions to the kernel classifiers and their combination weights are correlated. The proposed algorithms are based on the fusion of two online learning algorithms, i.e., the Perceptron algorithm that learns a classifier for a given kernel, and the Hedge algorithm that combines classifiers by linear weights. We develop stochastic selection strategies that randomly select a subset of kernels for combination and model updating, thus improving the learning efficiency. Our empirical study with 15 data sets shows promising performance of the proposed algorithms for OMKC in both learning efficiency and prediction accuracy.


68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI Link


[1] Agmon, S. (1954). The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6(3), 382-392. · Zbl 0055.35001
[2] Argyriou, A.; Hauser, R.; Micchelli, C. A.; Pontil, M., A dc-programming algorithm for kernel selection, 41-48 (2006)
[3] Auer, P., Cesa-Bianchi, N., Freund, Y., & Schapire, R. E. (2003). The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1), 48-77. · Zbl 1029.68087
[4] Bach, F. R.; Lanckriet, G. R. G.; Jordan, M. I., Multiple kernel learning, conic duality, and the smo algorithm, 6 (2004), New York
[5] Cavallanti, G., Cesa-Bianchi, N., & Gentile, C. (2007). Tracking the best hyperplane with a simple budget perceptron. Machine Learning, 69(2-3), 143-167. · Zbl 1470.68090
[6] Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. New York: Cambridge University Press. · Zbl 1114.91001
[7] Cesa-Bianchi, N., Mansour, Y., & Stoltz, G. (2007). Improved second-order bounds for prediction with expert advice. Machine Learning, 66(2-3), 321-352. · Zbl 1471.91074
[8] Chapelle, O.; Weston, J.; Schölkopf, B., Cluster kernels for semi-supervised learning, 585-592 (2002)
[9] Chaudhuri, K.; Freund, Y.; Hsu, D., A parameter-free hedging algorithm, No. 22, 297-305 (2009)
[10] Chen, J.; Ye, J., Training svm with indefinite kernels, Helsinki, Finland
[11] Chen, J.; Ye, J., Multiple indefinite kernel learning with mixed norm regularization, Montreal, Canada
[12] Chen, Y.; Gupta, M. R.; Recht, B., Learning kernels from indefinite similarities, Montreal, Quebec, Canada
[13] Crammer, K., & Singer, Y. (2003). Ultraconservative online algorithms for multiclass problems. Journal of Machine Learning Research, 3, 951-991. · Zbl 1112.68497
[14] Crammer, K.; Kandola, J. S.; Singer, Y., Online classification on a budget (2003)
[15] Crammer, K., Dekel, O., Keshet, J., Shalev-Shwartz, S., & Singer, Y. (2006). Online passive-aggressive algorithms. Journal of Machine Learning Research, 7, 551-585. · Zbl 1222.68177
[16] Cristianini, N.; Shawe-Taylor, J.; Elisseeff, A.; Kandola, J. S., On kernel-target alignment, 367-373 (2001)
[17] Dekel, O., From online to batch learning with cutoff-averaging, 377-384 (2008)
[18] Dekel, O.; Singer, Y., Data-driven online to batch conversions (2005)
[19] Dekel, O., Shalev-Shwartz, S., & Singer, Y. (2008). The forgetron: a kernel-based perceptron on a budget. SIAM Journal on Computing, 37(5), 1342-1372. · Zbl 1151.68579
[20] Dredze, M.; Crammer, K., Confidence-weighted linear classification, 264-271 (2008), New York
[21] Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139. · Zbl 0880.68103
[22] Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277-296. · Zbl 0944.68535
[23] Gentile, C. (2001). A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2, 213-242. · Zbl 1037.68124
[24] Herrmann, D. J. L.; Bousquet, O., On the complexity of learning the kernel matrix, 399-406 (2002)
[25] Ho, T. K.; Kleinberg, E. M., Building projectable classifiers of arbitrary complexity, 880 (1996), Washington
[26] Hoi, S. C. H.; Lyu, M. R.; Chang, E. Y., Learning the unified kernel machines for classification, Philadelphia, US
[27] Hoi, S. C.; Jin, R.; Lyu, M. R., Learning non-parametric kernel matrices from pairwise constraints, OR, US
[28] Hutter, M., & Poland, J. (2005). Adaptive online prediction by following the perturbed leader. Journal of Machine Learning Research, 6, 639-660. · Zbl 1222.68223
[29] Ji, S., Sun, L., Jin, R., & Ye, J. (2008). Multi-label multiple kernel learning. Advances in Neural Information Processing Systems.
[30] Jie, L.; Orabona, F.; Fornoni, M.; Caputo, B.; Cesa-bianchi, N., Om-2: an online multi-class multi-kernel learning algorithm (2010)
[31] Jin, R.; Hoi, S. C. H.; Yang, T., Online multiple kernel learning: algorithms and mistake bounds, 390-404 (2010) · Zbl 1306.68131
[32] Jyrki Kivinen, A. J. S., & Williamson, R. C. (2004). Online learning with kernels. IEEE Transactions on Signal Processing, 52(8), 2165-2176. · Zbl 1369.68281
[33] Kalai, A., & Vempala, S. (2005). Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 71(3), 291-307. · Zbl 1094.68112
[34] Kashima, H.; Tsuda, K.; Inokuchi, A., Marginalized kernels between labeled graphs, 321-328 (2003)
[35] Kivinen, J.; Smola, A. J.; Williamson, R. C., Online learning with kernels, 785-792 (2001)
[36] Kondor, R. I.; Lafferty, J. D., Diffusion kernels on graphs and other discrete input spaces, 315-322 (2002)
[37] Kulis, B.; Sustik, M.; Dhillon, I., Learning low-rank kernel matrices, 505-512 (2006), Pittsburgh, Pennsylvania
[38] Kulis, B., Sustik, M. A., & Dhillon, I. S. (2009). Low-rank kernel learning with Bregman matrix divergences. Journal of Machine Learning Research, 10, 341-376. · Zbl 1235.68166
[39] Kwok, J.; Tsang, I., Learning with idealized kernels, 400-407 (2003)
[40] Lanckriet, G. R. G., Cristianini, N., Bartlett, P., Ghaoui, L. E., & Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5, 27-72. · Zbl 1222.68241
[41] Lee, W.-J.; Verzakov, S.; Duin, R. P. W., Kernel combination versus classifier combination, 22-31 (2007)
[42] Li, Y., & Long, P. M. (2002). The relaxed online maximum margin algorithm. Machine Learning, 46(1-3), 361-387. · Zbl 0998.68110
[43] Littlestone, N.; Warmuth, M. K., The weighted majority algorithm, Research Triangle Park, North Carolina · Zbl 0804.68121
[44] Littlestone, N., & Warmuth, M. K. (1994). The weighted majority algorithm. Information and Computation, 108(2), 212-261. · Zbl 0804.68121
[45] Martins, A. F. T., Smith, N. A., Xing, E. P., Aguiar, P. M. Q., & Figueiredo, M. A. T. (2011). Online learning of structured predictors with multiple kernels. Journal of Machine Learning Research. Proceedings Track, 15, 507-515.
[46] Novikoff, A., On convergence proofs on perceptrons, No. XII, 615-622 (1962)
[47] Orabona, F.; Keshet, J.; Caputo, B., The projectron: a bounded kernel-based perceptron, 720-727 (2008)
[48] Orabona, F.; Luo, J.; Caputo, B., Online-batch strongly convex multi kernel learning, 787-794 (2010)
[49] Platt, J. C. (1999). Fast training of support vector machines using sequential minimal optimization (pp. 185-208).
[50] Rakotomamonjy, A., Bach, F. R., Canu, S., & Grandvalet, Y. (2008). Simplemkl. Journal of Machine Learning Research, 11, 2491-2521. · Zbl 1225.68208
[51] Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386-407.
[52] Shalev-Shwartz, S. (2007). Online learning: theory, algorithms, and applications. Ph.D. thesis. · Zbl 1470.68172
[53] Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. New York: Cambridge University Press.
[54] Sonnenburg, S., Rätsch, G., Schäfer, C., & Schölkopf, B. (2006). Large scale multiple kernel learning. Journal of Machine Learning Research, 7, 1531-1565. · Zbl 1222.90072
[55] Tang, L.; Chen, J.; Ye, J., On multiple kernel learning with multiple labels, 1255-1260 (2009)
[56] Vovk, V. (1998). A game of prediction with expert advice. Journal of Computer and System Sciences, 56(2), 153-173. · Zbl 0945.68528
[57] Wang, J.; Zhao, P.; Hoi, S. C. H., Exact soft confidence-weighted learning (2012)
[58] Xu, Z.; Jin, R.; King, I.; Lyu, M. R., An extended level method for efficient multiple kernel learning, No. 22 (2008)
[59] Yang, H.; Xu, Z.; King, I.; Lyu, M. R., Online learning for group lasso, 1191-1198 (2010)
[60] Yang, T.; Jin, R.; Jain, A. K., Learning from noisy side information by generalized maximum entropy model, 1199-1206 (2010)
[61] Zhao, P., Hoi, S. C. H., & Jin, R. (2011). Double updating online learning. Journal of Machine Learning Research, 12, 1587-1615. · Zbl 1280.68222
[62] Zhao, P.; Wang, J.; Wu, P.; Jin, R.; Hoi, S. C. H., Fast bounded online gradient descent algorithms for scalable kernel-based online learning (2012)
[63] Zhu, X.; Kandola, J. S.; Ghahramani, Z.; Lafferty, J. D., Nonparametric transforms of graph kernels for semi-supervised learning (2004)
[64] Zhuang, J.; Tsang, I. W.; Hoi, S. C. H., Simplenpkl: simple non-parametric kernel learning, 1273-1280 (2009)
[65] Zhuang, J., Tsang, I. W., & Hoi, S. C. H. (2011a). A family of simple non-parametric kernel learning algorithms. Journal of Machine Learning Research, 12, 1313-1347. · Zbl 1280.68223
[66] Zhuang, J., Tsang, I. W., & Hoi, S. C. H. (2011b). Two-layer multiple kernel learning. Journal of Machine Learning Research. Proceedings Track, 15, 909-917.
[67] Zhuang, J., Wang, J., Hoi, S. C. H., & Lan, X. (2011c). Unsupervised multiple kernel learning. Journal of Machine Learning Research. Proceedings Track, 20, 129-144.
[68] Zien, A.; Ong, C. S., Multiclass multiple kernel learning, Corvalis, Oregon
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.