# zbMATH — the first resource for mathematics

Fast learning rate of non-sparse multiple kernel learning and optimal regularization strategies. (English) Zbl 1410.62087
The study provides a framework for fast learning rates for multiple kernel learning (MKL) with arbitrary mixed-norm-type regularization, establishing bounds in both homogeneous and inhomogeneous settings. The aim of the paper is to theoretically explain the superior performance of dense-type MKL methods observed in numerical experiments.
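The paper itself is theoretical, but the object it analyzes can be illustrated concretely. A minimal sketch of ℓp-norm (non-sparse) MKL with a kernel ridge regression learner is shown below; the function names and the single alternating-optimization step are illustrative assumptions, not the paper's procedure, and the weight update follows the closed-form rule of Kloft et al. (2011):

```python
import numpy as np

def combine_kernels(kernels, beta):
    """Weighted sum K = sum_m beta_m K_m of base kernel matrices."""
    return sum(b * K for b, K in zip(beta, kernels))

def lp_mkl_step(kernels, y, beta, p=2.0, lam=1.0):
    """One alternating-optimization step of lp-norm MKL
    (illustrative sketch, not the paper's exact method)."""
    K = combine_kernels(kernels, beta)
    n = K.shape[0]
    # kernel ridge regression with the current combined kernel
    alpha = np.linalg.solve(K + lam * np.eye(n), y)
    # per-kernel importance: ||f_m||^2 = beta_m^2 * alpha^T K_m alpha
    norms = np.array([b**2 * (alpha @ Km @ alpha)
                      for b, Km in zip(beta, kernels)])
    # closed-form lp-norm weight update, projected to the unit lp ball
    beta_new = norms ** (1.0 / (p + 1))
    return beta_new / np.linalg.norm(beta_new, ord=p)
```

For p > 1 the updated weights stay dense (all kernels keep nonzero weight), which is the "non-sparse" regime whose learning rate the paper bounds; p → 1 recovers the sparse, group-lasso-like limit.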
##### MSC:
62H15 Hypothesis testing in multivariate analysis
68T05 Learning and adaptive systems in artificial intelligence
##### Software:
DeMat; hgam; SpicyMKL