zbMATH — the first resource for mathematics

Sparsity in multiple kernel learning. (English) Zbl 1204.62086
Summary: The problem of multiple kernel learning based on penalized empirical risk minimization is discussed. The complexity penalty is determined jointly by the empirical \(L_{2}\) norms and the reproducing kernel Hilbert space (RKHS) norms induced by the kernels with a data-driven choice of regularization parameters. The main focus is on the case when the total number of kernels is large, but only a relatively small number of them is needed to represent the target function, so that the problem is sparse. The goal is to establish oracle inequalities for the excess risk of the resulting prediction rule showing that the method is adaptive both to the unknown design distribution and to the sparsity of the problem.

62G99 Nonparametric inference
62G08 Nonparametric regression and quantile regression
46N30 Applications of functional analysis in probability theory and statistics
62F12 Asymptotic properties of parametric estimators
62J07 Ridge regression; shrinkage estimators (Lasso)
60E15 Inequalities; stochastic orderings
Full Text: DOI arXiv
[1] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337-404. JSTOR: · Zbl 0037.20701
[2] Bach, F. (2008). Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9 1179-1225. · Zbl 1225.68147
[3] Bickel, P., Ritov, Y. and Tsybakov, A. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705-1732. · Zbl 1173.62022
[4] Bousquet, O. and Herrmann, D. (2003). On the complexity of learning the kernel matrix. In Advances in Neural Information Processing Systems 15 415-422. MIT Press, Cambridge.
[5] Blanchard, G., Bousquet, O. and Massart, P. (2008). Statistical performance of support vector machines. Ann. Statist. 36 489-531. · Zbl 1133.62044
[6] Bousquet, O. (2002). A Bennett concentration inequality and its applications to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495-500. · Zbl 1001.60021
[7] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313-2351. · Zbl 1139.62019
[8] Crammer, K., Keshet, J. and Singer, Y. (2003). Kernel design using boosting. In Advances in Neural Information Processing Systems 15 553-560. MIT Press, Cambridge.
[9] Koltchinskii, V. (2008). Oracle inequalities in empirical risk minimization and sparse recovery problems. Ecole d’Eté de Probabilités de Saint-Flour, Lecture Notes.
[10] Koltchinskii, V. (2009a). Sparsity in penalized empirical risk minimization. Ann. Inst. H. Poincaré Probab. Statist. 45 7-57. · Zbl 1168.62044
[11] Koltchinskii, V. (2009b). The Dantzig selector and sparsity oracle inequalities. Bernoulli 15 799-828. · Zbl 1452.62486
[12] Koltchinskii, V. (2009c). Sparse recovery in convex hulls via entropy penalization. Ann. Statist. 37 1332-1359. · Zbl 1269.62039
[13] Koltchinskii, V. and Yuan, M. (2008). Sparse recovery in large ensembles of kernel machines. In Proceedings of 19th Annual Conference on Learning Theory 229-238. Omnipress, Madison, WI.
[14] Lanckriet, G., Cristianini, N., Bartlett, P., Ghaoui, L. and Jordan, M. (2004). Learning the kernel matrix with semidefinite programming. J. Mach. Learn. Res. 5 27-72. · Zbl 1222.68241
[15] Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces . Springer, New York. · Zbl 0748.60004
[16] Lin, Y. and Zhang, H. (2006). Component selection and smoothing in multivariate nonparametric regression. Ann. Statist. 34 2272-2297. · Zbl 1106.62041
[17] Meier, L., van de Geer, S. and Bühlmann, P. (2009). High-dimensional additive modeling. Ann. Statist. 37 3779-3821. · Zbl 1360.62186
[18] Mendelson, S. (2002). Geometric parameters of kernel machines. In COLT 2002. Lecture Notes in Artificial Intelligence 2375 29-43. Springer, Berlin. · Zbl 1050.68070
[19] Micchelli, C. and Pontil, M. (2005). Learning the kernel function via regularization. J. Mach. Learn. Res. 6 1099-1125. · Zbl 1222.68265
[20] Raskutti, G., Wainwright, M. and Yu, B. (2009). Lower bounds on minimax rates for nonparametric regression with additive sparsity and smoothness. In Advances in Neural Information Processing Systems (NIPS 22) 1563-1570. Curran Associates, Red Hook, NY.
[21] Ravikumar, P., Liu, H., Lafferty, J. and Wasserman, L. (2008). SpAM: Sparse additive models. In Advances in Neural Information Processing Systems (NIPS 20) 1201-1208. Curran Associates, Red Hook, NY.
[22] Srebro, N. and Ben-David, S. (2006). Learning bounds for support vector machines with learned kernels. In Learning Theory. Lecture Notes in Comput. Sci. 4005 169-183. Springer, Berlin. · Zbl 1143.68561
[23] Talagrand, M. (1996). New concentration inequalities for product measures. Invent. Math. 126 505-563. · Zbl 0893.60001
[24] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation . Springer, New York. · Zbl 1176.62032
[25] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. With Applications to Statistics . Springer, New York. · Zbl 0862.60002
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.