Worst-case regret analysis of computationally budgeted online kernel selection. (English) Zbl 1491.68163

Summary: We study the problem of online kernel selection under computational constraints, where the memory or time available to the kernel-selection and online-prediction procedures is restricted to a fixed budget. We analyze worst-case lower bounds on the regret of any online kernel selection algorithm that operates on a subset of the observed examples, and design algorithms enjoying corresponding upper bounds. We also identify the condition under which online kernel selection with time constraints differs from that with memory constraints. To design the algorithms, we reduce the problems to two sequential decision problems, namely, prediction with expert advice and the multi-armed bandit problem with an additional observation. Our algorithms introduce several new techniques, such as memory sharing, hypothesis-space discretization, and a decoupled exploration-exploitation scheme. Numerical experiments on online regression and classification verify our theoretical results.
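The reduction to prediction with expert advice can be illustrated by a minimal sketch: an exponential-weights (Hedge) aggregator over candidate kernels, where all kernel predictors share a single example buffer of bounded size, in the spirit of the memory-sharing technique mentioned above. The RBF kernels, learning rates, buffer-eviction rule, and class name below are illustrative assumptions, not the paper's actual algorithm.

```python
import math

def rbf(x, y, gamma):
    """One-dimensional RBF kernel (illustrative choice of kernel family)."""
    return math.exp(-gamma * (x - y) ** 2)

class SharedBufferHedge:
    """Hedge over candidate kernel widths; every expert shares one
    example buffer of size at most B (a 'memory sharing' sketch)."""

    def __init__(self, gammas, B=20, eta=0.5, lr=0.2):
        self.gammas = gammas          # candidate kernel parameters (the "experts")
        self.B = B                    # shared budget on stored examples
        self.eta = eta                # Hedge learning rate
        self.lr = lr                  # per-expert gradient step size
        self.buf = []                 # shared list of (x, per-expert coefficients)
        self.w = [1.0] * len(gammas)  # Hedge weights

    def predict_expert(self, k, x):
        """Prediction of the k-th kernel predictor from the shared buffer."""
        return sum(a[k] * rbf(bx, x, self.gammas[k]) for bx, a in self.buf)

    def predict(self, x):
        """Weighted aggregate prediction (full-information expert setting)."""
        W = sum(self.w)
        return sum(wk / W * self.predict_expert(k, x)
                   for k, wk in enumerate(self.w))

    def update(self, x, y):
        # Hedge update: each expert suffers its own squared loss,
        # clipped to [0, 1] so the exponential update stays bounded.
        for k in range(len(self.gammas)):
            loss = min((self.predict_expert(k, x) - y) ** 2, 1.0)
            self.w[k] *= math.exp(-self.eta * loss)
        # Budgeted functional gradient step: store x once, with one
        # coefficient per expert, so memory is shared across experts.
        coeffs = [self.lr * (y - self.predict_expert(k, x))
                  for k in range(len(self.gammas))]
        self.buf.append((x, coeffs))
        if len(self.buf) > self.B:
            self.buf.pop(0)  # simplest possible eviction rule on a budget
```

In the time-constrained (bandit) variant discussed in the paper, only one or a few experts would be evaluated per round rather than all of them; the sketch above corresponds to the full-information, memory-constrained setting.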

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68W27 Online algorithms; streaming algorithms

Software:

Forgetron

References:

[1] Agarwal, A., Duchi, J.C., Bartlett, P.L., & Levrard, C. (2011). Oracle inequalities for computationally budgeted model selection. In Proceedings of the 24th Annual Conference on Learning Theory (pp. 69-86).
[2] Agarwal, A., Luo, H., Neyshabur, B., & Schapire, R.E. (2017). Corralling a band of bandit algorithms. In Proceedings of the 30th Annual Conference on Learning Theory (pp. 12-38).
[3] Bubeck, S.; Cesa-Bianchi, N., Regret analysis of stochastic and nonstochastic multi-armed bandit problems, Foundations and Trends® in Machine Learning, 5, 1, 1-122 (2012) · Zbl 1281.91051 · doi:10.1561/2200000024
[4] Bubeck, S.; Devanur, NR; Huang, Z.; Niazadeh, R., Multi-scale online learning: Theory and applications to online auctions and pricing, Journal of Machine Learning Research, 20, 62, 1-37 (2019) · Zbl 1489.68406
[5] Cesa-Bianchi, N., & Lugosi, G. (2006). Prediction, learning, and games. Cambridge University Press. · Zbl 1114.91001
[6] Cesa-Bianchi, N., Mansour, Y., & Shamir, O. (2015). On the complexity of learning with kernels. In Proceedings of the 28th Annual Conference on Learning Theory (pp. 297-325).
[7] Crammer, K.; Kandola, JS; Singer, Y., Online classification on a budget, Advances in Neural Information Processing Systems, 16, 225-232 (2003)
[8] Cutkosky, A.; Boahen, K., Online convex optimization with unconstrained domains and losses, Advances in Neural Information Processing Systems, 29, 748-756 (2016)
[9] Dekel, O.; Shalev-Shwartz, S.; Singer, Y., The forgetron: A kernel-based perceptron on a budget, SIAM Journal on Computing, 37, 5, 1342-1372 (2008) · Zbl 1151.68579 · doi:10.1137/060666998
[10] Foster, DJ; Kale, S.; Mohri, M.; Sridharan, K., Parameter-free online learning via model selection, Advances in Neural Information Processing Systems, 30, 6022-6032 (2017)
[11] Foster, DJ; Krishnamurthy, A.; Luo, H., Model selection for contextual bandits, Advances in Neural Information Processing Systems, 32, 14741-14752 (2019)
[12] Hoi, SCH; Jin, R.; Zhao, P.; Yang, T., Online multiple kernel classification, Machine Learning, 90, 2, 289-316 (2013) · Zbl 1260.68332 · doi:10.1007/s10994-012-5319-2
[13] Jézéquel, R.; Gaillard, P.; Rudi, A., Efficient online learning with kernels for adversarial large scale problems, Advances in Neural Information Processing Systems, 32, 9427-9436 (2019)
[14] Jin, R., Hoi, S.C.H., & Yang, T. (2010). Online multiple kernel learning: Algorithms and mistake bounds. In Proceedings of the 21st International Conference on Algorithmic Learning Theory (pp. 390-404). · Zbl 1306.68131
[15] Kivinen, J.; Smola, AJ; Williamson, RC, Online learning with kernels, IEEE Transactions on Signal Processing, 52, 8, 2165-2176 (2004) · Zbl 1369.68281 · doi:10.1109/TSP.2004.830991
[16] Koppel, A.; Warnell, G.; Stump, E.; Ribeiro, A., Parsimonious online learning with kernels via sparse projections in function space, Journal of Machine Learning Research, 20, 3, 1-44 (2019) · Zbl 1483.68304
[17] Kothari, P.K., & Livni, R. (2020). On the expressive power of kernel methods and the efficiency of kernel learning by association schemes. In Proceedings of the 31st International Conference on Algorithmic Learning Theory (pp. 422-450).
[18] Lu, J.; Hoi, SCH; Wang, J.; Zhao, P.; Liu, Z., Large scale online kernel learning, Journal of Machine Learning Research, 17, 47, 1-43 (2016) · Zbl 1360.68690
[19] McMahan, B.; Abernethy, J., Minimax optimal algorithms for unconstrained linear optimization, Advances in Neural Information Processing Systems, 26, 2724-2732 (2013)
[20] McMahan, H.B., & Orabona, F. (2014). Unconstrained online linear learning in Hilbert spaces: Minimax algorithms and normal approximations. In Proceedings of the 27th Conference on Learning Theory (pp. 1020-1039).
[21] Muthukumar, V., Ray, M., Sahai, A., & Bartlett, P. (2019). Best of many worlds: Robust model selection for online supervised learning. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (pp. 3177-3186).
[22] Nguyen, T.D., Le, T., Bui, H., & Phung, D. (2017). Large-scale online kernel learning with random feature reparameterization. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (pp. 2543-2549).
[23] Orabona, F., Dimension-free exponentiated gradient, Advances in Neural Information Processing Systems, 26, 1806-1814 (2013)
[24] Orabona, F.; Keshet, J.; Caputo, B., Bounded kernel-based online learning, Journal of Machine Learning Research, 10, 2643-2666 (2009) · Zbl 1235.68179
[25] Sahoo, D., Hoi, S.C.H., & Li, B. (2014). Online multiple kernel regression. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD (pp. 293-302).
[26] Seldin, Y., Bartlett, P.L., Crammer, K., & Abbasi-Yadkori, Y. (2014). Prediction with limited advice and multiarmed bandits with paid observations. In Proceedings of the 31st International Conference on Machine Learning (pp. 280-287).
[27] Wang, Z.; Crammer, K.; Vucetic, S., Breaking the curse of kernelization: Budgeted stochastic gradient descent for large-scale SVM training, Journal of Machine Learning Research, 13, 1, 3103-3131 (2012) · Zbl 1433.68383
[28] Yang, T., Mahdavi, M., Jin, R., Yi, J., & Hoi, S.C.H. (2012). Online kernel selection: Algorithms and evaluations. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (pp. 1197-1202).
[29] Zhang, L., Yi, J., Jin, R., Lin, M., & He, X. (2013). Online kernel learning with a near optimal sparsity bound. In Proceedings of the 30th International Conference on Machine Learning (pp. 621-629).
[30] Zhang, X., & Liao, S. (2018). Online kernel selection via incremental sketched kernel alignment. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (pp. 3118-3124).
[31] Zhang, X., & Liao, S. (2020). Hypothesis sketching for online kernel selection in continuous kernel space. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (pp. 2498-2504).
[32] Zhao, P., Wang, J., Wu, P., Jin, R., & Hoi, S.C.H. (2012). Fast bounded online gradient descent algorithms for scalable kernel-based online learning. In Proceedings of the 29th International Conference on Machine Learning (pp. 1075-1082).
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, the data have been complemented/enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.