
Average stability is invariant to data preconditioning. Implications to exp-concave empirical risk minimization. (English) Zbl 1473.62257

Summary: We show that the average stability notion introduced by M. Kearns and D. Ron [“Algorithmic stability and sanity-check bounds for leave-one-out cross-validation”, Neural Comput. 11, No. 6, 1427–1453 (1999; doi:10.1162/089976699300016304)] and O. Bousquet and A. Elisseeff [J. Mach. Learn. Res. 2, No. 3, 499–526 (2002; Zbl 1007.68083)] is invariant to data preconditioning, for a wide class of generalized linear models that includes most of the known exp-concave losses. In other words, when analyzing the stability rate of a given algorithm, we may assume the optimal preconditioning of the data. This implies that, at least from a statistical perspective, explicit regularization is not required to compensate for ill-conditioned data, in contrast to the widespread approach of adding a regularization term when analyzing the sample complexity of generalized linear models. Among the important implications of our findings: a) we show that the excess risk of empirical risk minimization (ERM) is controlled by the preconditioned stability rate, which immediately yields a relatively short and elegant proof of the fast rates attained by ERM in our context; b) we complement the recent bounds of M. Hardt, B. Recht and Y. Singer [“Train faster, generalize better: stability of stochastic gradient descent”, Preprint, arXiv:1509.01240] on the stability rate of the Stochastic Gradient Descent algorithm.
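
To fix ideas, here is a minimal sketch of the two notions at play, in notation chosen for illustration rather than taken from the paper. One common formalization of average (replace-one) stability of a learning rule $A$ over a sample $S=(z_1,\dots,z_n)$ drawn i.i.d. from a distribution $\mathcal{D}$ is
\[
\varepsilon_{\mathrm{stab}}(n) \;=\; \mathbb{E}_{S\sim\mathcal{D}^n,\; z'\sim\mathcal{D},\; i\sim\mathrm{Unif}\{1,\dots,n\}}\Big[\ell\big(A(S^{(i)}),z_i\big)-\ell\big(A(S),z_i\big)\Big],
\]
where $S^{(i)}$ denotes $S$ with $z_i$ replaced by the fresh point $z'$. For a generalized linear model whose loss depends on $x$ only through the linear prediction, $\ell(w,(x,y))=\phi(\langle w,x\rangle,y)$, preconditioning the data by an invertible matrix $P$ maps $x\mapsto Px$, and the accompanying reparameterization $w\mapsto P^{-\top}w$ satisfies $\langle P^{-\top}w,\,Px\rangle=\langle w,x\rangle$, so every loss value is preserved; the invariance asserted in the summary is that the average stability rate of the algorithm is likewise unaffected by such a change of coordinates.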

MSC:

62J12 Generalized linear models (logistic models)
68T05 Learning and adaptive systems in artificial intelligence

Citations:

Zbl 1007.68083
Full Text: arXiv

References:

[1] Katy S Azoury and Manfred K Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. Machine Learning, 43(3):211–246, 2001. · Zbl 0988.68173
[2] Peter L Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. The Journal of Machine Learning Research, 3:463–482, 2003. · Zbl 1084.68549
[3] Peter L Bartlett, Olivier Bousquet, and Shahar Mendelson. Local Rademacher complexities. Annals of Statistics, pages 1497–1537, 2005. · Zbl 1083.62034
[4] Olivier Bousquet and André Elisseeff. Stability and generalization. The Journal of Machine Learning Research, 2:499–526, 2002. · Zbl 1007.68083
[5] Sébastien Bubeck. Convex optimization: Algorithms and complexity. Foundations and Trends in Machine Learning, 8(3-4):231–357, 2015. · Zbl 1365.90196
[6] Nicolò Cesa-Bianchi, Alex Conconi, and Claudio Gentile. On the generalization ability of online learning algorithms. IEEE Transactions on Information Theory, 50(9):2050–2057, 2004. · Zbl 1295.68182
[7] John B Conway. A course in functional analysis, volume 96. Springer Science & Business Media, 2013. · Zbl 0706.46003
[8] Alon Gonen, Francesco Orabona, and Shai Shalev-Shwartz. Solving ridge regression using sketched preconditioned SVRG. arXiv preprint arXiv:1602.02350, 2016.
[9] Moritz Hardt, Benjamin Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic gradient descent. arXiv preprint arXiv:1509.01240, 2015.
[10] Elad Hazan, Amit Agarwal, and Satyen Kale. Logarithmic regret algorithms for online convex optimization. Machine Learning, 69(2-3):169–192, 2007. · Zbl 1143.90371
[11] Sham M Kakade, Karthik Sridharan, and Ambuj Tewari. On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. In Advances in Neural Information Processing Systems, pages 793–800, 2009.
[12] Michael Kearns and Dana Ron. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Computation, 11(6):1427–1453, 1999.
[13] Jyrki Kivinen and Manfred K Warmuth. Averaging expert predictions. In Computational Learning Theory, pages 153–167. Springer, 1999.
[14] V Koltchinskii. Oracle inequalities in empirical risk minimization and sparse recovery problems: Lecture notes. Technical report, École d'été de Probabilités de Saint-Flour, 2008. · Zbl 1223.91002
[15] Tomer Koren and Kfir Levy. Fast rates for exp-concave empirical risk minimization. In Advances in Neural Information Processing Systems, pages 1477–1485, 2015.
[16] Mehrdad Mahdavi, Lijun Zhang, and Rong Jin. Lower and upper bounds on the generalization of stochastic exponentially concave optimization. In Proceedings of The 28th Conference on Learning Theory, pages 1305–1320, 2015.
[17] Nishant A Mehta. From exp-concavity to variance control: O(1/n) rates and online-to-batch conversion with high probability. CoRR, 2016.
[18] Sayan Mukherjee, Partha Niyogi, Tomaso Poggio, and Ryan Rifkin. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Advances in Computational Mathematics, 25(1-3):161–193, 2006. · Zbl 1099.68693
[19] Francesco Orabona, Nicolò Cesa-Bianchi, and Claudio Gentile. Beyond logarithmic bounds in online learning. In AISTATS, pages 823–831, 2012.
[20] Shai Shalev-Shwartz. Online learning and online convex optimization. Foundations and Trends in Machine Learning, 4(2):107–194, 2011. · Zbl 1253.68190
[21] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014. · Zbl 1305.68005
[22] Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, and Karthik Sridharan. Learnability, stability and uniform convergence. The Journal of Machine Learning Research, 11:2635–2670, 2010. · Zbl 1242.68247
[23] Ohad Shamir. The sample complexity of learning linear predictors with the squared loss. arXiv preprint arXiv:1406.5143, 2014. · Zbl 1351.68233
[24] Volodya Vovk. Competitive on-line statistics. International Statistical Review, 69(2):213–248, 2001. · Zbl 1211.62200