zbMATH — the first resource for mathematics

Fast greedy \(\mathcal{C} \)-bound minimization with guarantees. (English) Zbl 07289245
Summary: The \(\mathcal{C}\)-bound is a tight bound on the true risk of a majority vote classifier that relies on the individual quality and pairwise disagreement of the voters and provides PAC-Bayesian generalization guarantees. Based on this bound, MinCq is a classification algorithm that returns a dense distribution on a finite set of voters by minimizing the \(\mathcal{C}\)-bound. Introduced later and inspired by boosting, CqBoost uses a column-generation approach to build a sparse \(\mathcal{C}\)-bound-optimal distribution on a possibly infinite set of voters. Both approaches, however, have a high computational learning time because they minimize the \(\mathcal{C}\)-bound by solving a quadratic program; one advantage of CqBoost is its experimentally observed ability to provide sparse solutions. In this work, we address the problem of accelerating the \(\mathcal{C}\)-bound minimization process while keeping the sparsity of the solution and without losing accuracy. We present CB-Boost, a computationally efficient classification algorithm relying on a greedy, boosting-based \(\mathcal{C}\)-bound optimization. An in-depth analysis proves the optimality of the greedy minimization process and quantifies the decrease of the \(\mathcal{C}\)-bound achieved by the algorithm. Generalization guarantees are then derived from existing PAC-Bayesian theorems. In addition, we experimentally evaluate the relevance of CB-Boost with respect to the three main properties we expect of it: accuracy, sparsity, and computational efficiency, compared to MinCq, CqBoost, AdaBoost, and other ensemble methods. As observed in these experiments, CB-Boost not only achieves results comparable to the state of the art, but also provides \(\mathcal{C}\)-bound sub-optimal weights at a very low computational cost while keeping the sparsity property of CqBoost.
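For a \(Q\)-weighted majority vote with per-example margin \(M_Q(x, y) = \mathbb{E}_{h \sim Q}[y\,h(x)]\), the \(\mathcal{C}\)-bound takes the form \(1 - \mathbb{E}[M_Q]^2 / \mathbb{E}[M_Q^2]\) (valid when the expected margin is positive). As a rough illustration of the greedy idea summarized above, the sketch below computes the empirical \(\mathcal{C}\)-bound and greedily adds one (voter, weight) pair per round whenever it decreases the bound. This is a simplified toy with a grid search over weights, not the CB-Boost algorithm of the paper, which derives the weight updates analytically; all function names and parameters here are illustrative assumptions.

```python
import numpy as np

def c_bound(margins):
    """Empirical C-bound 1 - E[m]^2 / E[m^2] of per-example margins.

    Returns 1.0 (the vacuous value) when the mean margin is non-positive,
    since the bound only applies to votes with positive expected margin.
    """
    m1 = margins.mean()
    if m1 <= 0:
        return 1.0
    m2 = (margins ** 2).mean()
    return 1.0 - (m1 ** 2) / m2

def greedy_cb_boost(votes, y, n_rounds=10, alphas=np.linspace(0.05, 2.0, 40)):
    """Toy greedy C-bound minimization (illustrative, not the paper's method).

    votes : (n_examples, n_voters) array of voter outputs in {-1, +1}
    y     : (n_examples,) labels in {-1, +1}
    Each round adds the (voter, weight) pair from a fixed weight grid that
    most decreases the empirical C-bound of the normalized combination,
    and stops when no pair improves it.
    """
    n, p = votes.shape
    weights = np.zeros(p)
    score = np.zeros(n)          # current unnormalized weighted combination
    current = 1.0                # C-bound of the empty vote is vacuous
    for _ in range(n_rounds):
        best_cb, best_j, best_a = current, None, None
        for j in range(p):
            for a in alphas:
                total = weights.sum() + a
                margins = y * (score + a * votes[:, j]) / total
                cb = c_bound(margins)
                if cb < best_cb:
                    best_cb, best_j, best_a = cb, j, a
        if best_j is None:       # no candidate decreases the bound
            break
        weights[best_j] += best_a
        score += best_a * votes[:, best_j]
        current = best_cb
    return weights / max(weights.sum(), 1e-12)
```

The returned weight vector is sparse by construction: only voters actually selected during the greedy rounds receive non-zero mass, which mirrors the sparsity property the summary attributes to CB-Boost and CqBoost.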
68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI
[1] Breiman, L., Bagging predictors, Machine Learning, 24, 2, 123-140 (1996) · Zbl 0858.68080
[2] Breiman, L., Random forests, Machine Learning, 45, 1, 5-32 (2001) · Zbl 1007.68152
[3] Catoni, O. (2007). PAC-Bayesian supervised classification: the thermodynamics of statistical learning. arXiv preprint arXiv:0712.0248. · Zbl 1277.62015
[4] Cortes, C., Mohri, M., & Syed, U. (2014). Deep boosting. In: Proceedings of the thirty-first international conference on machine learning (ICML).
[5] Demiriz, A.; Bennett, KP; Shawe-Taylor, J., Linear programming boosting via column generation, Machine Learning, 46, 1, 225-254 (2002) · Zbl 0998.68105
[6] Dua, D., & Graff, C. (2017). UCI machine learning repository.
[7] Dziugaite, G.K., & Roy, D.M. (2018). Data-dependent PAC-Bayes priors via differential privacy. In: Advances in neural information processing systems, pp. 8440-8450.
[8] Freund, Y.; Schapire, RE, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55, 1, 119-139 (1997) · Zbl 0880.68103
[9] Friedman, JH, Greedy function approximation: A gradient boosting machine, The Annals of Statistics, 29, 5, 1189-1232 (2001) · Zbl 1043.62034
[10] Germain, P., Lacasse, A., Laviolette, F., & Marchand, M. (2009). PAC-Bayesian learning of linear classifiers. In: Proceedings of the 26th ICML, pp. 353-360. ACM.
[11] Germain, P.; Lacasse, A.; Laviolette, F.; Marchand, M.; Roy, JF, Risk bounds for the majority vote: From a PAC-Bayesian analysis to a learning algorithm, Journal of Machine Learning Research, 16, 1, 787-860 (2015) · Zbl 1337.68223
[12] Lacasse, A., Laviolette, F., Marchand, M., Germain, P., & Usunier, N. (2006). PAC-Bayes bounds for the risk of the majority vote and the variance of the Gibbs classifier. In: B. Schölkopf, J.C. Platt, T. Hoffman (Eds.) Advances in neural information processing systems 19, pp. 769-776. MIT Press.
[13] Lampert, C.H., Nickisch, H., & Harmeling, S. (2009). Learning to detect unseen object classes by between-class attribute transfer. In: 2009 IEEE Conference on computer vision and pattern recognition, pp. 951-958. IEEE.
[14] Lanckriet, GRG; Cristianini, N.; Bartlett, P.; Ghaoui, LE; Jordan, MI, Learning the kernel matrix with semidefinite programming, Journal of Machine Learning Research, 5, 27-72 (2004) · Zbl 1222.68241
[15] Langford, J., & Shawe-Taylor, J. (2003). PAC-Bayes & margins. In: Advances in neural information processing systems, pp. 439-446.
[16] LeCun, Y., & Cortes, C. (2010). MNIST handwritten digit database.
[17] Marchand, M.; Shawe-Taylor, J., The set covering machine, Journal of Machine Learning Research, 3, 723-746 (2003) · Zbl 1112.68392
[18] McAllester, DA, Some PAC-Bayesian theorems, Machine Learning, 37, 3, 355-363 (1999) · Zbl 0945.68157
[19] McAllester, DA, PAC-Bayesian stochastic model selection, Machine Learning, 51, 1, 5-21 (2003) · Zbl 1056.68122
[20] Parrado-Hernández, E.; Ambroladze, A.; Shawe-Taylor, J.; Sun, S., PAC-Bayes bounds with data dependent priors, Journal of Machine Learning Research, 13, 3507-3531 (2012) · Zbl 1433.68373
[21] Roy, J.F., Marchand, M., & Laviolette, F. (2016) A column generation bound minimization approach with PAC-Bayesian generalization guarantees. In: A. Gretton, C.C. Robert (eds.) Proceedings of the 19th international conference on artificial intelligence and statistics, proceedings of machine learning research, vol. 51, pp. 1241-1249. PMLR, Cadiz, Spain. http://proceedings.mlr.press/v51/roy16.html.
[22] Schapire, RE; Freund, Y., Boosting: foundations and algorithms (2012), Cambridge: The MIT Press, Cambridge · Zbl 1278.68021
[23] Seeger, M., PAC-Bayesian generalisation error bounds for Gaussian process classification, Journal of Machine Learning Research, 3, 233-269 (2002) · Zbl 1088.68745
[24] Seldin, Y.; Cesa-Bianchi, N.; Auer, P.; Laviolette, F.; Shawe-Taylor, J., PAC-Bayes-Bernstein inequality for martingales and its application to multiarmed bandits, Proceedings of the Workshop on On-line Trading of Exploration and Exploitation, 2, 98-111 (2012)
[25] Sonnenburg, S.; Rätsch, G.; Schäfer, C.; Schölkopf, B., Large scale multiple kernel learning, Journal of Machine Learning Research, 7, 1531-1565 (2006) · Zbl 1222.90072
[26] Valiant, LG, A theory of the learnable, Communications of the ACM, 27, 11, 1134-1142 (1984) · Zbl 0587.68077
[27] Zhu, J.; Rosset, S.; Zou, H.; Hastie, T., Multi-class AdaBoost, Statistics and Its Interface (2006)