Sparse weighted voting classifier selection and its linear programming relaxations.(English)Zbl 1243.68239

Summary: We consider the problem of minimizing the number of misclassifications of a weighted voting classifier, plus a penalty proportional to the number of nonzero weights. We first prove that its optimum is at least as hard to approximate as the minimum disagreement halfspace problem for a wide range of penalty parameter values. After formulating the problem as a mixed integer program (MIP), we show that common “soft margin” linear programming (LP) formulations for constructing weighted voting classsifiers are equivalent to an LP relaxation of our formulation. We show that this relaxation is very weak, with a potentially exponential integrality gap. However, we also show that augmenting the relaxation with certain valid inequalities tightens it considerably, yielding a linear upper bound on the gap for all values of the penalty parameter that exceed a reasonable threshold. Unlike earlier techniques proposed for similar problems [P. S. Bradley, O. L. Mangasarian and W. N. Street, INFORMS J. Comput. 10, No. 2, 209–217 (1998; Zbl 1034.90529); J. Weston et al., J. Mach. Learn. Res. 3, No. 7–8, 1439–1461 (2003; Zbl 1102.68605)], our approach provides bounds on the optimal solution value.

MSC:

 68T05 Learning and adaptive systems in artificial intelligence 90C11 Mixed integer programming 90C05 Linear programming 90C90 Applications of mathematical programming 68Q17 Computational difficulty of problems (lower bounds, completeness, difficulty of approximation, etc.)

Citations:

Zbl 1034.90529; Zbl 1102.68605

Software:

 [1] Alon, N.; Vu, V., Anti-Hadamard matrices, coin weighing, threshold gates, and indecomposable hypergraphs, J. combin. theory ser. A, 79, 133-160, (1997) · Zbl 0890.05011 [2] Amaldi, E.; Kann, V., On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems, Theoret. comput. sci., 209, 237-260, (1998) · Zbl 0915.68072 [3] Arora, S.; Babai, L.; Stern, J.; Sweedyk, Z., The hardness of approximate optima in lattices, codes, and systems of linear equations, J. comput. system sci., 54, 317-331, (1997) · Zbl 0877.68067 [4] Bradley, P.; Mangasarian, O., Feature selection via mathematical programming, INFORMS J. comput., 10, 209-217, (1998) · Zbl 1034.90529 [5] Brenner, J., The Hadamard maximum determinant problem, Amer. math. monthly, 79, 626-630, (1972) · Zbl 0249.15003 [6] Demiriz, A.; Bennett, K.; Shawe-Taylor, J., Linear programming boosting via column generation, Mach. learn., 46, 225-254, (2002) · Zbl 0998.68105 [7] Dinur, I.; Safra, S., On the hardness of approximating label-cover, Inform. process. lett., 89, 247-254, (2004) · Zbl 1178.68676 [8] Freund, Y.; Schapire, R., A decision-theoretic generalization of on-line learning and an application to boosting, J. comput. system sci., 55, 119-139, (1997) · Zbl 0880.68103 [9] N. Goldberg, J. Eckstein, Boosting classifiers with tightened $$L_0$$-relaxation penalties, in: Proceedings of the Twenty-Seventh International Conference on Machine Learning, 2010, pp. 383-390. [10] T. Graepel, R. Herbrich, B. Schölkopf, A. Smola, P. Bartlett, K.R. Müller, K. Obermayer, R. Williamson, Classification on proximity data with LP-machines, in: International Conference of Artificial Neural Networks, 1999, pp. 304-309. [11] Hastad, J., On the size of weights for threshold gates, SIAM J. discrete math., 7, 484-492, (1994) · Zbl 0811.68100 [12] Höffgen, K.U.; Simon, H.; Horn, K.V., Robust trainability of single neurons, J. comput. system sci., 50, 114-125, (1995) · Zbl 0939.68771 [13] Muroga, S.; Toda, I.; Takasu, S., Theory of majority decision elements, J. franklin inst., 271, 376-418, (1961) · Zbl 0196.51705 [14] Weston, J.; Elisseeff, A.; Schölkopf, B.; Tipping, M., Use of the zero norm with linear models and kernel methods, J. Mach. learn. res., 3, 1439-1461, (2003) · Zbl 1102.68605