×

Shifting: one-inclusion mistake bounds and sample compression. (English) Zbl 1158.68452

Summary: We present new expected risk bounds for binary and multiclass prediction, and resolve several recent conjectures on sample compressibility due to Kuzmin and Warmuth. By exploiting the combinatorial structure of concept class \(\mathcal F\), Haussler et al. achieved a VC\((\mathcal F)/n\) bound for the natural one-inclusion prediction strategy. The key step in their proof is a \(d=VC(\mathcal F)/n\) bound on the graph density of a subgraph of the hypercube-one-inclusion graph. The first main result of this paper is a density bound of \(n(\binom{n-1}{\leqslant d-1})/(\binom{n}{\leqslant d})<d\), which positively resolves a conjecture of Kuzmin and Warmuth relating to their unlabeled Peeling compression scheme and also leads to an improved one-inclusion mistake bound. The proof uses a new form of VC-invariant shifting and a group-theoretic symmetrization. Our second main result is an algebraic topological property of maximum classes of VC-dimension \(d\) as being \(d\)-contractible simplicial complexes, extending the well-known characterization that \(d=1\) maximum classes are trees.
We negatively resolve a minimum degree conjecture of Kuzmin and Warmuth-the second part to a conjectured proof of correctness for Peeling-that every class has one-inclusion minimum degree at most its VC-dimension. Our final main result is a \(k\)-class analogue of the \(d/n\) mistake bound, replacing the VC-dimension by the Pollard pseudo-dimension and the one-inclusion strategy by its natural hypergraph generalization. This result improves on known PAC-based expected risk bounds by a factor of \(O(\log n)\) and is shown to be optimal up to an \(O(\log k)\) factor. The combinatorial technique of shifting takes a central role in understanding the one-inclusion (hyper)graph and is a running theme throughout.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
PDF BibTeX XML Cite
Full Text: DOI Link

References:

[1] N. Alon, D. Haussler, E. Welzl, Partitioning and geometric embedding of range spaces of finite Vapnik-Chervonenkis dimension, in: Proceedings of the 3rd Annual ACM Symposium on Computational Geometry, 1987, pp. 331-340
[2] Ben-David, S.; Cesa-Bianchi, N.; Haussler, D.; Long, P.M., Characterizations of learnability for classes of \(\{0, \ldots, n \}\)-valued functions, J. comput. system sci., 50, 1, 74-86, (1995) · Zbl 0827.68095
[3] Ben-David, S.; Litman, A., Combinatorial variability of vapnik – chervonenkis classes with applications to sample compression schemes, Discrete appl. math., 86, 1, 3-25, (1998) · Zbl 0905.68048
[4] Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M.K., Learnability and the vapnik – chervonenkis dimension, J. assoc. comput. Mach., 36, 4, 929-965, (1989) · Zbl 0697.68079
[5] Dudley, R.M., The structure of some vapnik – chervonenkis classes, (), 495-507 · Zbl 1372.28001
[6] Ehrenfeucht, A.; Haussler, D.; Kearns, M.; Valiant, L., A general lower bound on the number of examples needed for learning, Inform. and comput., 82, 3, 247-261, (1989) · Zbl 0679.68158
[7] Firsov, F., On isometric embeddings of a graph into a Boolean cube, Cybernetics, 1, 112-113, (1965)
[8] S. Floyd, On space-bounded learning and the Vapnik-Chervonenkis dimension, PhD thesis, Technical report TR-89-061, International Computer Science Institute, Berkeley, CA, 1989 · Zbl 0747.68041
[9] Floyd, S.; Warmuth, M.K., Sample compression, learnability, and the vapnik – chervonenkis dimension, Machine learning, 21, 3, 269-304, (1995)
[10] Gärtner, B.; Welzl, E., Vapnik – chervonenkis dimension and (pseudo-)hyperplane arrangements, Discrete comput. geom., 12, 399-432, (1994) · Zbl 0813.52013
[11] Hall, P., On representatives of subsets, J. London math. soc., 10, 26-30, (1935) · Zbl 0010.34503
[12] Haussler, D., Sphere packing numbers for subsets of the Boolean n-cube with bounded vapnik – chervonenkis dimension, J. combin. theory ser. A, 69, 2, 217-232, (1995) · Zbl 0818.60005
[13] Haussler, D.; Littlestone, N.; Warmuth, M.K., Predicting \(\{0, 1 \}\) functions on randomly drawn points, Inform. and comput., 115, 2, 284-293, (1994) · Zbl 0938.68785
[14] Havel, I.; Morávek, J., B-valuations of graphs, Czechoslovak math. J., 22, 338-351, (1972) · Zbl 0247.05148
[15] Hlavička, J., Race-free assignment in asynchronous switching circuits, Inform. process. Mach., 13, (1967)
[16] Kuzmin, D.; Warmuth, M.K., Unlabeled compression schemes for maximum classes, (), 591-605 · Zbl 1137.68547
[17] Kuzmin, D.; Warmuth, M.K., Unlabeled compression schemes for maximum classes, J. Mach. learn. res., 8, Sep, 2047-2081, (2007) · Zbl 1222.68239
[18] Li, Y.; Long, P.M.; Srinivasan, A., The one-inclusion graph algorithm is near optimal for the prediction model of learning, IEEE trans. inform. theory, 47, 3, 1257-1261, (2002) · Zbl 0998.68095
[19] Littlestone, N.; Warmuth, M.K., Relating data compression and learnability, (1986), unpublished manuscript
[20] Livingston, M.; Stout, Q.F., Embeddings in hypercubes, Math. comput. model., 11, 222-227, (1988)
[21] Rubinstein, B.I.P.; Bartlett, P.L.; Rubinstein, J.H., Shifting, one-inclusion mistake bounds and tight multiclass expected risk bounds, (), 1193-1200
[22] B.I.P. Rubinstein, J.H. Rubinstein, Geometric & topological representations of maximum classes with applications to sample compression, in: Proceedings of the 21st Annual Conference on Learning Theory, 2008, pp. 299-310
[23] Sauer, N., On the density of families of sets, J. combin. theory ser. A, 13, 145-147, (1972) · Zbl 0248.05005
[24] Warmuth, M., Sample compression, learnability, and the vapnik – chervonenkis dimension, () · Zbl 0697.68079
[25] M.K. Warmuth, Compressing to VC dimension many points, in: Proceedings of the 16th Annual Conference on Learning Theory, COLT03, open problem paper, 2003
[26] E. Welzl, Complete range spaces, unpublished notes, 1987
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.