×

Learning sets with separating kernels. (English) Zbl 1345.94022

Summary: We consider the problem of learning a set from random samples. We show how relevant geometric and topological properties of a set can be studied analytically using concepts from the theory of reproducing kernel Hilbert spaces. A new kind of reproducing kernel, that we call separating kernel, plays a crucial role in our study and is analyzed in detail. We prove a new analytic characterization of the support of a distribution, that naturally leads to a family of regularized learning algorithms which are provably universally consistent and stable with respect to random sampling. Numerical experiments show that the proposed approach is competitive, and often better, than other state of the art techniques.

MSC:

94A15 Information theory (general)

Software:

SVM
PDFBibTeX XMLCite
Full Text: DOI arXiv

References:

[1] Allard, W. K.; Chen, G.; Maggioni, M., Multiscale geometric methods for data sets II: Geometric multi-resolution analysis, Appl. Comput. Harmon. Anal., 32, 435-462 (2012) · Zbl 1242.42038
[2] Aronszajn, N., Theory of reproducing kernels, Trans. Amer. Math. Soc., 68, 337-404 (1950) · Zbl 0037.20701
[3] Bauer, F.; Pereverzev, S.; Rosasco, L., On regularization algorithms in learning theory, J. Complexity, 23, 52-72 (2007) · Zbl 1109.68088
[4] Belkin, M.; Niyogi, P., Laplacian eigenmaps for dimensionality reduction and data representation, Neural Comput., 15, 1373-1396 (2003) · Zbl 1085.68119
[5] Belkin, M.; Niyogi, P.; Sindhwani, V., Manifold regularization: a geometric framework for learning from labeled and unlabeled examples, J. Mach. Learn. Res., 7, 2399-2434 (2006) · Zbl 1222.68144
[6] Belkin, M.; Niyogi, P., Towards a theoretical foundation for Laplacian-based manifold methods, J. Comput. System Sci., 74, 1289-1308 (2008) · Zbl 1157.68056
[7] Berg, C.; Christensen, J.; Ressel, P., Harmonic Analysis on Semigroups (1984), Springer-Verlag: Springer-Verlag New York
[8] Biau, G.; Cadre, B.; Mason, D.; Pelletier, B., Asymptotic normality in density support estimation, Electron. J. Probab., 14, 2617-2635 (2009) · Zbl 1185.62071
[9] Birman, M.; Solomyak, M., Double operator integrals in a Hilbert space, Integral Equations Operator Theory, 47, 131-168 (2003) · Zbl 1054.47030
[10] Bousquet, O.; Boucheron, S.; Lugosi, G., Theory of classification: A survey of recent advances, ESAIM Probab. Stat., 9, 323-375 (2004) · Zbl 1136.62355
[11] Canu, S.; Grandvalet, Y.; Guigue, V.; Rakotomamonjy, A., SVM and Kernel Methods Matlab Toolbox. Perception Systèmes et Information (2005), INSA de Rouen: INSA de Rouen Rouen, France
[12] Caponnetto, A., Optimal rates for regularization operators in learning theory (2006), MIT (CSAIL): MIT (CSAIL) Cambridge, MA, Technical report
[13] Caponnetto, A.; De Vito, E., Optimal rates for the regularized least-squares algorithm, Found. Comput. Math., 7, 331-368 (2007) · Zbl 1129.68058
[14] Caponnetto, A.; Yao, Y., Cross-validation based adaptation for regularization operators in learning theory, Anal. Appl., 8, 161-183 (2010) · Zbl 1209.68405
[15] Carmeli, C.; De Vito, E.; Toigo, A., Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem, Anal. Appl., 4, 377-408 (2006) · Zbl 1116.46019
[16] Carmeli, C.; De Vito, E.; Toigo, A.; Umanità, V., Vector valued reproducing kernel Hilbert spaces and universality, Anal. Appl., 8, 19-61 (2010) · Zbl 1195.46025
[17] Chandola, V.; Banerjee, A.; Kumar, V., Anomaly detection: A survey, ACM Comput. Surv., 41, 1-58 (2009)
[18] Chen, G.; Little, A.; Maggioni, M.; Rosasco, L., Some recent advances in multiscale geometric analysis of point clouds, (Wavelets and Multiscale Analysis (2011), Birkhäuser/Springer: Birkhäuser/Springer New York), 199-225 · Zbl 1336.42025
[19] Coifman, R.; Lafon, S., Geometric harmonics: a novel tool for multiscale out-of-sample extension of empirical functions, Appl. Comput. Harmon. Anal., 21, 31-52 (2006) · Zbl 1095.68095
[20] Coifman, R.; Lafon, S.; Lee, A.; Maggioni, M.; Nadler, B.; Warner, S.; Zucker, S., Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps, Proc. Natl. Acad. Sci. USA, 102, 7426-7431 (2005) · Zbl 1405.42043
[21] Coifman, R.; Lafon, S.; Lee, A.; Maggioni, M.; Nadler, B.; Warner, S.; Zucker, S., Geometric diffusions as a tool for harmonic analysis and structure definition of data: multiscale methods, Proc. Natl. Acad. Sci. USA, 102, 7432-7438 (2005) · Zbl 1405.42044
[22] Cucker, F.; Smale, S., On the mathematical foundations of learning, Bull. Aust. Math. Soc., 39, 1-49 (2002) · Zbl 0983.68162
[23] Cuevas, A.; Fraiman, R., A plug-in approach to support estimation, Ann. Statist., 25, 2300-2312 (1997) · Zbl 0897.62034
[24] Cuevas, A.; Fraiman, R., Set estimation, (New Perspectives in Stochastic Geometry (2010), Oxford Univ. Press: Oxford Univ. Press Oxford), 374-397 · Zbl 1192.62164
[25] Cuevas, A.; Rodríguez-Casal, A., Set estimation: an overview and some recent developments, (Recent Advances and Trends in Nonparametric Statistics (2003), Elsevier B.V.: Elsevier B.V. Amsterdam), 251-264
[26] De Mol, C.; De Vito, E.; Rosasco, L., Elastic-net regularization in learning theory, J. Complexity, 25, 201-230 (2009) · Zbl 1319.62087
[27] De Silva, V.; Tenenbaum, J. B.; Langford, J. C., A global geometric framework for nonlinear dimensionality reduction, Science, 290, 2319-2323 (2000)
[28] De Vito, E.; Rosasco, L.; Toigo, A., Spectral regularization for support estimation, (Adv. Neural Inf. Process. Syst., vol. 24 (2010), MIT Press: MIT Press Cambridge, MA), 487-495
[29] Devroye, L.; Györfi, L.; Lugosi, G., A Probabilistic Theory of Pattern Recognition (1996), Springer: Springer New York · Zbl 0853.68150
[30] Devroye, L.; Wise, G. L., Detection of abnormal behavior via nonparametric estimation of the support, SIAM J. Appl. Math., 38, 480-488 (1980) · Zbl 0479.62028
[31] Dieudonné, J., Treatise on Analysis, vol. II (1976), Academic Press: Academic Press New York, San Francisco and London
[32] Dudley, R. M., Real Analysis and Probability (2002), Cambridge University Press: Cambridge University Press Cambridge · Zbl 1023.60001
[33] Dümbgen, L.; Walther, G., Rates of convergence for random approximations of convex sets, Adv. in Appl. Probab., 28, 384-393 (1996) · Zbl 0861.60022
[34] Engl, H. W.; Hanke, M.; Neubauer, A., Regularization of inverse problems (1996), Kluwer Academic Publishers Group: Kluwer Academic Publishers Group Dordrecht · Zbl 0859.65054
[35] Folland, G., A Course in Abstract Harmonic Analysis (1995), CRC Press: CRC Press Boca Raton, FL · Zbl 0857.43001
[36] Györfi, L.; Kohler, M.; Krzyzak, A.; Walk, H., A Distribution-Free Theory of Nonparametric Regression (2002), Springer-Verlag: Springer-Verlag New York, Berlin, Paris · Zbl 1021.62024
[37] Halmos, P., A Hilbert Space Problem Book (1982), Springer-Verlag: Springer-Verlag New York · Zbl 0144.38704
[38] Hastie, T.; Tibshirani, R.; Friedman, J., The Elements of Statistical Learning (2001), Springer: Springer New York
[39] Hoffmann, H., Kernel PCA for novelty detection, Pattern Recognit., 40, 863-874 (2007) · Zbl 1118.68140
[40] Kazhdan, M.; Bolitho, M.; Hoppe, H., Poisson surface reconstruction, (Proceedings of the Fourth Eurographics Symposium on Geometry Processing (2006), Eurographics Association: Eurographics Association Aire-la-Ville), 61-70
[41] Kelley, J. L., General Topology, Grad. Texts in Math., vol. 27 (1975), Springer-Verlag: Springer-Verlag New York-Berlin, reprint of the 1955 edition [Van Nostrand, Toronto, Ont.] · Zbl 0306.54002
[42] Kolluri, R.; Shewchuk, J.; O’Brien, J., Spectral surface reconstruction from noisy point clouds, (Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing (2004), ACM: ACM New York), 11-21
[43] Koltchinskii, V., Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems (2011), Springer: Springer Heidelberg · Zbl 1223.91002
[44] Korostelëv, A. P.; Tsybakov, A. B., Minimax Theory of Image Reconstruction (1993), Springer-Verlag: Springer-Verlag New York · Zbl 0833.62039
[45] Lang, S., Real and Functional Analysis (1993), Springer-Verlag: Springer-Verlag New York · Zbl 0831.46001
[46] Lerman, G.; Zhang, T., Probabilistic recovery of multiple subspaces in point clouds by geometric \(\ell_p\) minimization (2010)
[47] Levin, D., Mesh-independent surface interpolation, (Geometric Modeling for Scientific Visualization (2003), Springer-Verlag), 37-49
[48] Lo Gerfo, L.; Rosasco, L.; Odone, F.; De Vito, E.; Verri, A., Spectral algorithms for supervised learning, Neural Comput., 20, 1873-1897 (2008) · Zbl 1147.68643
[49] Maurer, A.; Pontil, M., \(K\)-dimensional coding schemes in Hilbert spaces, IEEE Trans. Inform. Theory, 56, 5839-5846 (2010) · Zbl 1366.94305
[50] Mercer, J., Functions of positive and negative type and their connection with the theory of integral equations, Philos. Trans. R. Soc. A, 209, 415-446 (1909) · JFM 40.0408.02
[51] Narayanan, H.; Mitter, S., Sample complexity of testing the manifold hypothesis, (Adv. Neural Inf. Process. Syst., vol. 24 (2010), MIT Press: MIT Press Cambridge, MA), 1786-1794
[52] Niyogi, P.; Smale, S.; Weinberger, S., A topological view of unsupervised learning from noisy data, SIAM J. Comput., 40, 646-663 (2011) · Zbl 1230.62085
[53] Pinelis, I., Optimum bounds for the distributions of martingales in Banach spaces, Ann. Probab., 22, 1679-1706 (1994) · Zbl 0836.60015
[54] Pinelis, I., Correction: optimum bounds for the distributions of martingales in Banach spaces, Ann. Probab., 27, 2119 (1999)
[55] Poggio, T.; Smale, S., The mathematics of learning: dealing with data, Notices Amer. Math. Soc., 50, 537-544 (2003) · Zbl 1083.68100
[56] Reitzner, M., Random polytopes and the Efron-Stein jackknife inequality, Ann. Probab., 31, 2136-2166 (2003) · Zbl 1058.60010
[57] Rifkin, R.; Lippert, R., Notes on regularized least squares (2007), Massachusetts Institute of Technology, MIT (CSAIL): Massachusetts Institute of Technology, MIT (CSAIL) Cambridge, MA, Technical report
[58] Rosasco, L.; Belkin, M.; De Vito, E., On learning with integral operators, J. Mach. Learn. Res., 11, 905-934 (2010) · Zbl 1242.62059
[59] Roweis, S. T.; Saul, L. K., Nonlinear dimensionality reduction by locally linear embedding, Science, 290, 2323-2326 (2000)
[60] Schölkopf, B.; Giesen, J.; Spalinger, S., Kernel methods for implicit surface modeling, (Advances in Neural Information Processing Systems, vol. 17 (2005), MIT Press: MIT Press Cambridge, MA), 1193-1200
[61] Schölkopf, B.; Platt, J.; Shawe-Taylor, J.; Smola, A.; Williamson, R., Estimating the support of a high-dimensional distribution, Neural Comput., 13, 1443-1471 (2001) · Zbl 1009.62029
[62] Schölkopf, B.; Smola, A.; Müller, K.-R., Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput., 10, 1299-1319 (1998)
[63] Scott, C. D.; Nowak, R. D., Learning minimum volume sets, J. Mach. Learn. Res., 7, 665-704 (2006) · Zbl 1222.68300
[64] Simon, B., Trace Ideals and Their Applications (1979), Cambridge University Press: Cambridge University Press Cambridge · Zbl 0423.47001
[65] Smale, S.; Zhou, D. X., Geometry of probability spaces, Constr. Approx., 30, 311-323 (2009) · Zbl 1187.68270
[66] Stein, E. M.; Weiss, G., Introduction to Fourier Analysis on Euclidean Spaces (1971), Princeton University Press: Princeton University Press Princeton, NJ · Zbl 0232.42007
[67] Steinwart, I., On the influence of the kernel on the consistency of support vector machines, J. Mach. Learn. Res., 2, 67-93 (2002) · Zbl 1009.68143
[68] Steinwart, I.; Christmann, A., Support Vector Machines (2008), Springer: Springer New York · Zbl 1203.68171
[69] Steinwart, I.; Hush, D.; Scovel, C., A classification framework for anomaly detection, J. Mach. Learn. Res., 6, 211-232 (2005) · Zbl 1222.68309
[70] Steinwart, I.; Scovel, C., Mercer’s theorem on general domains: on the interaction between measures, kernels, and RKHSs, Constr. Approx., 35, 363-417 (2012) · Zbl 1252.46018
[71] Tikhonov, V. Y.; Arsenin, A. N., Solutions of Ill-Posed Problems (1977), Winston: Winston New York · Zbl 0354.65028
[72] Tsybakov, A. B., On nonparametric estimation of density level sets, Ann. Statist., 25, 948-969 (1997) · Zbl 0881.62039
[73] van der Vaart, A.; Wellner, J., Weak Convergence and Empirical Processes (1996), Springer-Verlag: Springer-Verlag New York · Zbl 0862.60002
[74] Vapnik, V. N., Statistical Learning Theory (1998), John Wiley & Sons Inc.: John Wiley & Sons Inc. New York · Zbl 0935.62007
[75] Vert, R.; Vert, J.-P., Consistency and convergence rates of one-class SVMs and related algorithms, J. Mach. Learn. Res., 7, 817-854 (2006) · Zbl 1222.68324
[76] Vidal, R.; Ma, Y.; Shankar, S., Generalized principal component analysis (GPCA), IEEE Trans. Pattern Anal., 27, 1945-1959 (2005)
[77] Zwald, L.; Blanchard, G., On the convergence of eigenspaces in kernel principal component analysis, (Adv. Neural Inf. Process. Syst., vol. 18 (2006), MIT Press: MIT Press Cambridge, MA), 1649-1656
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.