zbMATH — the first resource for mathematics

Kernel methods in machine learning. (English) Zbl 1151.30007
Authors’ abstract: We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of functions has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data.
We cover a wide range of methods, ranging from binary classifiers to sophisticated methods for estimation with structured data.
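As a concrete illustration of the RKHS estimators the abstract describes, the following sketch fits kernel ridge regression with a Gaussian (positive definite) kernel. This is not taken from the reviewed paper; the function names, bandwidth `gamma` and regularization `lam` are assumptions chosen for the demo.

```python
# Minimal sketch (assumed setup, not the authors' code): a positive definite
# Gaussian kernel and kernel ridge regression, a basic RKHS-based estimator.
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2), evaluated pairwise."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def fit_krr(X, y, gamma=1.0, lam=1e-3):
    """Solve (K + lam * n * I) alpha = y; the estimator is f(x) = sum_i alpha_i k(x_i, x)."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict_krr(X_train, alpha, X_test, gamma=1.0):
    """Evaluate the kernel expansion at new points."""
    return rbf_kernel(X_test, X_train, gamma) @ alpha

# Toy regression problem: noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(100)
alpha = fit_krr(X, y, gamma=0.5, lam=1e-3)
yhat = predict_krr(X, alpha, X, gamma=0.5)
print(np.abs(yhat - np.sin(X[:, 0])).mean())  # small, since the fit tracks sin(x)
```

By the representer theorem (Kimeldorf and Wahba [85] in the list below), the RKHS minimizer is exactly such a finite kernel expansion over the training points, which is why only the vector `alpha` needs to be solved for.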

30C40 Kernel functions in one complex variable and applications
68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI arXiv
[1] Aizerman, M. A., Braverman, É. M. and Rozonoér, L. I. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Autom. Remote Control 25 821-837. · Zbl 0151.24701
[2] Allwein, E. L., Schapire, R. E. and Singer, Y. (2000). Reducing multiclass to binary: A unifying approach for margin classifiers. In Proc. 17th International Conf. Machine Learning (P. Langley, ed.) 9-16. Morgan Kaufmann, San Francisco, CA. · Zbl 1013.68175
[3] Alon, N., Ben-David, S., Cesa-Bianchi, N. and Haussler, D. (1993). Scale-sensitive dimensions, uniform convergence, and learnability. In Proc. of the 34th Annual Symposium on Foundations of Computer Science 292-301. IEEE Computer Society Press, Los Alamitos, CA. · Zbl 0891.68086
[4] Altun, Y., Hofmann, T. and Smola, A. J. (2004). Gaussian process classification for segmenting and annotating sequences. In Proc. International Conf. Machine Learning 25-32. ACM Press, New York.
[5] Altun, Y., Smola, A. J. and Hofmann, T. (2004). Exponential families for conditional random fields. In Uncertainty in Artificial Intelligence (UAI) 2-9. AUAI Press, Arlington, VA.
[6] Altun, Y., Tsochantaridis, I. and Hofmann, T. (2003). Hidden Markov support vector machines. In Proc. Intl. Conf. Machine Learning 3-10. AAAI Press, Menlo Park, CA.
[7] Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337-404. · Zbl 0037.20701
[8] Bach, F. R. and Jordan, M. I. (2002). Kernel independent component analysis. J. Mach. Learn. Res. 3 1-48. · Zbl 1088.68689
[9] Bakir, G., Hofmann, T., Schölkopf, B., Smola, A., Taskar, B. and Vishwanathan, S. V. N. (2007). Predicting Structured Data . MIT Press, Cambridge, MA.
[10] Bamber, D. (1975). The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. J. Math. Psych. 12 387-415. · Zbl 0327.92017
[11] Barndorff-Nielsen, O. E. (1978). Information and Exponential Families in Statistical Theory . Wiley, New York. · Zbl 0387.62011
[12] Bartlett, P. L. and Mendelson, S. (2002). Rademacher and gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3 463-482. · Zbl 1084.68549
[13] Basilico, J. and Hofmann, T. (2004). Unifying collaborative and content-based filtering. In Proc. Intl. Conf. Machine Learning 65-72. ACM Press, New York.
[14] Baum, L. E. (1972). An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process. Inequalities 3 1-8.
[15] Ben-David, S., Eiron, N. and Long, P. (2003). On the difficulty of approximately maximizing agreements. J. Comput. System Sci. 66 496-514. · Zbl 1053.68054
[16] Bennett, K. P., Demiriz, A. and Shawe-Taylor, J. (2000). A column generation algorithm for boosting. In Proc. 17th International Conf. Machine Learning (P. Langley, ed.) 65-72. Morgan Kaufmann, San Francisco, CA.
[17] Bennett, K. P. and Mangasarian, O. L. (1992). Robust linear programming discrimination of two linearly inseparable sets. Optim. Methods Softw. 1 23-34.
[18] Berg, C., Christensen, J. P. R. and Ressel, P. (1984). Harmonic Analysis on Semigroups . Springer, New York. · Zbl 0619.43001
[19] Bertsimas, D. and Tsitsiklis, J. (1997). Introduction to Linear Programming . Athena Scientific, Nashua, NH.
[20] Bloomfield, P. and Steiger, W. (1983). Least Absolute Deviations: Theory, Applications and Algorithms. Birkhäuser, Boston. · Zbl 0536.62049
[21] Bochner, S. (1933). Monotone Funktionen, Stieltjessche Integrale und harmonische Analyse. Math. Ann. 108 378-410. · Zbl 0007.10803
[22] Borgwardt, K. M., Gretton, A., Rasch, M. J., Kriegel, H.-P., Schölkopf, B. and Smola, A. J. (2006). Integrating structured biological data by kernel maximum mean discrepancy. Bioinformatics (ISMB) 22 e49-e57.
[23] Boser, B., Guyon, I. and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proc. Annual Conf. Computational Learning Theory (D. Haussler, ed.) 144-152. ACM Press, Pittsburgh, PA.
[24] Bousquet, O., Boucheron, S. and Lugosi, G. (2005). Theory of classification: A survey of recent advances. ESAIM Probab. Statist. 9 323-375. · Zbl 1136.62355
[25] Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Min. Knowl. Discov. 2 121-167.
[26] Cardoso, J.-F. (1998). Blind signal separation: Statistical principles. Proceedings of the IEEE 86 2009-2025.
[27] Chapelle, O. and Harchaoui, Z. (2005). A machine learning approach to conjoint analysis. In Advances in Neural Information Processing Systems 17 (L. K. Saul, Y. Weiss and L. Bottou, eds.) 257-264. MIT Press, Cambridge, MA.
[28] Chen, A. and Bickel, P. (2005). Consistent independent component analysis and prewhitening. IEEE Trans. Signal Process. 53 3625-3632. · Zbl 1373.62292
[29] Chen, S., Donoho, D. and Saunders, M. (1999). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33-61. · Zbl 0919.94002
[30] Collins, M. (2000). Discriminative reranking for natural language parsing. In Proc. 17th International Conf. Machine Learning (P. Langley, ed.) 175-182. Morgan Kaufmann, San Francisco, CA.
[31] Collins, M. and Duffy, N. (2001). Convolution kernels for natural language. In Advances in Neural Information Processing Systems 14 (T. G. Dietterich, S. Becker and Z. Ghahramani, eds.) 625-632. MIT Press, Cambridge, MA.
[32] Cook, D., Buja, A. and Cabrera, J. (1993). Projection pursuit indices based on orthonormal function expansions. J. Comput. Graph. Statist. 2 225-250. · Zbl 04516293
[33] Cortes, C., Mohri, M. and Weston, J. (2005). A general regression technique for learning transductions. In ICML’05: Proceedings of the 22nd International Conference on Machine Learning 153-160. ACM Press, New York.
[34] Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning 20 273-297. · Zbl 0831.68098
[35] Crammer, K. and Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2 265-292. · Zbl 1037.68110
[36] Crammer, K. and Singer, Y. (2005). Loss bounds for online category ranking. In Proc. Annual Conf. Computational Learning Theory (P. Auer and R. Meir, eds.) 48-62. Springer, Berlin. · Zbl 1137.68528
[37] Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines . Cambridge Univ. Press. · Zbl 0994.68074
[38] Cristianini, N., Shawe-Taylor, J., Elisseeff, A. and Kandola, J. (2002). On kernel-target alignment. In Advances in Neural Information Processing Systems 14 (T. G. Dietterich, S. Becker and Z. Ghahramani, eds.) 367-373. MIT Press, Cambridge, MA.
[39] Culotta, A., Kulp, D. and McCallum, A. (2005). Gene prediction with conditional random fields. Technical Report UM-CS-2005-028, Univ. Massachusetts, Amherst.
[40] Darroch, J. N. and Ratcliff, D. (1972). Generalized iterative scaling for log-linear models. Ann. Math. Statist. 43 1470-1480. · Zbl 0251.62020
[41] Das, D. and Sen, P. (1994). Restricted canonical correlations. Linear Algebra Appl. 210 29-47. · Zbl 0804.62054
[42] Dauxois, J. and Nkiet, G. M. (1998). Nonlinear canonical analysis and independence tests. Ann. Statist. 26 1254-1278. · Zbl 0934.62061
[43] Dawid, A. P. (1992). Applications of a general propagation algorithm for probabilistic expert systems. Stat. Comput. 2 25-36.
[44] DeCoste, D. and Schölkopf, B. (2002). Training invariant support vector machines. Machine Learning 46 161-190. · Zbl 0998.68102
[45] Dekel, O., Manning, C. and Singer, Y. (2004). Log-linear models for label ranking. In Advances in Neural Information Processing Systems 16 (S. Thrun, L. Saul and B. Schölkopf, eds.) 497-504. MIT Press, Cambridge, MA.
[46] Della Pietra, S., Della Pietra, V. and Lafferty, J. (1997). Inducing features of random fields. IEEE Trans. Pattern Anal. Machine Intelligence 19 380-393.
[47] Einmahl, J. H. J. and Mason, D. M. (1992). Generalized quantile processes. Ann. Statist. 20 1062-1078. · Zbl 0757.60012
[48] Elisseeff, A. and Weston, J. (2001). A kernel method for multi-labeled classification. In Advances in Neural Information Processing Systems 14 681-687. MIT Press, Cambridge, MA.
[49] Fiedler, M. (1973). Algebraic connectivity of graphs. Czechoslovak Math. J. 23 298-305. · Zbl 0265.05119
[50] FitzGerald, C. H., Micchelli, C. A. and Pinkus, A. (1995). Functions that preserve families of positive semidefinite matrices. Linear Algebra Appl. 221 83-102. · Zbl 0852.43004
[51] Fletcher, R. (1989). Practical Methods of Optimization . Wiley, New York. · Zbl 0905.65002
[52] Fortet, R. and Mourier, E. (1953). Convergence de la répartition empirique vers la répartition théorique. Ann. Scient. École Norm. Sup. 70 266-285. · Zbl 0053.09601
[53] Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the International Conference on Machine Learning 148-156. Morgan Kaufmann, San Francisco, CA.
[54] Friedman, J. H. (1987). Exploratory projection pursuit. J. Amer. Statist. Assoc. 82 249-266. · Zbl 0664.62060
[55] Friedman, J. H. and Tukey, J. W. (1974). A projection pursuit algorithm for exploratory data analysis. IEEE Trans. Comput. C-23 881-890. · Zbl 0284.68079
[56] Gärtner, T. (2003). A survey of kernels for structured data. SIGKDD Explorations 5 49-58.
[57] Green, P. and Yandell, B. (1985). Semi-parametric generalized linear models. Proceedings 2nd International GLIM Conference. Lecture Notes in Statist. 32 44-55. Springer, New York.
[58] Gretton, A., Bousquet, O., Smola, A. and Schölkopf, B. (2005). Measuring statistical dependence with Hilbert-Schmidt norms. In Proceedings Algorithmic Learning Theory (S. Jain, H. U. Simon and E. Tomita, eds.) 63-77. Springer, Berlin. · Zbl 1168.62354
[59] Gretton, A., Smola, A., Bousquet, O., Herbrich, R., Belitski, A., Augath, M., Murayama, Y., Pauls, J., Schölkopf, B. and Logothetis, N. (2005). Kernel constrained covariance for dependence measurement. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics (R. G. Cowell and Z. Ghahramani, eds.) 112-119. Society for Artificial Intelligence and Statistics, New Jersey.
[60] Ham, J., Lee, D., Mika, S. and Schölkopf, B. (2004). A kernel view of the dimensionality reduction of manifolds. In Proceedings of the Twenty-First International Conference on Machine Learning 369-376. ACM Press, New York.
[61] Hammersley, J. M. and Clifford, P. E. (1971). Markov fields on finite graphs and lattices. Unpublished manuscript.
[62] Haussler, D. (1999). Convolutional kernels on discrete structures. Technical Report UCSC-CRL-99-10, Computer Science Dept., UC Santa Cruz.
[63] Hein, M., Bousquet, O. and Schölkopf, B. (2005). Maximal margin classification for metric spaces. J. Comput. System Sci. 71 333-359. · Zbl 1094.68084
[64] Herbrich, R. (2002). Learning Kernel Classifiers: Theory and Algorithms. MIT Press, Cambridge, MA. · Zbl 1063.62092
[65] Herbrich, R., Graepel, T. and Obermayer, K. (2000). Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers (A. J. Smola, P. L. Bartlett, B. Schölkopf and D. Schuurmans, eds.) 115-132. MIT Press, Cambridge, MA.
[66] Hettich, R. and Kortanek, K. O. (1993). Semi-infinite programming: Theory, methods, and applications. SIAM Rev. 35 380-429. · Zbl 0784.90090
[67] Hilbert, D. (1904). Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen. Nachr. Akad. Wiss. Göttingen Math.-Phys. Kl. II 49-91. · JFM 35.0378.02
[68] Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 55-67. · Zbl 0202.17205
[69] Hofmann, T., Schölkopf, B. and Smola, A. J. (2006). A review of kernel methods in machine learning. Technical Report 156, Max-Planck-Institut für biologische Kybernetik. · Zbl 1151.30007
[70] Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28 321-377. · Zbl 0015.40705
[71] Huber, P. J. (1981). Robust Statistics . Wiley, New York. · Zbl 0536.62025
[72] Huber, P. J. (1985). Projection pursuit. Ann. Statist. 13 435-475. · Zbl 0595.62059
[73] Hyvärinen, A., Karhunen, J. and Oja, E. (2001). Independent Component Analysis . Wiley, New York.
[74] Jaakkola, T. S. and Haussler, D. (1999). Probabilistic kernel regression models. In Proceedings of the 7th International Workshop on AI and Statistics . Morgan Kaufmann, San Francisco, CA.
[75] Jebara, T. and Kondor, I. (2003). Bhattacharyya and expected likelihood kernels. Proceedings of the Sixteenth Annual Conference on Computational Learning Theory (B. Schölkopf and M. Warmuth, eds.) 57-71. Lecture Notes in Comput. Sci. 2777 . Springer, Heidelberg. · Zbl 1274.68321
[76] Jensen, F. V., Lauritzen, S. L. and Olesen, K. G. (1990). Bayesian updates in causal probabilistic networks by local computation. Comput. Statist. Quarterly 4 269-282. · Zbl 0715.68076
[77] Joachims, T. (2002). Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms. Kluwer Academic, Boston.
[78] Joachims, T. (2005). A support vector method for multivariate performance measures. In Proc. Intl. Conf. Machine Learning 377-384. Morgan Kaufmann, San Francisco, CA.
[79] Jones, M. C. and Sibson, R. (1987). What is projection pursuit? J. Roy. Statist. Soc. Ser. A 150 1-36. · Zbl 0632.62059
[80] Jordan, M. I., Bartlett, P. L. and McAuliffe, J. D. (2003). Convexity, classification, and risk bounds. Technical Report 638, Univ. California, Berkeley. · Zbl 1118.62330
[81] Karush, W. (1939). Minima of functions of several variables with inequalities as side constraints. Master’s thesis, Dept. Mathematics, Univ. Chicago.
[82] Kashima, H., Tsuda, K. and Inokuchi, A. (2003). Marginalized kernels between labeled graphs. In Proc. Intl. Conf. Machine Learning 321-328. Morgan Kaufmann, San Francisco, CA.
[83] Kettenring, J. R. (1971). Canonical analysis of several sets of variables. Biometrika 58 433-451. · Zbl 0225.62072
[84] Kim, K., Franz, M. O. and Schölkopf, B. (2005). Iterative kernel principal component analysis for image modeling. IEEE Trans. Pattern Analysis and Machine Intelligence 27 1351-1366.
[85] Kimeldorf, G. S. and Wahba, G. (1971). Some results on Tchebycheffian spline functions. J. Math. Anal. Appl. 33 82-95. · Zbl 0201.39702
[86] Koltchinskii, V. (2001). Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory 47 1902-1914. · Zbl 1008.62614
[87] Kondor, I. R. and Lafferty, J. D. (2002). Diffusion kernels on graphs and other discrete structures. In Proc. International Conf. Machine Learning 315-322. Morgan Kaufmann, San Francisco, CA.
[88] Kuhn, H. W. and Tucker, A. W. (1951). Nonlinear programming. Proc. 2nd Berkeley Symposium on Mathematical Statistics and Probability 481-492. Univ. California Press, Berkeley. · Zbl 0044.05903
[89] Lafferty, J., Zhu, X. and Liu, Y. (2004). Kernel conditional random fields: Representation and clique selection. In Proc. International Conf. Machine Learning 21 64. Morgan Kaufmann, San Francisco, CA.
[90] Lafferty, J. D., McCallum, A. and Pereira, F. (2001). Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. International Conf. Machine Learning 18 282-289. Morgan Kaufmann, San Francisco, CA.
[91] Lee, T.-W., Girolami, M., Bell, A. and Sejnowski, T. (2000). A unifying framework for independent component analysis. Comput. Math. Appl. 39 1-21. · Zbl 1054.94004
[92] Leslie, C., Eskin, E. and Noble, W. S. (2002). The spectrum kernel: A string kernel for SVM protein classification. In Proceedings of the Pacific Symposium on Biocomputing 564-575. World Scientific Publishing, Singapore.
[93] Loève, M. (1978). Probability Theory II, 4th ed. Springer, New York. · Zbl 0385.60001
[94] Magerman, D. M. (1996). Learning grammatical structure using statistical decision-trees. Proceedings ICGI. Lecture Notes in Artificial Intelligence 1147 1-21. Springer, Berlin.
[95] Mangasarian, O. L. (1965). Linear and nonlinear separation of patterns by linear programming. Oper. Res. 13 444-452. · Zbl 0127.36701
[96] McCallum, A., Bellare, K. and Pereira, F. (2005). A conditional random field for discriminatively-trained finite-state string edit distance. In Conference on Uncertainty in AI (UAI) 388. AUAI Press, Arlington, VA.
[97] McCullagh, P. and Nelder, J. A. (1983). Generalized Linear Models . Chapman and Hall, London. · Zbl 0588.62104
[98] Mendelson, S. (2003). A few notes on statistical learning theory. Advanced Lectures on Machine Learning (S. Mendelson and A. J. Smola, eds.). Lecture Notes in Artificial Intelligence 2600 1-40. Springer, Heidelberg. · Zbl 1019.68093
[99] Mercer, J. (1909). Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. A 209 415-446. · JFM 40.0408.02
[100] Mika, S., Rätsch, G., Weston, J., Schölkopf, B., Smola, A. J. and Müller, K.-R. (2003). Learning discriminative and invariant nonlinear features. IEEE Trans. Pattern Analysis and Machine Intelligence 25 623-628.
[101] Minsky, M. and Papert, S. (1969). Perceptrons : An Introduction to Computational Geometry . MIT Press, Cambridge, MA. · Zbl 0197.43702
[102] Morozov, V. A. (1984). Methods for Solving Incorrectly Posed Problems . Springer, New York. · Zbl 0549.65031
[103] Murray, M. K. and Rice, J. W. (1993). Differential Geometry and Statistics . Chapman and Hall, London. · Zbl 0804.53001
[104] Oliver, N., Schölkopf, B. and Smola, A. J. (2000). Natural regularization in SVMs. In Advances in Large Margin Classifiers (A. J. Smola, P. L. Bartlett, B. Schölkopf and D. Schuurmans, eds.) 51-60. MIT Press, Cambridge, MA.
[105] O’Sullivan, F., Yandell, B. and Raynor, W. (1986). Automatic smoothing of regression functions in generalized linear models. J. Amer. Statist. Assoc. 81 96-103.
[106] Parzen, E. (1970). Statistical inference on time series by RKHS methods. In Proceedings 12th Biennial Seminar (R. Pyke, ed.) 1-37. Canadian Mathematical Congress, Montreal. · Zbl 0253.60053
[107] Platt, J. (1999). Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods-Support Vector Learning (B. Schölkopf, C. J. C. Burges and A. J. Smola, eds.) 185-208. MIT Press, Cambridge, MA.
[108] Poggio, T. (1975). On optimal nonlinear associative recall. Biological Cybernetics 19 201-209. · Zbl 0321.94001
[109] Poggio, T. and Girosi, F. (1990). Networks for approximation and learning. Proceedings of the IEEE 78 1481-1497. · Zbl 1226.92005
[110] Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (1994). Numerical Recipes in C: The Art of Scientific Computing. Cambridge Univ. Press. · Zbl 1078.65500
[111] Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning . MIT Press, Cambridge, MA. · Zbl 1177.68165
[112] Rätsch, G., Sonnenburg, S., Srinivasan, J., Witte, H., Müller, K.-R., Sommer, R. J. and Schölkopf, B. (2007). Improving the Caenorhabditis elegans genome annotation using machine learning. PLoS Computational Biology 3 e20 doi:10.1371/journal.pcbi.0030020.
[113] Rényi, A. (1959). On measures of dependence. Acta Math. Acad. Sci. Hungar. 10 441-451. · Zbl 0091.14403
[114] Rockafellar, R. T. (1970). Convex Analysis . Princeton Univ. Press. · Zbl 0193.18401
[115] Schoenberg, I. J. (1938). Metric spaces and completely monotone functions. Ann. Math. 39 811-841. · Zbl 0019.41503
[116] Schölkopf, B. (1997). Support Vector Learning . R. Oldenbourg Verlag, Munich. Available at http://www.kernel-machines.org. · Zbl 0915.68137
[117] Schölkopf, B., Platt, J., Shawe-Taylor, J., Smola, A. J. and Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Comput. 13 1443-1471. · Zbl 1009.62029
[118] Schölkopf, B. and Smola, A. (2002). Learning with Kernels . MIT Press, Cambridge, MA. · Zbl 1019.68094
[119] Schölkopf, B., Smola, A. J. and Müller, K.-R. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 10 1299-1319.
[120] Schölkopf, B., Smola, A. J., Williamson, R. C. and Bartlett, P. L. (2000). New support vector algorithms. Neural Comput. 12 1207-1245.
[121] Schölkopf, B., Tsuda, K. and Vert, J.-P. (2004). Kernel Methods in Computational Biology . MIT Press, Cambridge, MA.
[122] Sha, F. and Pereira, F. (2003). Shallow parsing with conditional random fields. In Proceedings of HLT-NAACL 213-220. Association for Computational Linguistics, Edmonton, Canada.
[123] Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis . Cambridge Univ. Press. · Zbl 0994.68074
[124] Smola, A. J., Bartlett, P. L., Schölkopf, B. and Schuurmans, D. (2000). Advances in Large Margin Classifiers . MIT Press, Cambridge, MA. · Zbl 0988.68145
[125] Smola, A. J. and Kondor, I. R. (2003). Kernels and regularization on graphs. Proc. Annual Conf. Computational Learning Theory (B. Schölkopf and M. K. Warmuth, eds.). Lecture Notes in Comput. Sci. 2726 144-158. Springer, Heidelberg. · Zbl 1274.68351
[126] Smola, A. J. and Schölkopf, B. (1998). On a kernel-based method for pattern recognition, regression, approximation and operator inversion. Algorithmica 22 211-231. · Zbl 0910.68189
[127] Smola, A. J., Schölkopf, B. and Müller, K.-R. (1998). The connection between regularization operators and support vector kernels. Neural Networks 11 637-649.
[128] Steinwart, I. (2002). On the influence of the kernel on the consistency of support vector machines. J. Mach. Learn. Res. 2 67-93. · Zbl 1009.68143
[129] Steinwart, I. (2002). Support vector machines are universally consistent. J. Complexity 18 768-791. · Zbl 1030.68074
[130] Stewart, J. (1976). Positive definite functions and generalizations, an historical survey. Rocky Mountain J. Math. 6 409-434. · Zbl 0337.42017
[131] Stitson, M., Gammerman, A., Vapnik, V., Vovk, V., Watkins, C. and Weston, J. (1999). Support vector regression with ANOVA decomposition kernels. In Advances in Kernel Methods-Support Vector Learning (B. Schölkopf, C. J. C. Burges and A. J. Smola, eds.) 285-292. MIT Press, Cambridge, MA.
[132] Taskar, B., Guestrin, C. and Koller, D. (2004). Max-margin Markov networks. In Advances in Neural Information Processing Systems 16 (S. Thrun, L. Saul and B. Schölkopf, eds.) 25-32. MIT Press, Cambridge, MA.
[133] Taskar, B., Klein, D., Collins, M., Koller, D. and Manning, C. (2004). Max-margin parsing. In Empirical Methods in Natural Language Processing 1-8. Association for Computational Linguistics, Barcelona, Spain.
[134] Tax, D. M. J. and Duin, R. P. W. (1999). Data domain description by support vectors. In Proceedings ESANN (M. Verleysen, ed.) 251-256. D Facto, Brussels.
[135] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267-288. · Zbl 0850.62538
[136] Tikhonov, A. N. (1963). Solution of incorrectly formulated problems and the regularization method. Soviet Math. Dokl. 4 1035-1038. · Zbl 0141.11001
[137] Tsochantaridis, I., Joachims, T., Hofmann, T. and Altun, Y. (2005). Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res. 6 1453-1484. · Zbl 1222.68321
[138] van Rijsbergen, C. (1979). Information Retrieval, 2nd ed. Butterworths, London. · Zbl 0227.68052
[139] Vapnik, V. (1982). Estimation of Dependences Based on Empirical Data . Springer, Berlin. · Zbl 0499.62005
[140] Vapnik, V. (1995). The Nature of Statistical Learning Theory . Springer, New York. · Zbl 0833.62008
[141] Vapnik, V. (1998). Statistical Learning Theory . Wiley, New York. · Zbl 0935.62007
[142] Vapnik, V. and Chervonenkis, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 264-281. · Zbl 0247.60005
[143] Vapnik, V. and Chervonenkis, A. (1991). The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis 1 283-305.
[144] Vapnik, V., Golowich, S. and Smola, A. J. (1997). Support vector method for function approximation, regression estimation, and signal processing. In Advances in Neural Information Processing Systems 9 (M. C. Mozer, M. I. Jordan and T. Petsche, eds.) 281-287. MIT Press, Cambridge, MA.
[145] Vapnik, V. and Lerner, A. (1963). Pattern recognition using generalized portrait method. Autom. Remote Control 24 774-780.
[146] Vishwanathan, S. V. N. and Smola, A. J. (2004). Fast kernels for string and tree matching. In Kernel Methods in Computational Biology (B. Schölkopf, K. Tsuda and J. P. Vert, eds.) 113-130. MIT Press, Cambridge, MA.
[147] Vishwanathan, S. V. N., Smola, A. J. and Vidal, R. (2007). Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. Internat. J. Computer Vision 73 95-119.
[148] Wahba, G. (1990). Spline Models for Observational Data . SIAM, Philadelphia. · Zbl 0813.62001
[149] Wahba, G., Wang, Y., Gu, C., Klein, R. and Klein, B. (1995). Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy. Ann. Statist. 23 1865-1895. · Zbl 0854.62042
[150] Wainwright, M. J. and Jordan, M. I. (2003). Graphical models, exponential families, and variational inference. Technical Report 649, Dept. Statistics, Univ. California, Berkeley. · Zbl 1193.62107
[151] Watkins, C. (2000). Dynamic alignment kernels. In Advances in Large Margin Classifiers (A. J. Smola, P. L. Bartlett, B. Schölkopf and D. Schuurmans, eds.) 39-50. MIT Press, Cambridge, MA.
[152] Wendland, H. (2005). Scattered Data Approximation . Cambridge Univ. Press. · Zbl 1075.65021
[153] Weston, J., Chapelle, O., Elisseeff, A., Schölkopf, B. and Vapnik, V. (2003). Kernel dependency estimation. In Advances in Neural Information Processing Systems 15 (S. T. S. Becker and K. Obermayer, eds.) 873-880. MIT Press, Cambridge, MA.
[154] Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics . Wiley, New York. · Zbl 0732.62056
[155] Yang, H. H. and Amari, S.-I. (1997). Adaptive on-line learning algorithms for blind separation-maximum entropy and minimum mutual information. Neural Comput. 9 1457-1482.
[156] Zettlemoyer, L. S. and Collins, M. (2005). Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Uncertainty in Artificial Intelligence UAI 658-666. AUAI Press, Arlington, VA.
[157] Zien, A., Rätsch, G., Mika, S., Schölkopf, B., Lengauer, T. and Müller, K.-R. (2000). Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics 16 799-807.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.