On the mathematical foundations of learning. (English) Zbl 0983.68162

Summary: A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics.


68T05 Learning and adaptive systems in artificial intelligence
68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)


[1] Lars V. Ahlfors, Complex analysis, 3rd ed., McGraw-Hill Book Co., New York, 1978. An introduction to the theory of analytic functions of one complex variable; International Series in Pure and Applied Mathematics. · Zbl 0395.30001
[2] N. Aronszajn, Theory of reproducing kernels, Transactions of the Amer. Math. Soc. 68 (1950), 337-404. · Zbl 0037.20701
[3] Andrew R. Barron, Complexity regularization with application to artificial neural networks, Nonparametric functional estimation and related topics (Spetses, 1990) NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci., vol. 335, Kluwer Acad. Publ., Dordrecht, 1991, pp. 561 – 576. · Zbl 0739.62001
[4] Jöran Bergh and Jörgen Löfström, Interpolation spaces. An introduction, Springer-Verlag, Berlin-New York, 1976. Grundlehren der Mathematischen Wissenschaften, No. 223. · Zbl 0344.46071
[5] M. Š. Birman and M. Z. Solomjak, Piecewise polynomial approximations of functions of classes \(W_{p}^{\alpha}\), Mat. Sb. (N.S.) 73 (115) (1967), 331 – 355 (Russian). · Zbl 0173.16001
[6] Christopher M. Bishop, Neural networks for pattern recognition, The Clarendon Press, Oxford University Press, New York, 1995. With a foreword by Geoffrey Hinton. · Zbl 0868.68096
[7] Åke Björck, Numerical methods for least squares problems, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1996. · Zbl 0847.65023
[8] Lenore Blum, Felipe Cucker, Michael Shub, and Steve Smale, Complexity and real computation, Springer-Verlag, New York, 1998. With a foreword by Richard M. Karp. · Zbl 0872.68036
[9] Bernd Carl and Irmtraud Stephani, Entropy, compactness and the approximation of operators, Cambridge Tracts in Mathematics, vol. 98, Cambridge University Press, Cambridge, 1990. · Zbl 0705.47017
[10] Peter Craven and Grace Wahba, Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation, Numer. Math. 31 (1978/79), no. 4, 377 – 403. · Zbl 0377.65007 · doi:10.1007/BF01404567
[11] M. J. Donahue, L. Gurvits, C. Darken, and E. Sontag, Rates of convex approximation in non-Hilbert spaces, Constr. Approx. 13 (1997), no. 2, 187 – 220. · Zbl 0876.41016 · doi:10.1007/s003659900038
[12] Lokenath Debnath and Piotr Mikusiński, Introduction to Hilbert spaces with applications, 2nd ed., Academic Press, Inc., San Diego, CA, 1999. · Zbl 0940.46001
[13] J. P. Dedieu and M. Shub, Newton’s method for overdetermined systems of equations, Math. Comp. 69 (2000), no. 231, 1099 – 1115. · Zbl 0949.65061
[14] Ronald A. DeVore and George G. Lorentz, Constructive approximation, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 303, Springer-Verlag, Berlin, 1993. · Zbl 0797.41016
[15] Jean Duchon, Splines minimizing rotation-invariant semi-norms in Sobolev spaces, Constructive theory of functions of several variables (Proc. Conf., Math. Res. Inst., Oberwolfach, 1976) Springer, Berlin, 1977, pp. 85 – 100. Lecture Notes in Math., Vol. 571.
[16] D. E. Edmunds and H. Triebel, Function spaces, entropy numbers, differential operators, Cambridge Tracts in Mathematics, vol. 120, Cambridge University Press, Cambridge, 1996. · Zbl 0865.46020
[17] Theodoros Evgeniou, Massimiliano Pontil, and Tomaso Poggio, Regularization networks and support vector machines, Adv. Comput. Math. 13 (2000), no. 1, 1 – 50. · Zbl 0939.68098 · doi:10.1023/A:1018946025316
[18] David Haussler, Decision-theoretic generalizations of the PAC model for neural net and other learning applications, Inform. and Comput. 100 (1992), no. 1, 78 – 150. · Zbl 0762.68050 · doi:10.1016/0890-5401(92)90010-D
[19] Harry Hochstadt, Integral equations, John Wiley & Sons, New York-London-Sydney, 1973. Pure and Applied Mathematics. · Zbl 0259.45001
[20] A. N. Kolmogorov and S. V. Fomīn, Introductory real analysis, Dover Publications, Inc., New York, 1975. Translated from the second Russian edition and edited by Richard A. Silverman; Corrected reprinting.
[21] A. N. Kolmogorov and V. M. Tihomirov, \(\varepsilon\)-entropy and \(\varepsilon\)-capacity of sets in function spaces, Uspehi Mat. Nauk 14 (1959), no. 2 (86), 3 – 86 (Russian). · Zbl 0090.33503
[22] Wee Sun Lee, Peter L. Bartlett, and Robert C. Williamson, The importance of convexity in learning with squared loss, IEEE Trans. Inform. Theory 44 (1998), no. 5, 1974 – 1980. · Zbl 0935.68091 · doi:10.1109/18.705577
[23] Peter Li and Shing-Tung Yau, On the parabolic kernel of the Schrödinger operator, Acta Math. 156 (1986), no. 3-4, 153 – 201. · Zbl 0611.58045 · doi:10.1007/BF02399203
[24] George G. Lorentz, Manfred v. Golitschek, and Yuly Makovoz, Constructive approximation, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 304, Springer-Verlag, Berlin, 1996. Advanced problems. · Zbl 0910.41001
[25] W.S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bulletin of Mathematical Biophysics 5 (1943), 115-133. · Zbl 0063.03860
[26] Jean Meinguet, Multivariate interpolation at arbitrary points made simple, Z. Angew. Math. Phys. 30 (1979), no. 2, 292 – 304 (English, with French summary). · Zbl 0428.41008 · doi:10.1007/BF01601941
[27] M.L. Minsky and S.A. Papert, Perceptrons, MIT Press, 1969. · Zbl 0197.43702
[28] P. Niyogi, The informational complexity of learning, Kluwer Academic Publishers, 1998. · Zbl 0976.68125
[29] A. Pietsch, Eigenvalues and \(s\)-numbers, Mathematik und ihre Anwendungen in Physik und Technik [Mathematics and its Applications in Physics and Technology], vol. 43, Akademische Verlagsgesellschaft Geest & Portig K.-G., Leipzig, 1987. Albrecht Pietsch, Eigenvalues and \(s\)-numbers, Cambridge Studies in Advanced Mathematics, vol. 13, Cambridge University Press, Cambridge, 1987.
[30] Allan Pinkus, \(n\)-widths in approximation theory, Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], vol. 7, Springer-Verlag, Berlin, 1985. · Zbl 0551.41001
[31] T. Poggio and C.R. Shelton, Machine learning, machine vision, and the brain, AI Magazine 20 (1999), 37-55.
[32] David Pollard, Convergence of stochastic processes, Springer Series in Statistics, Springer-Verlag, New York, 1984. · Zbl 0544.60045
[33] G.V. Rozenblum, M.A. Shubin, and M.Z. Solomyak, Partial differential equations VII: Spectral theory of differential operators, Encyclopaedia of Mathematical Sciences, vol. 64, Springer-Verlag, 1994.
[34] I.J. Schoenberg, Metric spaces and completely monotone functions, Ann. of Math. 39 (1938), 811-841.
[35] Igor R. Shafarevich, Basic algebraic geometry. 1, 2nd ed., Springer-Verlag, Berlin, 1994. Varieties in projective space; Translated from the 1988 Russian edition and with notes by Miles Reid.
[36] S. Smale, On the Morse index theorem, J. Math. Mech. 14 (1965), 1049 – 1055. S. Smale, Corrigendum: “On the Morse index theorem”, J. Math. Mech. 16 (1967), 1069 – 1070. · Zbl 0166.36102
[37] S. Smale, Mathematical problems for the next century, Mathematics: Frontiers and Perspectives, AMS, 2000, pp. 271-294. CMP 2000:13
[38] S. Smale and D.-X. Zhou, Estimating the approximation error in learning theory, Preprint, 2001.
[39] Michael E. Taylor, Partial differential equations, Texts in Applied Mathematics, vol. 23, Springer-Verlag, New York, 1996. Basic theory. Michael E. Taylor, Partial differential equations. I, Applied Mathematical Sciences, vol. 115, Springer-Verlag, New York, 1996. Basic theory. · Zbl 0869.35001
[40] L.G. Valiant, A theory of the learnable, Communications of the ACM 27 (1984), 1134-1142. · Zbl 0587.68077
[41] S. van de Geer, Empirical processes in M-estimation, Cambridge University Press, 2000.
[42] Vladimir N. Vapnik, Statistical learning theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, John Wiley & Sons, Inc., New York, 1998. A Wiley-Interscience Publication. · Zbl 0935.62007
[43] P. Venuvinod, Intelligent production machines: benefiting from synergy amongst modelling, sensing and learning, Intelligent Production Machines: Myths and Realities, CRC Press LLC, 2000, pp. 215-252.
[44] A.G. Vitushkin, Estimation of the complexity of the tabulation problem, Nauka (in Russian), 1959, English Translation appeared as Theory of the Transmission and Processing of the Information, Pergamon Press, 1961.
[45] Grace Wahba, Spline models for observational data, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 59, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1990. · Zbl 0813.62001
[46] R. Williamson, A. Smola, and B. Schölkopf, Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators, Tech. Report NC2-TR-1998-019, NeuroCOLT2, 1998. · Zbl 1008.62507