
Decision theoretic generalizations of the PAC model for neural net and other learning applications. (English) Zbl 0762.68050

The paper introduces an extension of the probably approximately correct (PAC) model of learning from examples. Rather than restricting the functions on the instance space to take values in \(\{0,1\}\), the extended model allows them to take values in an arbitrary set, and it is grounded in statistical decision theory. The results rest on a proposed generalization of the Vapnik-Chervonenkis dimension that applies to classes of real-valued functions. Within this extended model, distribution-independent upper bounds can be given on the sample size sufficient for learning in feed-forward neural networks.
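
The generalized dimension mentioned above is commonly known as the pseudo-dimension; the following is a standard rendering for orientation, not a verbatim quotation from the paper. A class \(F\) of real-valued functions on an instance space \(X\) pseudo-shatters points \(x_1,\dots,x_d\in X\) if there are thresholds \(t_1,\dots,t_d\in\mathbb{R}\) such that every pattern \(b\in\{0,1\}^d\) is realized by some \(f\in F\) with \(f(x_i)\ge t_i\) exactly when \(b_i=1\); the pseudo-dimension of \(F\) is the largest \(d\) for which some \(d\) points are pseudo-shattered. For comparison, in the classical \(\{0,1\}\)-valued setting with VC dimension \(d\) (cf. [16]), a distribution-independent sample size of
\[
m = O\!\left(\frac{1}{\varepsilon}\left(d\,\log\frac{1}{\varepsilon} + \log\frac{1}{\delta}\right)\right)
\]
suffices for PAC learning to accuracy \(\varepsilon\) with confidence \(1-\delta\); the bounds for real-valued classes and feed-forward networks obtained in the paper are of this general flavor.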

MSC:

68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Abe, N.; Warmuth, M., On the computational complexity of approximating distributions by probabilistic automata, (), 52-66
[2] Alexander, K., Rates of growth for weighted empirical processes, (), 475-493 · Zbl 1372.60051
[3] Alexander, K., Rates of growth and sample moduli for weighted empirical processes indexed by sets, Probab. theory related fields, 75, 379-423, (1987) · Zbl 0596.60029
[4] Angluin, D., Queries and concept learning, Mach. learning, 2, 319-342, (1988)
[5] Angluin, D.; Laird, P., Learning from noisy examples, Mach. learning, 2, 4, 343-370, (1988)
[6] Angluin, D.; Valiant, L.G., Fast probabilistic algorithms for Hamiltonian circuits and matchings, J. comput. system sci., 18, 155-193, (1979) · Zbl 0437.05040
[7] Anthony, M.; Shawe-Taylor, J., A result of Vapnik with applications, () · Zbl 0822.68088
[8] Assouad, P., Densité et dimension, Ann. inst. Fourier, 33, 3, 233-282, (1983) · Zbl 0504.60006
[9] Barron, A., Statistical properties of artificial neural networks, (), 280-285
[10] Barron, A.; Cover, T., Minimum complexity density estimation, IEEE trans. inform. theory, (1990), to appear
[11] Barto, A.G.; Anandan, P., Pattern recognizing stochastic learning automata, IEEE trans. systems man cybernet., 15, 360-374, (1985) · Zbl 0561.68042
[12] Baum, E.; Haussler, D., What size net gives valid generalization?, Neural comput., 1, 1, 151-160, (1989)
[13] Benedek, G.M.; Itai, A., Learnability by fixed distributions, (), 80-90
[14] Berger, J., ()
[15] Billingsley, P., ()
[16] Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M.K., Learnability and the Vapnik-Chervonenkis dimension, J. assoc. comput. Mach., 36, 4, 929-965, (1989) · Zbl 0697.68079
[17] Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J., Classification and regression trees, (1984), Wadsworth · Zbl 0541.62042
[18] Buntine, W.L., A theory of learning classification rules, ()
[19] Buntine, W.; Weigend, A., Bayesian back propagation, (1991), unpublished manuscript · Zbl 0761.62031
[20] Clarke, B.; Barron, A., Entropy, risk and the Bayesian central limit theorem, (manuscript)
[21] Clarke, B.; Barron, A., Information-theoretic asymptotics of Bayes methods, IEEE trans. inform. theory, 36, 3, 453-471, (1990) · Zbl 0709.62008
[22] Cover, T.M., Capacity problems for linear machines, (), 283-289
[23] Denker, J.; Schwartz, D.; Wittner, B.; Solla, S.; Howard, R.; Jackel, L.; Hopfield, J., Automatic learning, rule extraction and generalization, Complex systems, 1, 877-922, (1987) · Zbl 0662.68085
[24] Devroye, L., Automatic pattern recognition: A study of the probability of error, IEEE trans. pattern anal. Mach. intelligence, 10, 4, 530-543, (1988) · Zbl 0661.62056
[25] Duda, R.O.; Hart, P.E., ()
[26] Dudley, R.M., Central limit theorems for empirical measures, Ann. probab., 6, 6, 899-929, (1978) · Zbl 0404.60016
[27] Dudley, R.M., A course on empirical processes, (), 2-142 · Zbl 0554.60029
[28] Dudley, R.M., Universal Donsker classes and metric entropy, Ann. probab., 15, 4, 1306-1326, (1987) · Zbl 0631.60004
[29] Durbin, R.; Rumelhart, D.E., Product units: A computationally powerful and biologically plausible extension to backpropagation networks, Neural comput., 1, 1, 133-142, (1989)
[30] Edelsbrunner, H., ()
[31] Ehrenfeucht, A.; Haussler, D.; Kearns, M.; Valiant, L., A general lower bound on the number of examples needed for learning, Inform. comput., 82, 247-261, (1989) · Zbl 0679.68158
[32] Farmer, J.D., Information dimension and the probabilistic structure of chaos, Z. naturforsch. A, 37, 1304-1325, (1982)
[33] Farmer, J.D.; Ott, E.; Yorke, J.A., The dimension of chaotic attractors, Physica D, 7, 153-180, (1983)
[34] Ferguson, T., ()
[35] Gullapalli, V., A stochastic reinforcement algorithm for learning real-valued functions, Neural networks, 3, 6, 671-692, (1990)
[36] Gyorgy, G.; Tishby, N., Statistical theory of learning a rule, ()
[37] Haussler, D., Quantifying inductive bias: AI learning algorithms and Valiant’s learning framework, Artificial intelligence, 36, 177-221, (1988) · Zbl 0651.68104
[38] Haussler, D., Learning conjunctive concepts in structural domains, Mach. learning, 4, 7-40, (1989)
[39] Haussler, D., Decision theoretic generalizations of the PAC learning model, (), 21-41
[40] Haussler, D.; Kearns, M.; Littlestone, N.; Warmuth, M.K., Equivalence of models for polynomial learnability, Inform. comput., 95, 129-161, (1991) · Zbl 0743.68115
[41] Haussler, D.; Kearns, M.; Schapire, R., Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension, (), 61-74
[42] Haussler, D.; Littlestone, N.; Warmuth, M., (), Technical Report UCSC-CRL-90-54
[43] Inform. Comput., to appear.
[44] Haussler, D.; Welzl, E., Epsilon nets and simplex range queries, Discrete comput. geom., 2, 127-151, (1987) · Zbl 0619.68056
[45] Kearns, M.; Li, M.; Pitt, L.; Valiant, L., On the learnability of Boolean formulae, (), 285-295, New York
[46] Kearns, M.; Schapire, R., Efficient distribution-free learning of probabilistic concepts, (), 382-391
[47] Kiefer, J., ()
[48] Kolmogorov, A.N.; Tihomirov, V.M., ε-entropy and ε-capacity of sets in functional spaces, Amer. math. soc. transl. ser. 2, 17, 277-364, (1961)
[49] Kulkarni, S., (), Technical Report LIDS-P-1910
[50] Kullback, S., ()
[51] LeCun, Y.; Denker, J.; Solla, S., Optimal brain damage, (), 589
[52] Lindley, D.V., The present position in Bayesian statistics, Statist. sci., 5, 1, 44-89, (1990) · Zbl 0955.62515
[53] Linial, N.; Mansour, Y.; Rivest, R., Results on learnability and the Vapnik-Chervonenkis dimension, (), 56-68
[54] Littlestone, N., Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm, Mach. learning, 2, 285-318, (1988)
[55] MacKay, D., Bayesian methods for adaptive models, ()
[56] Mandelbrot, B.B., ()
[57] McCullagh, P.; Nelder, J.A., ()
[58] Moody, J.; Darken, C., Fast learning in networks of locally-tuned processing units, Neural comput., 1, 2, 281-294, (1989)
[59] Narendra, K.S.; Thathachar, M.A.L., ()
[60] Natarajan, B.K., Learning over classes of distributions, (), 408-409 · Zbl 0761.68080
[61] Natarajan, B.K., (), Technical Report HPL-SAL-89-29
[62] Natarajan, B.K., Some results on learning, (1989), Technical Report CMU-RI-TR-89-6, Carnegie Mellon · Zbl 0747.68056
[63] Natarajan, B.K.; Tadepalli, P., Two new frameworks for learning, (), 402-415
[64] Nobel, A.; Dembo, A., (), Technical Report 74
[65] Nolan, D.; Pollard, D., U-processes: rates of convergence, Ann. statist., 15, 2, 780-799, (1987) · Zbl 0624.60048
[66] Nowlan, S., Maximum likelihood competitive learning, (), 574-582
[67] Nowlan, S.; Hinton, G., (), Technical Report
[68] Opper, M.; Haussler, D., Calculation of the learning curve of Bayes optimal classification algorithm for learning a perceptron with noise, (), 75-87
[69] Opper, M.; Haussler, D., Generalization performance of Bayes optimal classification algorithm for learning a perceptron, Phys. rev. lett., 66, 20, 2677-2680, (1991) · Zbl 0968.92502
[70] Poggio, T.; Girosi, F., (), Technical Report A.I. Memo No. 1140
[71] Pollard, D., ()
[72] Pollard, D., Rates of uniform almost-sure convergence for empirical processes indexed by unbounded classes of functions, (1986), manuscript
[73] Pollard, D., Empirical processes: theory and applications, () · Zbl 0741.60001
[74] Quiroz, A.J., Metric entropy and learnability, (1989), Universidad Simón Bolívar Caracas, Venezuela, unpublished manuscript
[75] Renyi, A., ()
[76] Rissanen, J., Stochastic complexity and modeling, Ann. statist., 14, 3, 1080-1100, (1986) · Zbl 0602.62008
[77] Rumelhart, D.E.; McClelland, J.L., (), Volume 1: Foundations
[78] Sauer, N., On the density of families of sets, J. combin. theory ser. A, 13, 145-147, (1972) · Zbl 0248.05005
[79] Shackelford, G.; Volper, D., Learning k-DNF with noise in the attributes, (), 97-103
[80] Simmons, G.F., ()
[81] Sloan, R., Types of noise in data for concept learning, (), 91-96
[82] Sompolinsky, H.; Tishby, N.; Seung, H.S., Learning from examples in large neural networks, Phys. rev. lett., 65, 1683-1686, (1990)
[83] Talagrand, M., Sharper bounds for empirical processes, (1991), unpublished manuscript · Zbl 0798.60051
[84] Tishby, N.; Levin, E.; Solla, S., Consistent inference of probabilities in layered networks: predictions and generalizations, (), 403-409
[85] Touretsky, D., ()
[86] Touretsky, D., ()
[87] Valiant, L.G., A theory of the learnable, Comm. ACM, 27, 11, 1134-1142, (1984) · Zbl 0587.68077
[88] Vapnik, V.N., ()
[89] Vapnik, V.N., Inductive principles of the search for empirical dependences (methods based on weak convergence of probability measures), () · Zbl 0746.68075
[90] Vapnik, V.N.; Chervonenkis, A.Ya., On the uniform convergence of relative frequencies of events to their probabilities, Theory probab. appl., 16, 2, 264-280, (1971) · Zbl 0247.60005
[91] Weigend, A.; Huberman, B.; Rumelhart, D., Predicting the future: A connectionist approach, Internat. J. neural systems, 1, 193-209, (1990)
[92] Weiss, S.; Kulikowski, C., ()
[93] Welzl, E., Partition trees for triangle counting and other range search problems, (), 23-33, Urbana, IL
[94] Wenocur, R.S.; Dudley, R.M., Some special Vapnik-Chervonenkis classes, Discrete math., 33, 313-318, (1981) · Zbl 0459.60008
[95] White, H., Connectionist nonparametric regression: multilayer feedforward networks can learn arbitrary mappings, Neural networks, 3, 535-549, (1990)
[96] White, H., Learning in artificial neural networks: A statistical perspective, Neural comput., 1, 4, 425-464, (1990)
[97] Yamanishi, K., A learning criterion for stochastic rules, (), 67-81
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.