# zbMATH — the first resource for mathematics

About the maximum information and maximum likelihood principles. (English) Zbl 1274.62644
Summary: The paper considers neural networks with radial basis functions and the Shannon information that their outputs carry about the inputs. It discusses the role of information-preserving input transformations when the network is specified by the maximum information principle and by the maximum likelihood principle. A transformation is found that simplifies the input structure in the sense that it minimizes the entropy within the class of all information-preserving transformations. Such transformations need not be unique; under some assumptions, they may be any minimal sufficient statistics.
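
The idea of an information-preserving, entropy-reducing transformation can be illustrated with a small discrete sketch. This is only a toy analogue and not the paper's radial basis function setting: the input distribution `p_x`, the conditional probabilities `p_y1_given_t`, and the map `T` are all hypothetical values chosen for illustration. The output Y depends on the input X only through T(X), so T plays the role of a sufficient statistic: it keeps the Shannon information about Y unchanged while having lower entropy than X.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a distribution {outcome: probability}."""
    return -sum(q * math.log2(q) for q in p.values() if q > 0)


def mutual_information(joint):
    """Shannon information I(A;B) in bits for a joint {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), q in joint.items():
        pa[a] = pa.get(a, 0.0) + q
        pb[b] = pb.get(b, 0.0) + q
    return sum(
        q * math.log2(q / (pa[a] * pb[b]))
        for (a, b), q in joint.items()
        if q > 0
    )


# Assumed toy model: X is uniform on {0, 1, 2, 3}.
p_x = {x: 0.25 for x in range(4)}

# Assumed conditional law: P(Y = 1 | T(X) = t), depending on X only via T(X).
p_y1_given_t = {0: 0.1, 1: 0.8}


def T(x):
    """Hypothetical input transformation that merges inputs with equal P(Y | X)."""
    return x // 2


joint_xy, joint_ty, p_t = {}, {}, {}
for x, px in p_x.items():
    t = T(x)
    p_t[t] = p_t.get(t, 0.0) + px
    for y in (0, 1):
        p1 = p_y1_given_t[t]
        q = px * (p1 if y == 1 else 1.0 - p1)
        joint_xy[(x, y)] = q
        joint_ty[(t, y)] = joint_ty.get((t, y), 0.0) + q

i_xy = mutual_information(joint_xy)  # information about Y in the raw input
i_ty = mutual_information(joint_ty)  # information about Y after transforming
h_x = entropy(p_x)                   # entropy of the raw input
h_t = entropy(p_t)                   # entropy of the transformed input

print(f"I(X;Y) = {i_xy:.4f} bits, I(T(X);Y) = {i_ty:.4f} bits")
print(f"H(X) = {h_x:.4f} bits, H(T(X)) = {h_t:.4f} bits")
```

In this sketch the two information values coincide, while the entropy drops from H(X) to H(T(X)). Relabelling the values of T gives another transformation with the same properties, which is a simple way to see why such transformations need not be unique.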

##### MSC:
- 62M45 Neural nets and related approaches to inference from stochastic processes
- 62B10 Statistical aspects of information-theoretic topics
- 68T05 Learning and adaptive systems in artificial intelligence