## An information-geometric approach to a theory of pragmatic structuring.
*(English)*
Zbl 1010.62007

From the paper: In the field of neural networks, so-called infomax principles like the principle of “maximum information preservation” by R. Linsker [Computer 21, 105-117 (1988)] are formulated to derive learning rules that improve the information processing properties of neural systems. These principles, which are based on information-theoretic measures, are intended to describe the mechanism of learning in the brain. There, the starting point is a low-dimensional and biophysiologically motivated parametrization of the neural system, which need not necessarily be compatible with the given optimization principle. In contrast to this, we establish theoretical results about the low complexity of optimal solutions for the optimization problem of frequently used measures like the mutual information in an unconstrained and more theoretical setting. We do not comment on applications to modeling neural networks.

Within the framework of information geometry, the interaction among units of a stochastic system is quantified in terms of the Kullback-Leibler divergence of the underlying joint probability distribution from an appropriate exponential family. In the present paper, the main example for such a family is given by the set of all factorizable random fields. Motivated by this example, the locally farthest points from an arbitrary exponential family \({\mathcal E}\) are studied. In the corresponding dynamical setting, such points can be generated by the structuring process with respect to \({\mathcal E}\) as a repelling set. The main results concern the low complexity of such distributions, which can be controlled by the dimension of \({\mathcal E}\).
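As a small illustration of the quantity described above, the following Python sketch computes the Kullback-Leibler divergence of a joint distribution from the family of factorizable distributions for a finite system. It relies on the standard fact that the closest factorizable distribution is the product of the marginals, so the divergence reduces to the multi-information. The function names `marginals` and `multi_information` and the two example distributions are assumptions made for this sketch; they are not taken from the paper.

```python
import math
from itertools import product


def marginals(joint, n_units):
    """Return one marginal distribution (value -> probability) per unit."""
    margs = [dict() for _ in range(n_units)]
    for config, p in joint.items():
        for i, x in enumerate(config):
            margs[i][x] = margs[i].get(x, 0.0) + p
    return margs


def multi_information(joint):
    """KL divergence (in nats) of `joint` from the product of its marginals.

    `joint` maps configurations (tuples, one entry per unit) to probabilities.
    Configurations with zero probability may be omitted.
    """
    n_units = len(next(iter(joint)))
    margs = marginals(joint, n_units)
    total = 0.0
    for config, p in joint.items():
        if p > 0.0:
            # Probability of the same configuration under the product distribution.
            q = math.prod(margs[i][x] for i, x in enumerate(config))
            total += p * math.log(p / q)
    return total


# Hypothetical example 1: two independent fair binary units (already factorizable).
independent = {(a, b): 0.25 for a, b in product((0, 1), repeat=2)}

# Hypothetical example 2: two perfectly correlated binary units.
correlated = {(0, 0): 0.5, (1, 1): 0.5}

print(multi_information(independent))
print(multi_information(correlated))
```

In this toy case the perfectly correlated distribution lies at divergence log 2 from the factorizable family while being supported on only two of the four configurations, which gives a feel for the kind of low-complexity far points the review refers to. The sketch does not reproduce the paper's general results for arbitrary exponential families.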

### MSC:

| Code | Description |
|---|---|
| 62B10 | Statistical aspects of information-theoretic topics |
| 62M40 | Random fields; image analysis |
| 62M45 | Neural nets and related approaches to inference from stochastic processes |
| 68T05 | Learning and adaptive systems in artificial intelligence |
| 92B20 | Neural networks for/in biological studies, artificial life and related topics |
| 92C20 | Neural biology |

### Keywords:

information geometry; Kullback-Leibler divergence; mutual information; infomax principle; stochastic interaction; exponential family

### References:

[1] | AMARI, S.-I. (1985). Differential-Geometric Methods in Statistics. Lecture Notes in Statist. 28. Springer, Berlin. · Zbl 0559.62001 |

[2] | AMARI, S.-I. (1997). Information geometry. Contemp. Math. 203 81-95. · Zbl 0881.62034 |

[3] | AMARI, S.-I. (2001). Information geometry on hierarchy of probability distributions. IEEE Trans. Inform. Theory 47 1701-1711. · Zbl 0997.94009 |

[4] | AMARI, S.-I. and NAGAOKA, H. (2000). Methods of Information Geometry. Math. Monogr. 191. Oxford Univ. Press. · Zbl 0960.62005 |

[5] | AMARI, S.-I., BARNDORFF-NIELSEN, O. E., KASS, R. E., LAURITZEN, S. L. and RAO, C. R. (1987). Differential Geometry in Statistical Inference. IMS, Hayward, CA. · Zbl 0694.62001 |

[6] | AY, N. (2000). Aspekte einer Theorie pragmatischer Informationsstrukturierung. Ph.D. dissertation, Univ. Leipzig. |

[7] | BOOTHBY, W. M. (1975). An Introduction to Differentiable Manifolds and Riemannian Geometry. Pure Appl. Math. 63. Academic Press, New York. · Zbl 0333.53001 |

[8] | BRØNDSTED, A. (1983). An Introduction to Convex Polytopes. Springer, New York. |

[9] | COVER, T. M. and THOMAS, J. A. (1991). Elements of Information Theory. Wiley-Interscience, New York. · Zbl 0762.94001 |

[10] | CSISZÁR, I. (1967). On topological properties of f-divergence. Studia Sci. Math. Hungar. 2 329-339. |

[11] | CSISZÁR, I. (1975). I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3 146-158. · Zbl 0318.60013 |

[12] | DECO, G. and OBRADOVIC, D. (1996). An Information-Theoretic Approach to Neural Computing. Perspectives in Neural Computing. Springer, New York. · Zbl 0849.68103 |

[13] | FUJIWARA, A. and AMARI, S.-I. (1995). Gradient systems in view of information geometry. Phys. D 80 317-327. · Zbl 0883.53020 |

[14] | GZYL, H. (1995). The Method of Maximum Entropy. Ser. Adv. Math. Appl. Sci. 29. World Scientific, Singapore. · Zbl 0822.62001 |

[15] | HIRSCH, M. and SMALE, S. (1974). Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, New York. · Zbl 0309.34001 |

[16] | INGARDEN, R. S., KOSSAKOWSKI, A. and OHYA, M. (1997). Information Dynamics and Open Systems, Classical and Quantum Approach. Kluwer, Dordrecht. · Zbl 0891.94007 |

[17] | JAYNES, E. T. (1957). Information theory and statistical mechanics. Phys. Rev. 106. · Zbl 0084.43701 |

[18] | KULLBACK, S. (1968). Information Theory and Statistics. Dover, Mineola, NY. · Zbl 0149.37901 |

[19] | KULLBACK, S. and LEIBLER, R. A. (1951). On information and sufficiency. Ann. Math. Statist. 22 79-86. · Zbl 0042.38403 |

[20] | LINSKER, R. (1988). Self-organization in a perceptual network. Computer 21 105-117. |

[21] | MARTIGNON, L., VON HASSELN, H., GRÜN, S., AERTSEN, A. and PALM, G. (1995). Detecting higher-order interactions among the spiking events in a group of neurons. Biol. Cybernet. 73 69-81. · Zbl 0826.92008 |

[22] | MURRAY, M. K. and RICE, J. W. (1994). Differential Geometry and Statistics. Chapman and Hall, London. · Zbl 0804.53001 |

[23] | NAGAOKA, H. and AMARI, S. (1982). Differential geometry of smooth families of probability distributions. AETR 82-7, Univ. Tokyo. |

[24] | NAKAMURA, Y. (1993). Completely integrable gradient systems on the manifolds of Gaussian and multinomial distributions. Japan J. Indust. Appl. Math. 10 179-189. · Zbl 0814.58021 |

[25] | RAO, C. R. (1945). Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37 81-91. · Zbl 0063.06420 |

[26] | ROCKAFELLAR, R. T. and WETS, J. B. R. (1998). Variational Analysis. Springer, New York. · Zbl 0888.49001 |

[27] | ROMAN, S. (1992). Coding and Information Theory. Springer, New York. · Zbl 0752.94001 |

[28] | SHANNON, C. E. (1948). A mathematical theory of communication. Bell System Tech. J. 27 379-423, 623-656. · Zbl 1154.94303 |

[29] | VAPNIK, V. (1998). Statistical Learning Theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. Wiley, New York. · Zbl 0935.62007 |

[30] | VAPNIK, V. and CHERVONENKIS, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 264-280. · Zbl 0247.60005 |

[31] | WEBSTER, R. (1994). Convexity. Oxford Univ. Press. · Zbl 0835.52001 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.