Optimally approximating exponential families. (English) Zbl 1283.94027

Summary: This article studies exponential families \(\mathcal{E}\) on finite sets such that the information divergence \(D(P\|\mathcal{E})\) of an arbitrary probability distribution \(P\) from \(\mathcal{E}\) is bounded by some constant \(D>0\). A particular class of low-dimensional exponential families with low values of \(D\) can be obtained from partitions of the state space. The main results concern optimality properties of these partition exponential families. The case \(D=\log(2)\) is studied in detail. This case is special: if \(D<\log(2)\), then \(\mathcal{E}\) contains all probability measures with full support.
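As an illustration only, the following Python sketch computes \(D(P\|\mathcal{E})\) for a partition exponential family. It assumes the usual construction in which \(\mathcal{E}\) consists of the strictly positive distributions that are constant on each block of a partition \(X_1,\dots,X_k\) of the state space. Under that assumption, the closest point of the closure of \(\mathcal{E}\) keeps each block mass \(P(X_i)\) and spreads it uniformly over \(X_i\). Consequently, \(D(P\|\mathcal{E})\le\log\max_i|X_i|\), with equality for a point mass in a largest block. The function names are hypothetical, and divergences are measured in nats.

```python
import math
from typing import Hashable, Mapping, Sequence

# Hypothetical helpers; assumes E = positive distributions constant on each block.


def divergence_from_partition_family(
    p: Mapping[Hashable, float], blocks: Sequence[Sequence[Hashable]]
) -> float:
    """Return D(P || E) in nats, with E the (assumed) partition family of `blocks`."""
    total = 0.0
    for block in blocks:
        mass = sum(p.get(x, 0.0) for x in block)
        if mass == 0.0:
            continue  # a block without mass contributes nothing
        q = mass / len(block)  # projected probability of each state in the block
        for x in block:
            px = p.get(x, 0.0)
            if px > 0.0:  # convention: 0 * log 0 = 0
                total += px * math.log(px / q)
    return total


def max_divergence(blocks: Sequence[Sequence[Hashable]]) -> float:
    """Largest possible D(P || E) under the assumption: log of the largest block size."""
    return math.log(max(len(block) for block in blocks))


if __name__ == "__main__":
    pairs = [[0, 1], [2, 3]]  # partition of {0, 1, 2, 3} into blocks of size 2
    print(divergence_from_partition_family({0: 1.0}, pairs))  # point mass: log 2
    print(divergence_from_partition_family({x: 0.25 for x in range(4)}, pairs))  # uniform: 0
    print(max_divergence(pairs))  # log 2
```

With blocks of size 1, the assumed family is the set of all full-support distributions, and the bound becomes \(\log 1 = 0\).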


94A15 Information theory (general)
62B10 Statistical aspects of information-theoretic topics
94A17 Measures of information, entropy

