
Relative entropy in sequential decision problems. (English) Zbl 0952.91019
Summary: Consider an agent who faces a sequential decision problem. At each stage the agent takes an action and observes a stochastic outcome (e.g., daily prices, weather conditions, opponents’ actions in a repeated game, etc.). The agent’s stage-utility depends on his action, the observed outcome and on previous outcomes. We assume the agent is Bayesian and is endowed with a subjective belief over the distribution of outcomes. The agent’s initial belief is typically inaccurate. Therefore, his subjectively optimal strategy is initially suboptimal. As time passes, information about the true dynamics is accumulated and, depending on the compatibility of the belief with the truth, the agent may eventually learn to optimize. We introduce the notion of relative entropy, which is a natural adaptation of the entropy of a stochastic process to the subjective set-up. We present conditions, expressed in terms of relative entropy, that determine whether the agent will eventually learn to optimize. It is shown that low entropy yields asymptotically optimal behavior. In addition, we present a notion of pointwise merging and link it with relative entropy.
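
The learning dynamic described in the summary can be illustrated with a small sketch. It is not the paper's construction: it assumes i.i.d. coin-flip outcomes, a finite prior over candidate biases, and the standard Kullback-Leibler divergence between the true one-stage distribution and the agent's predictive distribution, whereas the paper defines relative entropy for general stochastic processes. The names `kl_bernoulli`, `posterior_update` and `simulate` are hypothetical and chosen only for this illustration.

```python
import math
import random


def kl_bernoulli(p, q):
    """Kullback-Leibler divergence D(Bern(p) || Bern(q)) in nats.

    Assumes 0 < q < 1 so that every logarithm is finite.
    """
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            total += a * math.log(a / b)
    return total


def posterior_update(belief, biases, outcome):
    """One Bayes step over a finite set of candidate coin biases."""
    likelihoods = [b if outcome == 1 else 1 - b for b in biases]
    weights = [w * l for w, l in zip(belief, likelihoods)]
    norm = sum(weights)
    return [w / norm for w in weights]


def simulate(true_bias, biases, prior, stages, seed=0):
    """Return the per-stage divergence between truth and prediction.

    At each stage the agent forms a predictive probability of outcome 1,
    the divergence from the true distribution is recorded, and the belief
    is then updated on the realized outcome.
    """
    rng = random.Random(seed)
    belief = list(prior)
    gaps = []
    for _ in range(stages):
        predictive = sum(w * b for w, b in zip(belief, biases))
        gaps.append(kl_bernoulli(true_bias, predictive))
        outcome = 1 if rng.random() < true_bias else 0
        belief = posterior_update(belief, biases, outcome)
    return gaps


if __name__ == "__main__":
    gaps = simulate(0.7, [0.3, 0.5, 0.7], [1 / 3, 1 / 3, 1 / 3], stages=500)
    print(f"divergence at stage 1:   {gaps[0]:.4f}")
    print(f"divergence at stage 500: {gaps[-1]:.6f}")
```

In this toy setting the true bias lies in the support of the prior, so the belief is compatible with the truth and the recorded divergence shrinks as outcomes accumulate. A prior that excluded the true bias would leave a persistent gap, which loosely mirrors the compatibility condition the summary refers to.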

91B06 Decision theory