## On measures of entropy and information.(English)Zbl 0106.33001

Proc. 4th Berkeley Symp. Math. Stat. Probab. 1, 547-561 (1961).
Let $$\Delta$$ denote the set of all finite discrete “generalized” probability distributions, that is, $$\Delta$$ is the set of all sequences $$P = (p_1, p_2,\ldots, p_n)$$ of nonnegative numbers such that $$0 < \sum_{k=1}^n p_k\le 1$$. The quantity $H(P) = H_1(P) = - \sum_{k=1}^n p_k\log_2 p_k\biggl / \sum_{k=1}^n p_k,$ defined for all $$P\in\Delta$$ and called the “entropy of order 1” of the generalized probability distribution $$P = (p_1, p_2,\ldots, p_n)$$, is characterized by the following five postulates:
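As a numerical illustration (a Python sketch, not from the paper; the function name is ours), the entropy of order 1 can be computed directly from this definition:

```python
import math

def H1(P):
    """Entropy of order 1 of a generalized distribution P.

    P: nonnegative numbers with 0 < sum(P) <= 1; zero terms are
    skipped (the usual convention 0 * log 0 = 0).
    """
    w = sum(P)
    if not 0 < w <= 1:
        raise ValueError("weight must lie in (0, 1]")
    return -sum(p * math.log2(p) for p in P if p > 0) / w

# Postulate 3: the single-point distribution {1/2} has entropy 1.
print(H1([0.5]))  # 1.0
```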
Postulate 1. $$H(P)$$ is a symmetric function of the elements of $$P$$.
Postulate 2. If $$\{p\}$$ denotes the generalized probability distribution consisting of the single probability $$p$$, then $$H(\{p\})$$ is a continuous function of $$p$$ in the interval $$0 <p \le 1$$.
Postulate 3. $$H(\{\frac12\}) = 1$$.
Postulate 4. For $$P = (p_1, p_2,\ldots, p_m)\in\Delta$$, $$Q = (q_1, q_2,\ldots, q_n)\in\Delta$$, and $$P*Q =(p_jq_k)$$, $$j = 1, 2,\ldots, m$$, $$k = 1, 2, \ldots, n$$, we have $$H(P * Q) = H(P) + H(Q)$$.
Postulate 5. If $$P\in\Delta$$, $$Q\in\Delta$$, and $$W(P) + W(Q) \le 1$$, where $$W(P)$$ denotes the sum (weight) of the probabilities of $$P$$ and similarly for $$W(Q)$$, we have $H(P \cup Q) = [W(P)H(P) + W(Q) H(Q)] / [W(P) + W (Q)],$ where $$P \cup Q = (p_1, p_2, \ldots, p_m, q_1, q_2, \ldots, q_n)$$ if $$P = (p_1, p_2,\ldots, p_m)$$ and $$Q = (q_1, q_2, \ldots, q_n)$$.
Postulate 5 may be called the mean-value property of entropy.
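The mean-value property can be checked numerically (a sketch with distributions of our choosing; `H1` is our name for the entropy of order 1 defined above):

```python
import math

def H1(P):
    # Entropy of order 1 of a generalized distribution.
    w = sum(P)
    return -sum(p * math.log2(p) for p in P if p > 0) / w

# Two generalized distributions with W(P) + W(Q) <= 1.
P = [0.2, 0.1]          # W(P) = 0.3
Q = [0.3, 0.15, 0.05]   # W(Q) = 0.5
WP, WQ = sum(P), sum(Q)
lhs = H1(P + Q)                              # H(P ∪ Q)
rhs = (WP * H1(P) + WQ * H1(Q)) / (WP + WQ)  # weighted arithmetic mean
# lhs and rhs agree up to floating-point rounding
```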
The question arises as to what quantity is obtained if, in Postulate 5, the arithmetic mean is replaced by some other mean value associated with a Kolmogorov-Nagumo function $$g(x)$$. Postulate 5 is thus replaced by
Postulate 5’: There exists a strictly monotonic and continuous function $$g(x)$$ such that if $$P\in\Delta$$, $$Q\in\Delta$$, and $$W(P) + W(Q) \le 1$$, we have $H(P \cup Q) = g^{-1}([W(P) g(H(P)) + W(Q) g(H(Q))]/[W(P) + W(Q)]).$ Clearly, if $$g(x) = ax + b$$ with $$a\ne 0$$, the postulate 5’ reduces to 5.
Another choice of $$g(x)$$ which is compatible with postulate 4 is the following: $g(x) = g_\alpha(x) = 2^{(\alpha-1)x}\quad\text{with }\alpha > 0,\ \alpha\ne 1.$ Then postulates 1, 2, 3, 4, and 5’ characterize the “entropy of order $$\alpha$$”: $H(P) = H_\alpha(P) = \frac1{\alpha-1}\log_2\left(\sum_{k=1}^n p_k^\alpha\Bigl / \sum_{k=1}^n p_k\right).$
Note that $$H_1(P)$$ is the limiting case of $$H_\alpha(P)$$ as $$\alpha\to 1$$. In the case where $$Q = (q_1, q_2,\ldots, q_n)$$ is absolutely continuous with respect to $$P = (p_1, p_2,\ldots, p_n)$$ (that is, $$q_k = 0$$ whenever $$p_k = 0$$), “the information of order $$\alpha$$ obtained if the generalized distribution $$P$$ is replaced by the generalized distribution $$Q$$” is defined as follows:
$I_\alpha(Q \vert P) = \frac1{\alpha-1} \log_2\left(\sum_{k=1}^n \frac{q_k^\alpha}{p_k^{\alpha-1}}\Bigl / \sum_{k=1}^n q_k\right),\ \alpha\ne 1.$
If $$\alpha\to 1$$ we obtain the “information of order 1”
$I_1(Q \vert P) = \lim_{\alpha\to 1} I_\alpha(Q \vert P) = \sum_{k=1}^n q_k \log_2\frac{q_k}{p_k}\Bigl / \sum_{k=1}^n q_k,$ which coincides with the well-known quantity (generalized entropy) in the case $$\sum_{k=1}^n q_k = 1$$.
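The information of order $$\alpha$$ and its order-1 limit can be sketched as follows (Python, with function names of our own; the normalization by $$\sum_k q_k$$ is the one under which the limit $$\alpha\to 1$$ yields $$I_1$$):

```python
import math

def I_alpha(Q, P, alpha):
    """Information of order alpha of Q with respect to P (alpha != 1).
    Requires p_k > 0 wherever q_k > 0."""
    s = sum(q ** alpha / p ** (alpha - 1) for q, p in zip(Q, P) if q > 0)
    return math.log2(s / sum(Q)) / (alpha - 1)

def I1(Q, P):
    """Information of order 1; for sum(Q) == 1 this is the usual
    Kullback-Leibler quantity sum q_k log2(q_k / p_k)."""
    return sum(q * math.log2(q / p) for q, p in zip(Q, P) if q > 0) / sum(Q)

P = [0.5, 0.5]
Q = [0.25, 0.75]
# I_alpha(Q | P) tends to I1(Q | P) as alpha -> 1.
```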
Similarly to $$H_\alpha(P)$$, the quantity $$I_\alpha(Q \vert P)$$ is characterized by a set of five postulates such that the Kolmogorov-Nagumo function $$g(x)$$ involved in the mean-value property postulate (analogous to postulate 5’) is necessarily either a linear or an exponential function, leading to $$I_1(Q \vert P)$$ or $$I_\alpha(Q \vert P)$$, respectively.
In the last section of the paper, the author gives as an application of these concepts an information-theoretical proof of a limit theorem on Markov chains following the idea due to Yu. V. Linnik [Theor. Probab. Appl. 4, 288–299 (1960); translation from Teor. Veroyatn. Primen. 4, 311–321 (1959; Zbl 0097.13103)].
Reviewer: Alfred Pérez

### MSC:

- 94A17 Measures of information, entropy
- 62B10 Statistical aspects of information-theoretic topics
- 60F99 Limit theorems in probability theory

### Citations:

Zbl 0101.34803; Zbl 0097.13103