On measures of entropy and information.

*(English)* Zbl 0106.33001
Proc. 4th Berkeley Symp. Math. Stat. Probab. 1, 547-561 (1961).

Let \(\Delta\) denote the set of all finite discrete “generalized” probability distributions, that is, \(\Delta\) is the set of all sequences \(P = (p_1, p_2,\ldots, p_n)\) of nonnegative numbers such that \(0 < \sum_{k=1}^n p_k\le 1\). The quantity
\[ H(P) = H_1(P) = - \sum_{k=1}^n p_k\log_2 p_k\biggl / \sum_{k=1}^n p_k, \]
defined for all \(P\in\Delta\) and called the “entropy of order 1” of the generalized probability distribution \(P = (p_1, p_2,\ldots, p_n)\), is characterized by the following five postulates:
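The formula for \(H_1\) can be sketched numerically. A minimal Python sketch (the function name `entropy1` is mine, not from the paper), using base-2 logarithms and the convention \(0\log 0 = 0\):

```python
import math

def entropy1(p):
    """Entropy of order 1 of a generalized distribution.

    p: sequence of nonnegative weights with 0 < sum(p) <= 1
    (zero weights are skipped, i.e. 0 log 0 = 0).
    """
    w = sum(p)
    assert 0 < w <= 1, "not a generalized probability distribution"
    return -sum(x * math.log2(x) for x in p if x > 0) / w

# For an ordinary distribution (weight 1) this is the usual Shannon entropy:
print(entropy1([0.5, 0.5]))    # → 1.0
# For a generalized distribution the weight appears in the denominator:
print(entropy1([0.25, 0.25]))  # → 2.0
```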

Postulate 1. \(H(P)\) is a symmetric function of the elements of \(P\).

Postulate 2. If \(\{p\}\) denotes the generalized probability distribution consisting of the single probability \(p\), then \(H(\{p\})\) is a continuous function of \(p\) in the interval \(0 <p \le 1\).

Postulate 3. \(H(\{\frac12\}) = 1\).

Postulate 4. For \(P = (p_1, p_2,\ldots, p_m)\in\Delta\), \(Q = (q_1, q_2,\ldots, q_n)\in\Delta\), and \(P*Q = (p_jq_k)\) (\(j = 1, 2,\ldots, m\); \(k = 1, 2, \ldots, n\)), we have \(H(P * Q) = H(P) + H(Q)\).

Postulate 5. If \(P\in\Delta\), \(Q\in\Delta\), and \(W(P) + W(Q) \le 1\), where \(W(P)\) is the sum (weight) of the probabilities of \(P\) and similarly for \(W(Q)\), we have \[ H(P \cup Q) = [W(P)H(P) + W(Q) H(Q)] / [W(P) + W (Q)], \] where \(P \cup Q = (p_1, p_2, \ldots, p_m, q_1, q_2, \ldots, q_n)\) if \(P = (p_1, p_2,\ldots, p_m)\) and \(Q = (q_1, q_2, \ldots, q_n)\).

Postulate 5 may be called the mean-value property of entropy.
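The mean-value property can be verified numerically: for \(H_1\) it holds as an exact identity, since the weighted terms \(W(P)H_1(P)\) recover the unnormalized sums. A small Python check (helper names are mine):

```python
import math

def entropy1(p):
    """Entropy of order 1 of a generalized distribution (0 log 0 = 0)."""
    w = sum(p)
    return -sum(x * math.log2(x) for x in p if x > 0) / w

# Two generalized distributions with W(P) + W(Q) <= 1:
P = [0.2, 0.1]   # W(P) = 0.3
Q = [0.3, 0.2]   # W(Q) = 0.5

# Postulate 5: H(P ∪ Q) is the weighted arithmetic mean of H(P) and H(Q).
wp, wq = sum(P), sum(Q)
lhs = entropy1(P + Q)  # P ∪ Q is just the concatenated sequence
rhs = (wp * entropy1(P) + wq * entropy1(Q)) / (wp + wq)
print(abs(lhs - rhs) < 1e-9)  # → True
```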

The question arises which quantity is obtained if the arithmetic mean in postulate 5 is replaced by some other mean value associated with a Kolmogorov-Nagumo function \(g(x)\). Postulate 5 is then replaced by

Postulate 5’: There exists a strictly monotonic and continuous function \(g(x)\) such that if \(P\in\Delta\), \(Q\in\Delta\), and \(W(P) + W(Q) \le 1\), we have \[ H(P \cup Q) = g^{-1}([W(P) g(H(P)) + W(Q) g(H(Q))]/[W(P) + W(Q)]). \] Clearly, if \(g(x) = ax + b\) with \(a\ne 0\), postulate 5’ reduces to postulate 5.

Another choice of \(g(x)\) which is compatible with postulate 4 is the following: \[ g(x) = g_\alpha(x) = 2^{(\alpha-1)x}\quad\text{with }\alpha > 0,\ \alpha\ne 1. \] Then postulates 1, 2, 3, 4, and 5’ characterize the “entropy of order \(\alpha\)”: \[ H(P) = H_\alpha(P) = \frac1{\alpha-1}\log_2\left(\sum_{k=1}^n p_k^\alpha\Bigl / \sum_{k=1}^n p_k\right). \]

Note that \(H_1(P)\) is the limiting case of \(H_\alpha(P)\) as \(\alpha\to 1\). If \(Q = (q_1, q_2,\ldots, q_n)\) is absolutely continuous with respect to \(P = (p_1, p_2,\ldots, p_n)\) (that is, \(p_k > 0\) whenever \(q_k > 0\)), “the information of order \(\alpha\) obtained if the generalized distribution \(P\) is replaced by the generalized distribution \(Q\)” is defined as follows:

\[ I_\alpha(Q \vert P) = \frac1{\alpha-1} \log_2\left(\sum_{k=1}^n \frac{q_k^\alpha}{p_k^{\alpha-1}}\Bigl / \sum_{k=1}^n q_k\right),\ \alpha\ne 1. \]

If \(\alpha\to 1\) we obtain the “information of order 1”

\[ I_1(Q \vert P) = \lim_{\alpha\to 1} I_\alpha(Q \vert P) = \sum_{k=1}^n q_k \log_2\frac{q_k}{p_k}\Bigl / \sum_{k=1}^n q_k, \] which coincides with the well-known quantity (generalized entropy) in the case \(\sum_{k=1}^n q_k = 1\).
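The two information measures and the limit relation between them can also be sketched in Python (function names are mine; both are normalized by the weight \(\sum_k q_k\), which for ordinary distributions \(Q\) makes \(I_1\) the familiar Kullback-Leibler divergence):

```python
import math

def info_alpha(q, p, alpha):
    """Information of order alpha, I_alpha(Q|P).

    Requires Q absolutely continuous w.r.t. P: p_k > 0 wherever q_k > 0.
    """
    wq = sum(q)
    s = sum(qk ** alpha / pk ** (alpha - 1) for qk, pk in zip(q, p) if qk > 0)
    return math.log2(s / wq) / (alpha - 1)

def info1(q, p):
    """Information of order 1 (the alpha -> 1 limit)."""
    wq = sum(q)
    return sum(qk * math.log2(qk / pk) for qk, pk in zip(q, p) if qk > 0) / wq

P = [0.5, 0.5]
Q = [0.75, 0.25]
# Since sum(Q) = 1, info1 is the Kullback-Leibler divergence of Q from P:
print(info1(Q, P))
# I_alpha approaches I_1 as alpha -> 1:
print(abs(info_alpha(Q, P, 1.000001) - info1(Q, P)) < 1e-3)  # → True
```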

Similarly to \(H_\alpha(P)\), the quantity \(I_\alpha(Q \vert P)\) is characterized by a set of five postulates such that the Kolmogorov-Nagumo function \(g(x)\) involved in the mean-value postulate (analogous to postulate 5’) is necessarily either a linear or an exponential function, leading to \(I_1(Q \vert P)\) or \(I_\alpha(Q \vert P)\), respectively.

In the last section of the paper, the author gives as an application of these concepts an information-theoretical proof of a limit theorem on Markov chains following the idea due to Yu. V. Linnik [Theor. Probab. Appl. 4, 288–299 (1960); translation from Teor. Veroyatn. Primen. 4, 311–321 (1959; Zbl 0097.13103)].

Reviewer: Alfred Pérez

##### MSC:

- 94A17: Measures of information, entropy
- 62B10: Statistical aspects of information-theoretic topics
- 60F99: Limit theorems in probability theory