## **Elements of information theory.**
*(English)*
Zbl 0762.94001

Wiley Series in Telecommunications. New York: John Wiley & Sons, Inc. xxii, 542 p. (1991).

The opening sentence of the book reads: “This is intended to be a simple and accessible book on information theory. As Einstein said: Everything should be made as simple as possible, but not simpler.” Having read the book, my conclusion is that the authors have succeeded very well in this aim.

Their definitions of the basic notions of Information Theory are accompanied by many examples that make it intuitively clear why these notions have to be defined the way they are. The proofs of the basic theorems are very transparent and (of course this goes hand in hand) elegant. Furthermore, the authors give many connections and applications of results of Information Theory to other sciences (e.g. Probability Theory, Physics, Economics and Computer Science). I think the book could perfectly well be used for a course in Information Theory for students from different branches. Almost no mathematical background is required, although some familiarity with probability theory helps. The book contains many nice exercises that help the student master the basic techniques and apply the results. Every chapter closes with a summary in which its main results are restated. This is followed by the exercises and a historical perspective stating who contributed to the development of the results and where they first appeared.

In Chapter 1 an outline of the rest of the book is given. Chapter 2 develops the basic notions of Information Theory: Entropy, Relative Entropy, Mutual Information and Conditional Entropy. This is followed by some useful results such as chain rules and important inequalities stemming from the concavity and convexity of certain functions. Then some implications for Thermodynamics and Probability Theory are discussed. The Chapter closes with Fano’s inequality.
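To make these notions concrete, here is a small illustrative Python sketch (my own example, not taken from the book) that computes entropies for an arbitrary joint distribution and verifies the chain rule H(X,Y) = H(X) + H(Y|X):

```python
import math

def H(p):
    """Shannon entropy in bits of a distribution given as a list of probabilities."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# A small, made-up joint distribution p(x, y) on {0,1} x {0,1}.
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# Marginal distribution of X.
px = [sum(v for (x, _), v in joint.items() if x == a) for a in (0, 1)]

H_XY = H(list(joint.values()))          # joint entropy H(X, Y)
H_X = H(px)                             # marginal entropy H(X)
H_Y_given_X = H_XY - H_X                # chain rule: H(Y|X) = H(X,Y) - H(X)

print(H_X, H_Y_given_X, H_XY)
```

Running it shows H(X,Y) = 1.75 bits for this distribution, split into H(X) and H(Y|X) exactly as the chain rule dictates.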

Chapter 3 is about the Asymptotic Equipartition Property, which states that the set of possible sequences of outcomes of a random variable can be divided into two sets: the typical set, of sequences that jointly appear with probability tending to 1, and the non-typical set, of sequences that are unlikely to show up. The connections between entropy, data compression and typical sequences are explored.
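The AEP can be observed numerically. The following sketch (my own illustration, not from the book) draws i.i.d. Bernoulli sequences and checks that -(1/n) log2 p(X^n) approaches the entropy H(X) as n grows:

```python
import math
import random

random.seed(0)
p = 0.3  # P(X = 1) for i.i.d. Bernoulli(p) symbols

# Entropy of a single symbol, H(X) = -p log2 p - (1-p) log2 (1-p).
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def per_symbol_log_prob(n):
    """Draw a sequence X^n and return -(1/n) log2 p(X^n)."""
    xs = [1 if random.random() < p else 0 for _ in range(n)]
    logp = sum(math.log2(p) if x == 1 else math.log2(1 - p) for x in xs)
    return -logp / n

# By the AEP, -(1/n) log2 p(X^n) converges to H(X) in probability.
print(H, per_symbol_log_prob(100), per_symbol_log_prob(100_000))
```

For long sequences the per-symbol log-probability clusters tightly around H(X), which is exactly why typical sequences can be indexed with about nH(X) bits.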

Chapter 4 is concerned with the entropy rate of stochastic processes. The notion of a Markov Chain is developed. The Chapter closes with some results on hidden Markov Models.
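For a stationary Markov chain the entropy rate reduces to H = -sum_i mu_i sum_j P_ij log2 P_ij, where mu is the stationary distribution. A small illustrative computation (with a made-up two-state chain, not an example from the book):

```python
import math

# Two-state Markov chain with transition matrix P (rows sum to 1).
P = [[0.9, 0.1],
     [0.4, 0.6]]

# Stationary distribution mu solves mu P = mu; for a two-state chain it is
# proportional to (P[1][0], P[0][1]).
a, b = P[0][1], P[1][0]
mu = [b / (a + b), a / (a + b)]

# Entropy rate: H = -sum_i mu_i sum_j P_ij log2 P_ij  (bits per symbol).
rate = -sum(mu[i] * sum(P[i][j] * math.log2(P[i][j]) for j in range(2))
            for i in range(2))
print(mu, rate)
```

The rate is a weighted average of the per-state transition entropies, weighted by how often the chain visits each state.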

Chapter 5 is devoted to data compression. First some examples of source codes are given. Then the Kraft inequality is proved. After that, optimal coding is discussed and some bounds on the optimal code length are derived. Shannon-Fano-Elias coding is the next subject of this Chapter, followed by a kind of “reverse” coding in which the objective is to mimic a given probability distribution using a fair coin.
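As an illustration of these bounds (my own example, not from the book), the following sketch assigns Shannon code lengths l_i = ceil(-log2 p_i), checks the Kraft inequality sum 2^(-l_i) <= 1, and verifies that the expected length L satisfies H(X) <= L < H(X) + 1:

```python
import math

# A made-up source distribution and its Shannon code lengths.
p = [0.5, 0.25, 0.125, 0.125]
lengths = [math.ceil(-math.log2(pi)) for pi in p]

# Kraft inequality: a binary prefix code with these lengths exists
# if and only if sum 2^{-l_i} <= 1.
kraft_sum = sum(2.0 ** -l for l in lengths)

# Expected code length versus entropy: H(X) <= L < H(X) + 1.
H = -sum(pi * math.log2(pi) for pi in p)
L = sum(pi * li for pi, li in zip(p, lengths))
print(lengths, kraft_sum, H, L)
```

Since every probability here is a power of two, the Shannon lengths are exact and L equals H(X); for general distributions the expected length falls strictly between the two bounds.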

Chapter 6 explores data compression a little further and shows connections to gambling strategies. The Chapter closes with some remarks on the entropy of English text and shows a number of “approximations” to English (up to fourth order, in which every letter depends on the previous three).

Kolmogorov Complexity is the main topic of Chapter 7. This is the length of the shortest computer program that produces a given sequence. Formally it suffices to consider a single universal Turing machine, since such a machine can simulate any other computer. It turns out that the Kolmogorov Complexity of a sequence is not computable, as a consequence of the halting problem: it cannot be decided whether a program halts or runs forever, so running all programs and recording which sequences they produce is impossible, and unfortunately this is essentially the only way to determine the Kolmogorov Complexity of a sequence.

Chapter 8 turns to quite another subject: that of Channel Capacity. First the notion of a Channel is developed and some examples are given. Then the Channel Capacity is introduced and investigated, and the Error Probability is defined. Next the Channel Coding Theorem is stated, exemplified and proved (of course the proof uses the already developed concept of typical sequences: the typical sequences are covered by codewords and the non-typical sequences produce a decoding error). Fano’s inequality is used to prove the Converse to the Coding Theorem. Coding Theory is briefly touched upon (e.g. Hamming codes are explained). In Section 8.12 the influence of feedback is studied.

Chapter 9 takes continuous random variables into consideration and develops the notion of differential entropy. Many properties of discrete random variables have their counterparts for continuous random variables, and this is essentially the content of this Chapter.
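For the binary symmetric channel, one of the standard examples, the capacity works out to C = 1 - H(p), where p is the crossover probability and H the binary entropy function. A minimal sketch (my own illustration, not code from the book):

```python
import math

def h2(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def bsc_capacity(crossover):
    """Capacity of the binary symmetric channel: C = 1 - H(p) bits per use."""
    return 1.0 - h2(crossover)

# A noiseless channel has capacity 1; a fully random one has capacity 0.
print(bsc_capacity(0.0))   # -> 1.0
print(bsc_capacity(0.5))   # -> 0.0
print(bsc_capacity(0.11))  # close to 0.5
```

Note the symmetry: a channel that flips every bit (p = 1) is as good as a perfect one, since the flips can be undone.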

The results of Chapter 9 are used in Chapter 10, where the Gaussian Channel is discussed. The Capacity of Gaussian Channels with a power constraint is derived, and the theory is pushed a little further by developing the counterparts of the results of Chapter 8 for the case of continuous random variables.
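The power-constrained Gaussian channel has capacity C = (1/2) log2(1 + P/N) bits per channel use, with P the power constraint and N the noise variance. A quick illustrative computation (my own sketch):

```python
import math

def gaussian_capacity(P, N):
    """Capacity of the additive white Gaussian noise channel with power
    constraint P and noise variance N: C = (1/2) log2(1 + P/N)."""
    return 0.5 * math.log2(1.0 + P / N)

print(gaussian_capacity(1.0, 1.0))   # SNR = 1  -> 0.5 bits per use
print(gaussian_capacity(15.0, 1.0))  # SNR = 15 -> 2.0 bits per use
```

At high signal-to-noise ratio, each doubling of the SNR buys roughly half a bit per channel use.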

Chapter 11 focuses on Maximum Entropy and Spectral Estimation. The Entropy Rate of a Gaussian process is defined and a Maximum Entropy Theorem due to Burg is discussed for this case.

In Chapter 12 the implications of information theoretic results for Statistics are discussed. Subjects within this Chapter include the Method of Types, the Law of Large Numbers, Large Deviation Theory, Hypothesis Testing (the Neyman-Pearson lemma), the Chernoff Bound and Universal Coding (the purpose of which is to reach the entropy using coding techniques that do not depend on the probability distribution of the source). The Chapter ends with Lempel-Ziv Coding, the notion of Fisher Information and the Cramér-Rao Inequality.
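As an illustration of the large-deviation flavour of this chapter (my own example, not the book's), the Chernoff-Hoeffding bound says that the probability of seeing a fraction of heads a > 1/2 in n fair coin tosses is at most 2^(-n D(a||1/2)), where D is the binary relative entropy:

```python
import math

def D(p, q):
    """Binary relative entropy D(p||q) in bits, for 0 < p < 1 and 0 < q < 1."""
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))

def tail(n, k, q=0.5):
    """Exact P(at least k heads in n tosses of a coin with heads-probability q)."""
    return sum(math.comb(n, j) * q**j * (1 - q)**(n - j) for j in range(k, n + 1))

# P(>= 70% heads in 200 fair tosses) versus the bound 2^{-n D(0.7||0.5)}.
n = 200
exact = tail(n, 140)
bound = 2.0 ** (-n * D(0.7, 0.5))
print(exact, bound)
```

The exact tail probability sits below the exponential bound, and the exponent D(0.7||0.5) is precisely the relative-entropy "distance" between the observed and true coin biases.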

In Chapter 13 our attention is drawn to Rate Distortion Theory. Chapter 14 discusses “Network Information Theory”, sometimes called multi-user information theory. Subjects covered are: Gaussian Multiple User Channels; Discrete Multiple Access Channels, for some of which the capacity region is determined (in both settings the Multiple Access Channel, the Broadcast Channel, the Relay Channel, the Interference Channel and the Two-Way Channel are investigated); the encoding of correlated sources (the Slepian-Wolf theorem); and coding with side information.

Chapter 15 is devoted to implications of information theoretic results for economics (the stock market). In Chapter 16 the basic and useful inequalities of Information Theory are recalled and collected. The book ends with a large bibliography.

Normally a book of this size contains a lot of misprints and errors, but the authors have succeeded in keeping these to a minimum (the only more or less “serious” one being the omission of Reference [235]).

Since cryptography is a rapidly developing field these days, it surprised me that no information theoretic applications to cryptography were discussed. Nevertheless I think that without risk I can recommend this book both to the expert and to the novice in Information Theory.

Reviewer: H.J.Tiersma (Diemen)

### MSC:

| MSC | Classification |
| --- | --- |
| 94-01 | Introductory exposition (textbooks, tutorial papers, etc.) pertaining to information and communication theory |
| 94A15 | Information theory (general) |
| 94A17 | Measures of information, entropy |
| 94A34 | Rate-distortion theory in information and communication theory |
| 62B10 | Statistical aspects of information-theoretic topics |
| 68P30 | Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science) |