Communication theory of secrecy systems.

*(English)*Zbl 1200.94005In 1948, C. E. Shannon published his seminal paper ‘A mathematical theory of communication’ [Bell Syst. Tech. J. 27, 379–423, 623–656 (1948; Zbl 1154.94303)]
on information theory. His work, which was classified for some time, had been motivated in part by his research on secrecy coding during the second world war. In 1949, the companion paper ‘Communication theory of secrecy systems’ followed. Like Diffie and Hellman’s discovery of public-key cryptography, it is a key paper in the development of modern cryptography.
It is generally credited with transforming cryptography from an art into a science.
Shannon established the first general mathematical model of cryptography and developed the analysis of encryption methods by using the information-theoretic methods from ‘A Mathematical Theory of Communication’.
The basic question which he studied is: How much information about the plaintext and the secret key is contained in the ciphertext,
regardless of the amount of work necessary to obtain this information?
Complex block ciphers and public-key encryption
were not known at that time. So, his analysis was focused on the classical ciphers, such as substitution, transposition or Vigenère ciphers.

But his methods are still fundamental in the analysis of modern cryptosystems.

Today, we distinguish between two types of security in cryptography, computational security and information-theoretic (or unconditional) security. The security of many currently used cryptosystems, in particular that of all public-key cryptosystems, is based on the assumed infeasibility of solving a computational problem, such as factoring integers, with (necessarily limited) real-world resources. Since there is only evidence, but no mathematical proof for the hardness of the underlying computational problem, the security of such a system is conditional. It relies on an unproven intractability assumption.

Shannon’s information-theoretic model of security provides unconditional security, not dependent on the assumed hardness of a computational problem and the limited computational power of an adversary. The adversary may have unbounded computing power. Information-theoretic security is stronger than computational security, but it is usually less practical. However, there is a lot of successful and promising work in order to construct practical information-theoretically secure cryptosystems (see below).

A symmetric encryption scheme (or secrecy system) consists of an encryption map \({E}:{K \times M}\to{C}\) mapping a plaintext \(m \in M\) to a ciphertext (or cryptogram) \(c = E(k,m)\) by using a secret key \(k \in K\). You may consider it, as Shannon does, as a family \(\left(E_k \right)_{k \in K}\) of encryption transformations \(E_k : m \mapsto E(k,m)\). The sets \(M\), \(K\) and \(C\) are information sources: Each message \(m \in M\) is produced with an a priori probability \(\mathrm{prob}(m)\), and the keys \(k\) are randomly selected, according to some probability distribution. As a consequence, we also have a probability distribution on \(C\). The key concepts of information theory that are applied in the analysis of secrecy systems are entropy, redundancy, mutual information, equivocation and the like. The average amount of information associated with a message, measured in bits, is given by the entropy or uncertainty \(H(M)\),

\[ H(M) = - \sum_{m \in M} \mathrm{prob}(m) \cdot\log_2(\mathrm{prob}(m)). \]

The entropy is maximal if the probabilities are uniformly distributed. Then \(H(M) = \log_2(n)\), with \(n\) the number of elements in \(M\). The redundancy of an information source \(M\) is \(\log_2(n) - H(M)\). The redundancy (per character) of a natural language is \( \lim_{n \to\infty} {D_n}/{n}\), where \(D_n\) is the redundancy of sentences of length \(n\). The redundancy of normal English is about \(0.7\). Important in cryptanalysis is the conditional entropy \(H(M|C)\), also called equivocation. It is computed from the a posteriori probabilities \(\mathrm{prob}(m|c)\) (given the ciphertext \(c\), what is the probability of \(m\)?),

\[ H(M|C):=- \sum_{c \in C}\mathrm{prob}(c) \times \sum_{m \in M}\mathrm{prob}(m|c) \cdot \log_2(\mathrm{prob}(m|c)).\tag{*} \]

The equivocation is the average missing amount of information about \(m\), given a cryptogram \(c\). \(I(M;C) := H(M) - H(M|C)\) is called mutual information and measures the average amount of information about \(m\) obtained by learning \(c\).

Shannon’s 1949 paper is structured into three parts. In Part I, ‘Mathematical Structure of Secrecy Systems’, he describes the structure of a (symmetric) encryption scheme (see above). He restates Kerkhoff’s principle from the 19th century: “The enemy knows the system being used.” Widely used classical ciphers are presented, and basic features of secrecy systems are introduced.

Part II, ‘Theoretical Secrecy’, is the core of the paper. Shannon studies the information-theoretic security of encryption schemes against ciphertext-only attacks: “How immune is a system to cryptanalysis when the cryptanalyst has unlimited time and manpower available for the analysis of cryptograms?”

Perfect secrecy of an encryption scheme is defined, which means that intercepting the ciphertext \(c\) does not increase the adversary’s information about the plaintext, i.e., the equivocation \(H(M|C)\) is equal to \(H(M)\). Perfect secrecy means unconditional security against eavesdroppers. It does not protect against active attackers who try to modify messages, without being detected. Shannon shows that the one-time pad, which was proposed by G.S. Vernam in 1917, provides perfect secrecy. Until 1949, the Vernam cipher was considered unbreakable, but it was not mathematically proven. Additionally, he proves the famous result that the Vernam cipher is optimal: Perfect secrecy requires a random secret key of at least the same length as the plaintext, or, more precisely, the entropy of the keys must not be less than the entropy of the messages, \(H(K) \geq H(M)\).

An adversary who has intercepted the ciphertext \(c\) should not be able to figure out the plaintext or key. The expected amount of information about the plaintext or key that he is missing is measured by the equivocations \(H(M|C)\) and \(H(K|C)\). Shannon studies these equivocations as functions of the length \(n\) of the messages. He shows, for example, that equivocation is non-increasing with \(n\), except for very small \(n\). For message spaces with redundancy (e.g. normal English) the equivocations approach 0 rapidly (depending on the cipher). The minimal \(n\) for which the equivocation \(H(K|C)\) and hence \(H(M|C)\) become 0 is called the unicity distance. Using the model of a random cipher, Shannon calculates the unicity distance as \({H(K)}/{D}\), with \(D\) the plaintext redundancy in bits per character. Encrypting normal English text by applying a random permutation of the 26 letters of the alphabet, the unicity distance is about 27. Hence, a ciphertext \(c\) of length \({> 27}\) is expected to have only one “solution”, i.e., only one plaintext \(m\) fits this ciphertext \(c\). The unicity distance is practically useful for the cryptanalysis of classical ciphers. For modern ciphers it is a more theoretical measure. Even if you know that there is only one solution for a given ciphertext, it is usually practically infeasible to compute this plaintext with real-world resources.

Shannon introduces the concept of a (strongly) ideal cipher, which means encryption schemes with infinite unicity distance or even constant equivocation. He explains how ideal ciphers can be approximated by removing the redundancy of the message space.

In Part III, ‘Practical Secrecy’, various aspects of practical cryptanalysis are studied. He discusses how the amount of work for decrypting ciphertexts might be kept large, even if the length of the ciphertexts exceeds the unicity distance. The principles of diffusion and confusion are introduced. Both principles are still applied in the design of modern block ciphers. Known-plaintext attacks (called “probable word analysis”) are explained as one of the most powerful tools for breaking a classical cipher.

With his 1949 paper, Shannon laid the foundation for many important concepts and methods in modern cryptography, in particular in the field of information-theoretic security. Here, information theory has applications such as unconditional security proofs or bounds and impossibility proofs for the achievability of unconditional security.

Important examples are public discussion key agreement protocols. Two parties, say Alice and Bob, agree on a provably perfect or almost perfect secret key \(K\) by communicating over a public channel that is intercepted by an eavesdropper, say Eve. Almost perfect means that, for some small \(\varepsilon \geq 0\),

\[ H(K) \geq \mathrm{bitlength}(K) - \varepsilon, \] i.e., \(K\) is almost uniformly distributed, and \[ I(K;\hbox{Eve's knowledge}) \leq \varepsilon, \] i.e., Eve has almost no information about the key. Using this key in a one-time pad yields unconditionally secure encryption.

In quantum key distribution, Alice and Bob use quantum communication. They typically transmit polarized photons over a fiber-optic channel and get a preliminary shared key. Due to quantum indeterminacy, measuring an unknown quantum state changes that state in a probabilistic way. This can be exploited in order to detect any eavesdropping (which necessarily involves measurement) and, more importantly, to calculate the amount of information that has been intercepted (measured in bits as a Shannon entropy). By applying privacy amplification, the remaining, non-intercepted bits of information can be extracted from the preliminary key in order to get a secure key.

Information-theoretically secure key agreement protocols were developed assuming that all parties have access to a public source of randomness. Alice and Bob can exploit Eve’s partial uncertainty about the exchanged messages, which results from her limited storage capacity (“Bounded Storage Model”) or from the unavoidable noise in the communication channel (“Noisy Channel Model”). As in quantum cryptography, by applying privacy amplification, Eve’s uncertainty can be converted into a provably secure key. There is an upper bound on the extractable secret key rate that can be obtained by public discussion protocols [Ueli M. Maurer, St. Wolf, SIAM J. Comput. 28, No. 5, 1689–1721 (1999; Zbl 1053.94014)].

Besides information-theoretic security, Shannon also strongly influenced important concepts of computational security. We give some examples.

The random cipher model – considering a cipher as a family of random functions – is used in the analysis of block ciphers. The idea of ideal ciphers is applied in order to improve the security of encryption algorithms. In schemes like OAEP the plaintext is randomly padded before being encrypted. This removes the redundancy.

The computational analogue of Shannon’s perfect secrecy is ciphertext-indistinguishability or the equivalent notion of semantic security. The difference is that the computing power of the adversary is restricted to polynomial resources. Perfectly secret stream ciphers can be approximated by using high quality pseudorandom key streams. Computationally perfect pseudorandom generators can be derived from one-way permutations with hard-core predicates (e.g. the Blum-Blum-Shub generator). The resulting stream ciphers are ciphertext-indistinguishable. This result is analogous to Shannon’s result on the Vernam cipher. There are public-key encryption schemes that are provably ciphertext-indistinguishable against adaptively-chosen-ciphertext attacks, e.g. Cramer-Shoup encryption.

But his methods are still fundamental in the analysis of modern cryptosystems.

Today, we distinguish between two types of security in cryptography, computational security and information-theoretic (or unconditional) security. The security of many currently used cryptosystems, in particular that of all public-key cryptosystems, is based on the assumed infeasibility of solving a computational problem, such as factoring integers, with (necessarily limited) real-world resources. Since there is only evidence, but no mathematical proof for the hardness of the underlying computational problem, the security of such a system is conditional. It relies on an unproven intractability assumption.

Shannon’s information-theoretic model of security provides unconditional security, not dependent on the assumed hardness of a computational problem and the limited computational power of an adversary. The adversary may have unbounded computing power. Information-theoretic security is stronger than computational security, but it is usually less practical. However, there is a lot of successful and promising work in order to construct practical information-theoretically secure cryptosystems (see below).

A symmetric encryption scheme (or secrecy system) consists of an encryption map \({E}:{K \times M}\to{C}\) mapping a plaintext \(m \in M\) to a ciphertext (or cryptogram) \(c = E(k,m)\) by using a secret key \(k \in K\). You may consider it, as Shannon does, as a family \(\left(E_k \right)_{k \in K}\) of encryption transformations \(E_k : m \mapsto E(k,m)\). The sets \(M\), \(K\) and \(C\) are information sources: Each message \(m \in M\) is produced with an a priori probability \(\mathrm{prob}(m)\), and the keys \(k\) are randomly selected, according to some probability distribution. As a consequence, we also have a probability distribution on \(C\). The key concepts of information theory that are applied in the analysis of secrecy systems are entropy, redundancy, mutual information, equivocation and the like. The average amount of information associated with a message, measured in bits, is given by the entropy or uncertainty \(H(M)\),

\[ H(M) = - \sum_{m \in M} \mathrm{prob}(m) \cdot\log_2(\mathrm{prob}(m)). \]

The entropy is maximal if the probabilities are uniformly distributed. Then \(H(M) = \log_2(n)\), with \(n\) the number of elements in \(M\). The redundancy of an information source \(M\) is \(\log_2(n) - H(M)\). The redundancy (per character) of a natural language is \( \lim_{n \to\infty} {D_n}/{n}\), where \(D_n\) is the redundancy of sentences of length \(n\). The redundancy of normal English is about \(0.7\). Important in cryptanalysis is the conditional entropy \(H(M|C)\), also called equivocation. It is computed from the a posteriori probabilities \(\mathrm{prob}(m|c)\) (given the ciphertext \(c\), what is the probability of \(m\)?),

\[ H(M|C):=- \sum_{c \in C}\mathrm{prob}(c) \times \sum_{m \in M}\mathrm{prob}(m|c) \cdot \log_2(\mathrm{prob}(m|c)).\tag{*} \]

The equivocation is the average missing amount of information about \(m\), given a cryptogram \(c\). \(I(M;C) := H(M) - H(M|C)\) is called mutual information and measures the average amount of information about \(m\) obtained by learning \(c\).

Shannon’s 1949 paper is structured into three parts. In Part I, ‘Mathematical Structure of Secrecy Systems’, he describes the structure of a (symmetric) encryption scheme (see above). He restates Kerkhoff’s principle from the 19th century: “The enemy knows the system being used.” Widely used classical ciphers are presented, and basic features of secrecy systems are introduced.

Part II, ‘Theoretical Secrecy’, is the core of the paper. Shannon studies the information-theoretic security of encryption schemes against ciphertext-only attacks: “How immune is a system to cryptanalysis when the cryptanalyst has unlimited time and manpower available for the analysis of cryptograms?”

Perfect secrecy of an encryption scheme is defined, which means that intercepting the ciphertext \(c\) does not increase the adversary’s information about the plaintext, i.e., the equivocation \(H(M|C)\) is equal to \(H(M)\). Perfect secrecy means unconditional security against eavesdroppers. It does not protect against active attackers who try to modify messages, without being detected. Shannon shows that the one-time pad, which was proposed by G.S. Vernam in 1917, provides perfect secrecy. Until 1949, the Vernam cipher was considered unbreakable, but it was not mathematically proven. Additionally, he proves the famous result that the Vernam cipher is optimal: Perfect secrecy requires a random secret key of at least the same length as the plaintext, or, more precisely, the entropy of the keys must not be less than the entropy of the messages, \(H(K) \geq H(M)\).

An adversary who has intercepted the ciphertext \(c\) should not be able to figure out the plaintext or key. The expected amount of information about the plaintext or key that he is missing is measured by the equivocations \(H(M|C)\) and \(H(K|C)\). Shannon studies these equivocations as functions of the length \(n\) of the messages. He shows, for example, that equivocation is non-increasing with \(n\), except for very small \(n\). For message spaces with redundancy (e.g. normal English) the equivocations approach 0 rapidly (depending on the cipher). The minimal \(n\) for which the equivocation \(H(K|C)\) and hence \(H(M|C)\) become 0 is called the unicity distance. Using the model of a random cipher, Shannon calculates the unicity distance as \({H(K)}/{D}\), with \(D\) the plaintext redundancy in bits per character. Encrypting normal English text by applying a random permutation of the 26 letters of the alphabet, the unicity distance is about 27. Hence, a ciphertext \(c\) of length \({> 27}\) is expected to have only one “solution”, i.e., only one plaintext \(m\) fits this ciphertext \(c\). The unicity distance is practically useful for the cryptanalysis of classical ciphers. For modern ciphers it is a more theoretical measure. Even if you know that there is only one solution for a given ciphertext, it is usually practically infeasible to compute this plaintext with real-world resources.

Shannon introduces the concept of a (strongly) ideal cipher, which means encryption schemes with infinite unicity distance or even constant equivocation. He explains how ideal ciphers can be approximated by removing the redundancy of the message space.

In Part III, ‘Practical Secrecy’, various aspects of practical cryptanalysis are studied. He discusses how the amount of work for decrypting ciphertexts might be kept large, even if the length of the ciphertexts exceeds the unicity distance. The principles of diffusion and confusion are introduced. Both principles are still applied in the design of modern block ciphers. Known-plaintext attacks (called “probable word analysis”) are explained as one of the most powerful tools for breaking a classical cipher.

With his 1949 paper, Shannon laid the foundation for many important concepts and methods in modern cryptography, in particular in the field of information-theoretic security. Here, information theory has applications such as unconditional security proofs or bounds and impossibility proofs for the achievability of unconditional security.

Important examples are public discussion key agreement protocols. Two parties, say Alice and Bob, agree on a provably perfect or almost perfect secret key \(K\) by communicating over a public channel that is intercepted by an eavesdropper, say Eve. Almost perfect means that, for some small \(\varepsilon \geq 0\),

\[ H(K) \geq \mathrm{bitlength}(K) - \varepsilon, \] i.e., \(K\) is almost uniformly distributed, and \[ I(K;\hbox{Eve's knowledge}) \leq \varepsilon, \] i.e., Eve has almost no information about the key. Using this key in a one-time pad yields unconditionally secure encryption.

In quantum key distribution, Alice and Bob use quantum communication. They typically transmit polarized photons over a fiber-optic channel and get a preliminary shared key. Due to quantum indeterminacy, measuring an unknown quantum state changes that state in a probabilistic way. This can be exploited in order to detect any eavesdropping (which necessarily involves measurement) and, more importantly, to calculate the amount of information that has been intercepted (measured in bits as a Shannon entropy). By applying privacy amplification, the remaining, non-intercepted bits of information can be extracted from the preliminary key in order to get a secure key.

Information-theoretically secure key agreement protocols were developed assuming that all parties have access to a public source of randomness. Alice and Bob can exploit Eve’s partial uncertainty about the exchanged messages, which results from her limited storage capacity (“Bounded Storage Model”) or from the unavoidable noise in the communication channel (“Noisy Channel Model”). As in quantum cryptography, by applying privacy amplification, Eve’s uncertainty can be converted into a provably secure key. There is an upper bound on the extractable secret key rate that can be obtained by public discussion protocols [Ueli M. Maurer, St. Wolf, SIAM J. Comput. 28, No. 5, 1689–1721 (1999; Zbl 1053.94014)].

Besides information-theoretic security, Shannon also strongly influenced important concepts of computational security. We give some examples.

The random cipher model – considering a cipher as a family of random functions – is used in the analysis of block ciphers. The idea of ideal ciphers is applied in order to improve the security of encryption algorithms. In schemes like OAEP the plaintext is randomly padded before being encrypted. This removes the redundancy.

The computational analogue of Shannon’s perfect secrecy is ciphertext-indistinguishability or the equivalent notion of semantic security. The difference is that the computing power of the adversary is restricted to polynomial resources. Perfectly secret stream ciphers can be approximated by using high quality pseudorandom key streams. Computationally perfect pseudorandom generators can be derived from one-way permutations with hard-core predicates (e.g. the Blum-Blum-Shub generator). The resulting stream ciphers are ciphertext-indistinguishable. This result is analogous to Shannon’s result on the Vernam cipher. There are public-key encryption schemes that are provably ciphertext-indistinguishable against adaptively-chosen-ciphertext attacks, e.g. Cramer-Shoup encryption.

Reviewer: Hans Delfs (Nürnberg)