Handbook of data compression. With contributions by David Bryant. 5th ed.

*(English)*Zbl 1194.68115
London: Springer (ISBN 978-1-84882-902-2/hbk). xxii, 1359 p. (2010).

Springer now published the 5th edition of the 1359-pages Handbook of Data Compression, written by David Salomon and Giovanni Motta, with contributions by David Bryant.

Before describing the content, it seems worthwhile to reflect on some remarks of Giovanni Motta, made in the preface of the book:

“The aim of this book is to provide a quick reference for the subject of data compression. Judging by the size of the book, the ‘reference’ is certainly there, but what about ‘quick’? We believe that the following features make the book a quick reference:

\(\bullet\) The detailed index which constitutes 3% of the book.

\(\bullet\) The glossary. Most of the terms, concepts, and techniques discussed throughout the book appear also, albeit briefly, in the glossary.

\(\bullet\) The particular organization of the book. Data is compressed by removing redundancies in its original representation, and these redundancies depend on the type of data. Text, images, video, and audio all have different types of redundancies and are best compressed by different algorithms which in turn are based on different approaches. Thus, the book is organized by different data types, with individual chapters devoted to image, video, and audio compression techniques. Some approaches to compression, however, are general and work well on many different types of data, which is why the book also has chapters on variable-length codes, statistical methods, dictionary-based methods, and wavelet methods.”

The book can be used as a quick reference. It can also be used to learn about the most important issues of approaches to and techniques of data compression, specific algorithms and their implementation focus and applications. Each of the 11 chapters as well as the appendix contain some exercises. Answers to exercises are given between Appendix and Bibliography. The bibliography is very helpful in order to find references to specific subjects. The book is aimed at readers that have general knowledge of computer applications, binary data, and files. As Giovanni Motta writes:

“The book is not for dummies, nor is it a guide to implementers.” The broad coverage makes the value of the book as broad as practically possible, and makes it a guide for ‘backgrounded’ readers who want to understand how different types of data can be compressed. If through this guide one finds out that a particular compression algorithm should be implemented in a given situation, then one should then better rely on the original publications by the creator of the algorithm, as advised by the authors of the reviewed book, to carry out the implementation successfully.

If a reader is interested in the field, the book is warmly recommended. Studying it, you can feel that the authors have a lot of experience and deep understanding of the field; the explanations are given in a clear language. Illustrations and tables of data help to clarify the text. The exercises help to find the kernel of the particular subject.

The preface contains a list of what is new compared to the 4th edition; those earlier editions were published under the title “Data compression. The complete reference” (see Zbl 0956.68036; Zbl 0960.68040; Zbl 1049.68061; Zbl 1129.68035).

The following subjects are covered:

The Introduction contains some information on compression benchmarks, ‘How to Hide Data’, compression curiosities, usefulness of irreversible compression.

Chapter 1, Basic Techniques: intuitive compression; run-length encoding; RLE text compression; RLE image compression; move-to-front coding; scalar quantization; recursive range reduction.

Chapter 2, Basic VL Codes: fixed- and variable-length codes; prefix codes; VLCs, entropy, and redundancy; universal codes; the Kraft-McMillan inequality; Tunstall code; Schlakwijk’s coding; Tjalkens-Willems V-to-B coding; phased-in codes; redundancy feedback (RF) coding; recursive phased-in codes; self-delimiting codes. The old Section 2.9 is completely rewritten.

Chapter 3, Advanced VL Codes: VLCs for integers; start-step-stop codes; start/stop codes; Elias codes; RBUC, recursive bottom-up coding; Levenstein code; Even-Rodeh code; punctured Elias codes; other prefix codes; ternary comma code; location based encoding (LBE); Stout codes; Boldi-Vigna (\(\zeta\)) codes; Yamamoto’s recursive code; VLCs and search trees; Taboo codes; Wang’s flag code; Yamamoto flag code; number bases; Fibonacci code; generalized Fibonacci code; Goldbach codes; additive codes; Golomb code; Rice code; subexponential code; codes ending with ‘1’; interpolative coding.

Chapter 4, Robust VL Codes: codes for error control; free distance; synchronous prefix codes; resynchronizing Huffman codes; bidirectional codes; symmetric codes; VLEC codes.

Chapter 5, Statistical Methods: Shannon-Fano coding; Huffman coding; adaptive Huffman coding; MNP5; MNP7; reliability; facsimile compression; PK font compression; arithmetic coding; adaptive arithmetic coding; QM coder; text compression; Hutter prize; PPM; PAQ; context-tree weighting.

Chapter 6, Dictionary Methods: string compression; simple dictionary compression; LZ77 (sliding window); LZSS; LZPP; repetition times; QIC-122; LZX; LZ78; LZFG; LZRW1; LZRW4; LZW; UNIX compression (LZC); LZMW; LZAP; LZJ; LZY; LZP; repetition finder; GIF images; RAR and WinRAR; V.42bis protocol; various LZ applications; deflate: Zip and Gzip; LZMA and 7-Zip; PNG; XML compression: XMill; EXE compressors; off-line dictionary-based compression; DCA compression with antidictionaries; CRC; data compression patents; a unification.

Chapter 7, Image Compression: pixels; image types; approaches to image compression; intuitive methods; image transforms; orthogonal transforms; Discrete Cosine Transform; test images; JPEG; JPEG-LS; adaptive linear prediction and classification; progressive image compression; JBIG; JBIG2; simple images: EIDAC; block matching; grayscale LZ image compression; vector quantization; adaptive vector quantization; block truncating coding; context-based methods; FELICS; MLP; adaptive Golomb; PPPM; CALIC; differential lossless compression; DPCM; context-tree weighting; block decomposition; binary tree predictive coding; quadtrees; quadrisection; space-filling curves; Hilbert scan and VQ; finite automata methods; iterated function systems; spatial prediction; cell encoding.

Chapter 8, Wavelet Methods: Fourier Transform; frequency domain; the uncertainty principle; Fourier image compression; CWT and its inverse; Haar Transform; filter banks; DWT; multiresolution decomposition; various image decompositions; lifting scheme; IWT; Laplacian pyramid; SPIHT; CREW; EZW; DjVu; WSQ fingerprint compression; JPEG 2000.

Chapter 9, Video Compression: analog video; composite and components video; digital video; history of video compression; video compression; MPEG; MPEG-4; H.261; H.264; H.264/AVC scalable video coding; VC-1.

Chapter 10, Audio Compression: sound; digital audio; the human auditory system; WAVE audio format; \(\mu\)-Law and A-Law companding; ADPCM audio compression; MLP audio; speech compression; shorten; FLAC; WavPack; monkey’s audio; MPEG-4 audio lossless coding (ALS); MPEG-1/2 audio layers; advanced audio coding (AAC); Dolby AC-3.

Chapter 11, Other Methods: Burrows-Wheeler method; symbol ranking; ACB; sort-based context similarity; sparse strings; word-based text compression; textual image compression; dynamic Markov coding; FHM curve compression; Sequitur; triangle mesh compression: Edgebreaker; SCSU Unicode compression; portable document format (PDF); file differencing; hyperspectral data compression; Stuffit.

Appendix: Information Theory.

Before describing the content, it seems worthwhile to reflect on some remarks of Giovanni Motta, made in the preface of the book:

“The aim of this book is to provide a quick reference for the subject of data compression. Judging by the size of the book, the ‘reference’ is certainly there, but what about ‘quick’? We believe that the following features make the book a quick reference:

\(\bullet\) The detailed index which constitutes 3% of the book.

\(\bullet\) The glossary. Most of the terms, concepts, and techniques discussed throughout the book appear also, albeit briefly, in the glossary.

\(\bullet\) The particular organization of the book. Data is compressed by removing redundancies in its original representation, and these redundancies depend on the type of data. Text, images, video, and audio all have different types of redundancies and are best compressed by different algorithms which in turn are based on different approaches. Thus, the book is organized by different data types, with individual chapters devoted to image, video, and audio compression techniques. Some approaches to compression, however, are general and work well on many different types of data, which is why the book also has chapters on variable-length codes, statistical methods, dictionary-based methods, and wavelet methods.”

The book can be used as a quick reference. It can also be used to learn about the most important issues of approaches to and techniques of data compression, specific algorithms and their implementation focus and applications. Each of the 11 chapters as well as the appendix contain some exercises. Answers to exercises are given between Appendix and Bibliography. The bibliography is very helpful in order to find references to specific subjects. The book is aimed at readers that have general knowledge of computer applications, binary data, and files. As Giovanni Motta writes:

“The book is not for dummies, nor is it a guide to implementers.” The broad coverage makes the value of the book as broad as practically possible, and makes it a guide for ‘backgrounded’ readers who want to understand how different types of data can be compressed. If through this guide one finds out that a particular compression algorithm should be implemented in a given situation, then one should then better rely on the original publications by the creator of the algorithm, as advised by the authors of the reviewed book, to carry out the implementation successfully.

If a reader is interested in the field, the book is warmly recommended. Studying it, you can feel that the authors have a lot of experience and deep understanding of the field; the explanations are given in a clear language. Illustrations and tables of data help to clarify the text. The exercises help to find the kernel of the particular subject.

The preface contains a list of what is new compared to the 4th edition; those earlier editions were published under the title “Data compression. The complete reference” (see Zbl 0956.68036; Zbl 0960.68040; Zbl 1049.68061; Zbl 1129.68035).

The following subjects are covered:

The Introduction contains some information on compression benchmarks, ‘How to Hide Data’, compression curiosities, usefulness of irreversible compression.

Chapter 1, Basic Techniques: intuitive compression; run-length encoding; RLE text compression; RLE image compression; move-to-front coding; scalar quantization; recursive range reduction.

Chapter 2, Basic VL Codes: fixed- and variable-length codes; prefix codes; VLCs, entropy, and redundancy; universal codes; the Kraft-McMillan inequality; Tunstall code; Schlakwijk’s coding; Tjalkens-Willems V-to-B coding; phased-in codes; redundancy feedback (RF) coding; recursive phased-in codes; self-delimiting codes. The old Section 2.9 is completely rewritten.

Chapter 3, Advanced VL Codes: VLCs for integers; start-step-stop codes; start/stop codes; Elias codes; RBUC, recursive bottom-up coding; Levenstein code; Even-Rodeh code; punctured Elias codes; other prefix codes; ternary comma code; location based encoding (LBE); Stout codes; Boldi-Vigna (\(\zeta\)) codes; Yamamoto’s recursive code; VLCs and search trees; Taboo codes; Wang’s flag code; Yamamoto flag code; number bases; Fibonacci code; generalized Fibonacci code; Goldbach codes; additive codes; Golomb code; Rice code; subexponential code; codes ending with ‘1’; interpolative coding.

Chapter 4, Robust VL Codes: codes for error control; free distance; synchronous prefix codes; resynchronizing Huffman codes; bidirectional codes; symmetric codes; VLEC codes.

Chapter 5, Statistical Methods: Shannon-Fano coding; Huffman coding; adaptive Huffman coding; MNP5; MNP7; reliability; facsimile compression; PK font compression; arithmetic coding; adaptive arithmetic coding; QM coder; text compression; Hutter prize; PPM; PAQ; context-tree weighting.

Chapter 6, Dictionary Methods: string compression; simple dictionary compression; LZ77 (sliding window); LZSS; LZPP; repetition times; QIC-122; LZX; LZ78; LZFG; LZRW1; LZRW4; LZW; UNIX compression (LZC); LZMW; LZAP; LZJ; LZY; LZP; repetition finder; GIF images; RAR and WinRAR; V.42bis protocol; various LZ applications; deflate: Zip and Gzip; LZMA and 7-Zip; PNG; XML compression: XMill; EXE compressors; off-line dictionary-based compression; DCA compression with antidictionaries; CRC; data compression patents; a unification.

Chapter 7, Image Compression: pixels; image types; approaches to image compression; intuitive methods; image transforms; orthogonal transforms; Discrete Cosine Transform; test images; JPEG; JPEG-LS; adaptive linear prediction and classification; progressive image compression; JBIG; JBIG2; simple images: EIDAC; block matching; grayscale LZ image compression; vector quantization; adaptive vector quantization; block truncating coding; context-based methods; FELICS; MLP; adaptive Golomb; PPPM; CALIC; differential lossless compression; DPCM; context-tree weighting; block decomposition; binary tree predictive coding; quadtrees; quadrisection; space-filling curves; Hilbert scan and VQ; finite automata methods; iterated function systems; spatial prediction; cell encoding.

Chapter 8, Wavelet Methods: Fourier Transform; frequency domain; the uncertainty principle; Fourier image compression; CWT and its inverse; Haar Transform; filter banks; DWT; multiresolution decomposition; various image decompositions; lifting scheme; IWT; Laplacian pyramid; SPIHT; CREW; EZW; DjVu; WSQ fingerprint compression; JPEG 2000.

Chapter 9, Video Compression: analog video; composite and components video; digital video; history of video compression; video compression; MPEG; MPEG-4; H.261; H.264; H.264/AVC scalable video coding; VC-1.

Chapter 10, Audio Compression: sound; digital audio; the human auditory system; WAVE audio format; \(\mu\)-Law and A-Law companding; ADPCM audio compression; MLP audio; speech compression; shorten; FLAC; WavPack; monkey’s audio; MPEG-4 audio lossless coding (ALS); MPEG-1/2 audio layers; advanced audio coding (AAC); Dolby AC-3.

Chapter 11, Other Methods: Burrows-Wheeler method; symbol ranking; ACB; sort-based context similarity; sparse strings; word-based text compression; textual image compression; dynamic Markov coding; FHM curve compression; Sequitur; triangle mesh compression: Edgebreaker; SCSU Unicode compression; portable document format (PDF); file differencing; hyperspectral data compression; Stuffit.

Appendix: Information Theory.

Reviewer: Waltraud Gerhardt (Delft)

##### MSC:

68P30 | Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science) |

68-00 | General reference works (handbooks, dictionaries, bibliographies, etc.) pertaining to computer science |

68P05 | Data structures |

68U10 | Computing methodologies for image processing |