zbMATH — the first resource for mathematics

Introduction to digital speech processing. (English) Zbl 1162.94003
Digital speech processing has been a well-established field of communication engineering for several decades. It includes important technologies such as speech coding, speech synthesis, and speech recognition, which are based on a number of fundamental concepts for representing and processing the information in the speech signal. This textbook provides a short but fairly complete reference for these methods. After a brief introduction to the most essential applications and some fundamentals of modelling the speech signal, the authors summarize psychoacoustic material insofar as it is relevant to speech processing. This includes, of course, the non-linear effects of loudness and pitch perception, as well as the concepts of critical bands and masking. The description of the signal processing aspects then starts with a discussion of the non-stationary nature of the speech signal, which necessitates frame-wise processing. The “short-time” features that can be computed frame by frame include signal energy, zero-crossing rate, autocorrelation function, and spectrum. The spectral description of the speech signal is introduced as the short-time Fourier transform, which is then sampled in the time and frequency domains. The concept of the cepstrum (the Fourier transform of the logarithmic spectrum) is likewise essential for the frame-wise processing of speech. The related algorithms are described in more detail, and the important applications of the cepstrum in pitch detection and speech recognition are discussed. Finally, linear prediction analysis is discussed as a powerful means of speech modelling. The last three chapters are devoted to the main applications of the presented algorithms. Speech coding is discussed starting with simple signal quantization and leading to powerful coding schemes that utilize parametric speech models. Speech synthesis is treated by discussing the different components of a text-to-speech (TTS) system. Speech recognition is introduced by summarizing the different steps required for training and applying a recognizer. In closing, all applications are discussed with respect to their future development.
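
As a rough illustration of the frame-wise “short-time” features named in the review (signal energy, zero-crossing rate, autocorrelation function), the following Python sketch may be helpful. It is not taken from the book: the function names, the frame length and hop size, the lag search range, and the synthetic 200 Hz test tone are all assumptions chosen for illustration, and the autocorrelation peak search is only a very simple stand-in for a real pitch detector.

```python
import math


def split_frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def autocorrelation(frame, max_lag):
    """Short-time autocorrelation r[k] for k = 0..max_lag (no normalization)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def estimate_period(frame, min_lag, max_lag):
    """Lag (in samples) of the largest autocorrelation value in [min_lag, max_lag]."""
    r = autocorrelation(frame, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])


# Assumed test signal: a 200 Hz sine sampled at 8 kHz (one period = 40 samples).
FS = 8000
F0 = 200
tone = [math.sin(2 * math.pi * F0 * n / FS) for n in range(800)]

# Assumed framing: 30 ms frames (240 samples) with a 10 ms hop (80 samples).
for frame in split_frames(tone, frame_len=240, hop=80)[:3]:
    lag = estimate_period(frame, min_lag=20, max_lag=60)
    print(round(short_time_energy(frame), 2),
          round(zero_crossing_rate(frame), 4),
          lag, FS / lag)
```

For the assumed tone, the autocorrelation peak falls at a lag of 40 samples, which corresponds to one period of 200 Hz at an 8 kHz sampling rate. Real speech would additionally call for windowing and a voiced/unvoiced decision, which this sketch omits.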

94A12 Signal theory (characterization, reconstruction, filtering, etc.)
94-01 Introductory exposition (textbooks, tutorial papers, etc.) pertaining to information and communication theory
68T10 Pattern recognition, speech recognition