# zbMATH — the first resource for mathematics

Automatic speech recognition using a predictive echo state network classifier. (English) Zbl 1132.68663
Summary: We have combined an Echo State Network (ESN) with a competitive state machine framework to create a classification engine called the predictive ESN classifier. We derive the expressions for training the predictive ESN classifier and show that, in noisy speech classification experiments, the model was significantly more noise robust than a hidden Markov model, by $$8\pm 1$$ dB signal-to-noise ratio. The simple training algorithm and noise robustness of the predictive ESN classifier make it an attractive classification engine for automatic speech recognition.
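The paper's key ingredient is the echo state network: a fixed random recurrent reservoir whose only trained weights are a linear readout. A minimal sketch of this idea, used here as a one-step-ahead frame predictor (the paper's classifier runs one such predictor per class and picks the class with the lowest prediction error); all dimensions, scalings, and function names below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration (not taken from the paper):
# e.g. 13 cepstral features per frame, a 100-unit reservoir.
n_in, n_res = 13, 100

# Fixed random input and reservoir weights; the reservoir is rescaled to
# spectral radius < 1, a common sufficient condition in practice for the
# "echo state" (fading-memory) property.
W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))

def run_reservoir(u_seq):
    """Drive the reservoir with an input sequence; collect one state per frame."""
    x = np.zeros(n_res)
    states = []
    for u in u_seq:
        x = np.tanh(W_in @ u + W @ x)
        states.append(x)
    return np.array(states)

def train_readout(u_seq, y_seq, ridge=1e-6):
    """Fit the linear readout by ridge regression -- the only trained weights."""
    X = run_reservoir(u_seq)
    return np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y_seq)

# Predictive use: the readout is trained to predict the next input frame,
# so the trained network's squared prediction error can score how well a
# class-specific model explains an utterance.
u = rng.standard_normal((200, n_in))
W_out = train_readout(u[:-1], u[1:])
pred = run_reservoir(u[:-1]) @ W_out
```

Because only the readout is trained, and by a closed-form least-squares solve, training avoids the gradient pathologies of full recurrent-network training, which is the "simple training algorithm" advantage the summary refers to.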

##### MSC:
 68T10 Pattern recognition, speech recognition
 68T05 Learning and adaptive systems in artificial intelligence
##### Keywords:
HMM; LSTM
##### References:
 [1] Atal, B.S., Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification, Journal of the acoustical society of America, 55, 6, 1304-1312, (1974)
 [2] Atiya, A.F.; Parlos, A.G., New results on recurrent network training: unifying the algorithms and accelerating convergence, Institute of electrical and electronics engineers transactions on neural networks, 11, 3, 697-709, (2000)
 [3] Bengio, Y.; De Mori, R.; Gori, M., Learning the dynamic nature of speech with back-propagation for sequences, Pattern recognition letters, 13, 5, 375-385, (1992)
 [4] Bishop, C.M., Neural networks for pattern recognition, (1995), Oxford University Press New York, NY
 [5] Bourlard, H.A.; Morgan, N., Connectionist speech recognition: A hybrid approach, (1993), Kluwer Academic Publishers Norwell, MA
 [6] Cover, T.M., Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, Institute of electrical and electronics engineers transactions on electronic computers, EC-14, 3, 326-334, (1965) · Zbl 0152.18206
 [7] Deng, L.; Droppo, J.; Acero, A., Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion, Institute of electrical and electronics engineers transactions on speech and audio processing, 13, 3, 412-421, (2005)
 [8] Doddington, G.R.; Schalk, T.B., Speech recognition: turning theory to practice, Institute of electrical and electronics engineers spectrum, 26-32, (1981)
 [9] Elman, J.L.; Zipser, D., Learning the hidden structure of speech, Journal of the acoustical society of America, 83, 4, 1615-1626, (1988)
 [10] Ephraim, Y.; Van Trees, H.L., A signal subspace approach for speech enhancement, Institute of electrical and electronics engineers transactions on speech and audio processing, 3, 4, 251-266, (1995)
 [11] Franzini, M.; Lee, K.-F.; Waibel, A., (), 425-428
 [12] Furui, S., Cepstral analysis technique for automatic speaker verification, Institute of electrical and electronics engineers transactions on acoustics, speech, and signal processing, 29, 2, 254-272, (1981)
 [13] Gish, H., (), 1361-1364
 [14] Gong, Y., Speech recognition in noisy environments: A survey, Speech communication, 16, 261-291, (1995)
 [15] Graves, A.; Schmidhuber, J., Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural networks, 18, 5-6, 602-610, (2005)
 [16] Haykin, S., Neural networks: A comprehensive foundation, (1999), Prentice Hall Upper Saddle River, NJ · Zbl 0934.68076
 [17] Haykin, S., Adaptive filter theory, (2001), Prentice Hall Upper Saddle River, NJ
 [18] Haykin, S., Signal processing in a nonlinear, non-Gaussian and nonstationary world, (), 43-53
 [19] Hirsch, H. G., & Pearce, D. (2000). The Aurora experimental framework for the performance evaluation of speech recognition systems under noise conditions. In Proceedings of the international speech communications association tutorial and research workshop (pp. 181-188)
 [20] Hopkins, W. G. (2007). A new view of statistics. Internet society for sport science. http://www.sportsci.org/resource/stats/, January 27, 2007
 [21] Iso, K.; Watanabe, T., (), 441-444
 [22] Iso, K.; Watanabe, T., (), 57-60
 [23] Jacobs, R.A.; Jordan, M.I.; Nowlan, S.J.; Hinton, G.E., Adaptive mixtures of local experts, Neural computation, 3, 1, 79-87, (1991)
 [24] Jaeger, H. (2001). The “echo state” approach to analysing and training recurrent neural networks. Tech. Rep. Fraunhofer Institute for Autonomous Intelligent Systems: German National Research Center for Information Technology (GMD Report 148)
 [25] Jaeger, H., Adaptive nonlinear system identification with echo state networks, (), 593-600
 [26] Jaeger, H. (2005). Reservoir riddles: Suggestions for echo state network research. In Proceedings of the international joint conference on neural networks (pp. 1460-1462)
 [27] Jaeger, H.; Haas, H., Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication, Science, 304, 5667, 78-80, (2004)
 [28] Juang, B.-H.; Rabiner, L.R., The segmental K-means algorithm for estimating parameters of hidden Markov models, Institute of electrical and electronics engineers transactions on acoustics, speech, and signal processing, 38, 9, 1639-1641, (1990) · Zbl 0708.62076
 [29] Levin, E., (), 433-436
 [30] Murphy, K. (2007). Hidden Markov Model Toolbox for Matlab. URL: http://www.cs.ubc.ca/~murphyk/Software/HMM/hmm.html, January 28, 2007
 [31] Ozturk, M. C., & Principe, J. C. (2005). Computing with transiently stable states. In Proceedings of the international joint conference on neural networks (pp. 1467-1472)
 [32] Petek, B., (), 3442-3445
 [33] Prokhorov, D. (2005). Echo state networks: Appeal and challenges. In Proceedings of the international joint conference on neural networks (pp. 1463-1466)
 [34] Rabiner, L.R., A tutorial on hidden Markov models and selected applications in speech recognition, (), 267-296
 [35] Rabiner, L.R.; Juang, B.H., Fundamentals of speech recognition, (1993), Prentice-Hall Englewood Cliffs, NJ · Zbl 0762.62036
 [36] Robinson, A.J., An application of recurrent nets to phone probability estimation, Institute of electrical and electronics engineers transactions on neural networks, 5, 2, 298-305, (1994)
 [37] Skowronski, M.D.; Harris, J.G., Exploiting independent filter bandwidth of human factor cepstral coefficients in automatic speech recognition, Journal of the acoustical society of America, 116, 3, 1774-1780, (2004)
 [38] Skowronski, M.D.; Harris, J.G., Minimum mean squared error time series classification using an echo state network prediction model, (), 3153-3156
 [39] Skowronski, M. D., & Harris, J. G. (2007). Noise-robust automatic speech recognition using a discriminative echo state network. In International symposium on circuits and systems. New Orleans, LA, USA: Institute of Electrical and Electronics Engineers (in press) · Zbl 1132.68663
 [40] Strope, B.; Alwan, A., A model of dynamic auditory perception and its application to robust word recognition, Institute of electrical and electronics engineers transactions on speech and audio processing, 5, 5, 451-464, (1997)
 [41] Tebelskis, J. (1995). Speech recognition using neural networks. Unpublished doctoral dissertation. Pittsburgh, PA, USA: Carnegie Mellon University
 [42] Tebelskis, J.; Waibel, A., (), 437-440
 [43] Wolf, A.; Swift, J.B.; Swinney, H.L.; Vastano, J.A., Determining Lyapunov exponents from a time series, Physica D, 16, 285-317, (1985) · Zbl 0585.58037
 [44] Young, S., Jansen, J., Odell, J., Ollason, D., & Woodland, P. (1995). The HTK Book (version 2.0). Cambridge, UK: Entropic Cambridge Research Lab
 [45] Zhu, Q., Stolcke, A., Chen, B. Y., & Morgan, N. (2004). Incorporating Tandem/HATs MLP features into SRI’s conversational speech recognition system. In Effective affordable reusable speech-to-text rich transcription Fall 2004 workshop. Palisades, NY: Defense Advanced Research Projects Agency
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.