Marginal energy density over the low frequency range as a feature for voiced/non-voiced detection in noisy speech signals. (English) Zbl 1281.93093

Summary: In this paper, we present a novel method, based on the Pseudo Wigner-Ville Distribution (PWVD), for Voiced/Non-Voiced (V/NV) detection in noisy speech signals. The energy distribution of the speech signal on the time-frequency plane is obtained by computing the PWVD coefficients of the analytic speech signal over the Low Frequency Range (LFR). The Marginal Energy Density with respect to Time (MEDT) over the LFR, derived from this energy distribution, is used as a feature to provide instantaneous V/NV detection. To assess the performance of the proposed method, experiments are carried out on speech signals from the CMU-Arctic database under white, babble and vehicular noise taken from the NOISEX-92 database at various Signal to Noise Ratios (SNRs). The proposed method yields a significant improvement in V/NV detection accuracy over existing V/NV detection methods, both under white noise and under babble and vehicular noise.
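
The processing chain described in the summary (analytic signal, PWVD restricted to the LFR, marginal over frequency, decision) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 50-500 Hz band, the 65-tap Hamming lag window, the 50 Hz frequency grid and the relative threshold are all assumptions made for illustration, since the summary gives none of these values. Summing the PWVD over the LFR grid is folded into a single lag-domain weight, which is algebraically the same as evaluating each frequency and adding the results.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal via a naive O(N^2) DFT (adequate for short demo signals)."""
    n = len(x)
    tw = [cmath.exp(-2j * math.pi * k / n) for k in range(n)]  # twiddle factors
    spec = [sum(x[t] * tw[(k * t) % n] for t in range(n)) for k in range(n)]
    for k in range(1, n):
        if 2 * k < n:
            spec[k] *= 2      # double the positive frequencies
        elif 2 * k > n:
            spec[k] = 0j      # zero the negative frequencies
    # DC (and Nyquist, for even n) are left unchanged; inverse DFT follows.
    return [sum(spec[k] * tw[(k * t) % n].conjugate() for k in range(n)) / n
            for t in range(n)]


def medt_lfr(x, fs, f_lo=50.0, f_hi=500.0, f_step=50.0, half_len=32):
    """Marginal energy density over time, restricted to [f_lo, f_hi] Hz.

    All default parameter values are illustrative assumptions.
    """
    z = analytic_signal(x)
    n = len(z)
    freqs = [f_lo + i * f_step for i in range(int((f_hi - f_lo) / f_step) + 1)]
    # Lag weight: Hamming window times the sum of the PWVD kernels exp(-j*4*pi*f*m/fs)
    # over the LFR grid. Index m + half_len maps lag m to a list position.
    g = []
    for m in range(-half_len, half_len + 1):
        h = 0.54 + 0.46 * math.cos(math.pi * m / half_len)
        g.append(h * sum(cmath.exp(-4j * math.pi * f * m / fs) for f in freqs))
    medt = []
    for t in range(n):
        acc = 0j
        for m in range(-half_len, half_len + 1):
            if 0 <= t + m < n and 0 <= t - m < n:  # samples outside the record count as zero
                acc += g[m + half_len] * z[t + m] * z[t - m].conjugate()
        medt.append(acc.real)
    return medt


def detect_voiced(medt, rel_threshold=0.2):
    """Flag a sample as voiced when MEDT exceeds a fraction of its maximum (assumed rule)."""
    thr = rel_threshold * max(medt)
    return [v > thr for v in medt]


# Synthetic stand-in for speech: a 150 Hz tone (voiced-like) followed by a
# 3000 Hz tone (non-voiced-like), sampled at 8 kHz.
fs = 8000
x = ([math.cos(2 * math.pi * 150 * t / fs) for t in range(200)]
     + [math.cos(2 * math.pi * 3000 * t / fs) for t in range(200, 400)])
medt = medt_lfr(x, fs)
flags = detect_voiced(medt)
```

In this synthetic case the low-frequency tone concentrates its energy inside the assumed LFR, so the flags away from the segment boundary are set in the first half and cleared in the second. Real speech would need a noise-aware decision rule, which the summary does not specify.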

MSC:

93E03 Stochastic systems in control theory (general)
68T10 Pattern recognition, speech recognition

Software:

WaveSurfer

References:

[3] Kondoz, A. M., Digital Speech: Coding for Low Bit Rate Communication Systems (2004), Wiley: Wiley England
[4] Sircar, P.; Saini, R. K., Parametric modeling of speech by complex AM and FM signals, Digital Signal Processing, 17, 6, 1055-1064 (2007)
[5] Naylor, P. A.; Kounoudes, A.; Gudnason, J.; Brookes, M., Estimation of glottal closure instants in voiced speech using the DYPSA algorithm, IEEE Transactions on Audio, Speech and Language Processing, 15, 1, 34-43 (2007)
[6] Resch, B.; Nilsson, M.; Ekman, A.; Kleijn, W. B., Estimation of the instantaneous pitch of speech, IEEE Transactions on Audio, Speech and Language Processing, 15, 3, 813-822 (2007)
[11] Manfredi, C.; D’Aniello, M.; Bruscaglioni, P.; Ismaelli, A., A comparative analysis of fundamental frequency estimation methods with application to pathological voices, Medical Engineering and Physics, 22, 2, 135-147 (2000)
[15] Atal, B.; Rabiner, L., A pattern recognition approach to voiced-unvoiced-silence classification with applications to speech recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, 24, 3, 201-212 (1976)
[18] Qi, Y.; Hunt, B. R., Voiced-unvoiced-silence classifications of speech using hybrid features and a network classifier, IEEE Transactions on Speech and Audio Processing, 1, 2, 250-255 (1993)
[19] Grenier, Y., Time-dependent ARMA modeling of nonstationary signals, IEEE Transactions on Acoustics, Speech and Signal Processing, 31, 4, 899-911 (1983)
[22] Fisher, E.; Tabrikian, J.; Dubnov, S., Generalized likelihood ratio test for voiced-unvoiced decision in noisy speech using the harmonic model, IEEE Transactions on Audio, Speech and Language Processing, 14, 2, 502-510 (2006)
[23] Murty, K. S.R.; Yegnanarayana, B.; Joseph, M. A., Characterization of glottal activity from speech signals, IEEE Signal Processing Letters, 16, 6, 469-472 (2009)
[24] Dhananjaya, N.; Yegnanarayana, B., Voiced/nonvoiced detection based on robustness of voiced epochs, IEEE Signal Processing Letters, 17, 3, 273-276 (2010)
[25] Claasen, T. A.C. M.; Mecklenbräuker, W. F.G., The Wigner distribution, a tool for time-frequency signal analysis. Part 2: discrete-time signals, Philips Journal of Research, 35, 276-300 (1980) · Zbl 0474.94008
[26] Kadambe, S.; Boudreaux-Bartels, G. F., A comparison of the existence of cross terms in the Wigner distribution and the squared magnitude of the wavelet transform and the short-time Fourier transform, IEEE Transactions on Signal Processing, 40, 10, 2498-2517 (1992) · Zbl 0825.94054
[27] Pachori, R. B.; Sircar, P., A new technique to reduce cross terms in the Wigner distribution, Digital Signal Processing, 17, 2, 466-474 (2007)
[28] Ferguson, B. G.; Quinn, B. G., Application of the short-time Fourier transform and the Wigner-Ville distribution to the acoustic localization of aircraft, Journal of the Acoustical Society of America, 96, 2, 821-827 (1994)
[29] Hlawatsch, F.; Auger, F., Time-Frequency Analysis: Concepts and Methods (2008), Wiley: Wiley NJ
[30] Oppenheim, A. V.; Schafer, R. W., Discrete-Time Signal Processing (1989), Prentice Hall: Prentice Hall Englewood Cliffs, NJ · Zbl 0676.42001
[31] Potamianos, A.; Maragos, P., A comparison of the energy operator and the Hilbert transform approach to signal and speech demodulation, Signal Processing, 37, 1, 95-120 (1994) · Zbl 0806.94007
[32] Marple, L., Computing the discrete-time “analytic” signal via FFT, IEEE Transactions on Signal Processing, 47, 9, 2600-2603 (1999) · Zbl 0990.94502
[33] Loutridis, S. J., Instantaneous energy density as a feature for gear fault detection, Mechanical Systems and Signal Processing, 20, 5, 1239-1253 (2006)
[34] Jain, P.; Pachori, R. B., Time-order representation based method for epoch detection from speech signals, Journal of Intelligent Systems, 21, 1, 79-95 (2012)
[35] Deller, J. R.; Hansen, J. H.L.; Proakis, J. G., Discrete-Time Processing of Speech Signals (2011), Wiley-India: Wiley-India New Delhi
[38] Rabiner, L. R.; Schafer, R. W., Digital Processing of Speech Signals (2009), Pearson Education: Pearson Education India
[41] Ramirez, J.; Segura, J. C.; Benitez, C.; Torre, A.; Rubio, A. J., A new Kullback-Leibler VAD for speech recognition in noise, IEEE Signal Processing Letters, 11, 2, 266-269 (2004)