zbMATH — the first resource for mathematics

Monaural speech/music source separation using discrete energy separation algorithm. (English) Zbl 1197.94079
Summary: We address the problem of monaural source separation of a mixed signal containing speech and music components. We use the Discrete Energy Separation Algorithm (DESA) to estimate frequency-modulating (FM) signal energy. The FM signal energy is then used to design a time-varying filter in the time-frequency domain that rejects the interfering signal. The FM signal energy was chosen because it discriminates well between speech and music signals using information localized in both time and frequency. We present experimental results on synthetic data and real audio signals which demonstrate the advantages and limitations of the proposed method.
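
The summary does not say which DESA variant the paper uses or how the FM signal energy is computed from its output. As a rough illustration only, the sketch below implements the standard Teager-Kaiser energy operator and the DESA-2 demodulation formulas from the energy-separation literature (cf. references [9], [12]). The function names `teager_kaiser` and `desa2`, the `eps` guard, and the test signal are assumptions made for this example and do not come from the paper.

```python
import numpy as np


def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: Psi[x](n) = x(n)^2 - x(n-1) * x(n+1).

    Returns an array two samples shorter than x (n = 1 .. N-2).
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def desa2(x, eps=1e-12):
    """DESA-2 estimates of instantaneous frequency and amplitude envelope.

    Standard formulas (not necessarily the paper's exact variant):
        y(n)     = x(n+1) - x(n-1)
        Omega(n) ~ 0.5 * arccos(1 - Psi[y](n) / (2 * Psi[x](n)))
        |a(n)|   ~ 2 * Psi[x](n) / sqrt(Psi[y](n))
    Omega is in radians per sample. Outputs cover n = 2 .. N-3.
    """
    x = np.asarray(x, dtype=float)
    psi_x = teager_kaiser(x)[1:-1]      # trim to n = 2 .. N-3
    y = x[2:] - x[:-2]                  # symmetric difference, n = 1 .. N-2
    psi_y = teager_kaiser(y)            # n = 2 .. N-3

    # Crude guard (an assumption of this sketch): floor the energies at eps
    # to avoid division by zero and square roots of negative values.
    psi_x = np.maximum(psi_x, eps)
    psi_y = np.maximum(psi_y, eps)

    arg = np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0)
    omega = 0.5 * np.arccos(arg)
    amp = 2.0 * psi_x / np.sqrt(psi_y)
    return omega, amp


if __name__ == "__main__":
    # Hypothetical test tone: amplitude 2, frequency 0.3 rad/sample.
    n = np.arange(200)
    x = 2.0 * np.cos(0.3 * n + 0.1)
    omega, amp = desa2(x)
    print(omega.mean(), amp.mean())
```

For a pure sinusoid with frequency below a quarter of the sampling rate, these formulas recover the frequency and amplitude exactly up to rounding. On real speech or music, such estimates are typically computed per frequency band after bandpass filtering, and how they are combined into a time-frequency filter is specific to the paper.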

MSC:
94A12 Signal theory (characterization, reconstruction, filtering, etc.)
Software:
BSS Eval
References:
[1] Ozerov, A.; Philippe, P.; Bimbot, F.; Gribonval, R.: Adaptation of Bayesian models for single-channel source separation and its application to voice/music separation in popular songs, IEEE transactions on audio, speech, and language processing 15, No. 5, 1564-1578 (July 2007)
[2] L. Benaroya, F. Bimbot, Wiener based source separation with HMM/GMM using a single sensor, in: ICA 2003 Nara, Japan, April 2003, pp. 957–961.
[3] Roweis, S. T.: One microphone source separation, Advances in neural information processing systems (NIPS) 13, 793-799 (2001)
[4] F.R. Bach, M. I. Jordan, Blind one-microphone speech separation: a spectral learning approach, in: NIPS, Vancouver, 2004.
[5] M. Helén, T. Virtanen, Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine, in: Proceedings of the 13th European Signal Processing Conference (EUSIPCO 2005), Turkey, 2005.
[6] T. Virtanen, Sound source separation using sparse coding with temporal continuity objective, in: International Computer Music Conference, ICMC, 2003.
[7] Teager, H. M.; Teager, S. M.: A phenomenological model for vowel production in the vocal tract, Speech science: recent advances, 73-109 (1985)
[8] Teager, H. M.; Teager, S. M.: Evidence for nonlinear sound production mechanisms in the vocal tract, Speech production and speech modeling 55, 241-261 (1989)
[9] J. Kaiser, On a simple algorithm to calculate the ’energy’ of a signal, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, April 1990, pp. 381–384.
[10] Maragos, P.; Kaiser, J.; Quatieri, T.: Energy separation in signal modulations with application to speech analysis, IEEE transactions on signal processing 41, No. 10, 3024-3051 (October 1993) · Zbl 0800.94135 · doi:10.1109/78.277799
[11] P. Maragos, T.F. Quatieri, J.F. Kaiser, Speech nonlinearities, modulations, and energy operators, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, 1991, pp. 421–424.
[12] Maragos, P.; Kaiser, J.; Quatieri, T.: On amplitude and frequency demodulation using energy operators, IEEE transactions on signal processing 41, No. 4, 1532-1550 (April 1993) · Zbl 0770.94003 · doi:10.1109/78.212729
[13] A. Potamianos, P. Maragos, Speech formant frequency and bandwidth tracking using multiband energy demodulation, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, May 1995, pp. 784–787.
[14] Potamianos, A.; Maragos, P.: Time–frequency distributions for automatic speech recognition, IEEE transactions on speech and audio processing 9, No. 3, 196-200 (March 2001)
[15] T. Thiruvaran, E. Ambikairajah, J. Epps, Speaker identification using FM features, in: Proceedings of 11th Australasian International Conference on Speech Science and Technology, Auckland, New Zealand, 2006, pp. 148–152.
[16] Dimitriadis, D. V.; Maragos, P.; Potamianos, A.: Robust AM–FM features for speech recognition, IEEE signal processing letters 12, No. 9, 621-624 (September 2005)
[17] C.R. Jankowski Jr., T.F. Quatieri, D.A. Reynolds, Measuring fine structure in speech: application to speaker identification, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, May 1995, pp. 325–328.
[18] Potamianos, A.; Maragos, P.: Speech analysis and synthesis using an AM–FM modulation model, Speech communication 28, No. 3, 195-209 (1999)
[19] R. Sussman, M. Kahrs, Analysis and resynthesis of musical instrument sounds using energy separation, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, May 1996, pp. 997–1000.
[20] T. Virtanen, A. Klapuri, Separation of harmonic sound sources using sinusoidal modeling, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 2, 2000, pp. II765–II768.
[21] Yilmaz, O.; Rickard, S.: Blind separation of speech mixtures via time–frequency masking, IEEE transactions on signal processing 52, No. 7, 1830-1847 (July 2004) · Zbl 1369.94383
[22] L. Atlas, C. Janssen, Coherent modulation spectral filtering for single-channel music source separation, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP)-05, vol. 4, March 2005, pp. iv/461–iv/464.
[23] S. Disch, B. Edler, Multiband perceptual modulation analysis, processing and synthesis of audio signals, in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2009, pp. 2305–2308.
[24] Y. Litvin, I. Cohen, D. Chazan, Separation of speech and music sources from a single-channel mixture using discrete energy separation algorithm, in: International Workshop on Acoustic Echo and Noise Control, IWAENC, 2010. · Zbl 1197.94079
[25] Duda, R. O.; Hart, P. E.; Stork, D. G.: Pattern classification, (2001) · Zbl 0968.68140
[26] R. Gribonval, L. Benaroya, E. Vincent, C. Févotte, Proposals for performance measurement in source separation, in: Proceedings of the 4th International Symposium on ICA and BSS (ICA2003), Nara, Japan, April 2003, pp. 763–768.
[27] C. Févotte, R. Gribonval, E. Vincent, BSS_EVAL toolbox user guide revision 2.0, Technical Report 1706, IRISA, Rennes, France, April 2005 [Online]. Available: http://www.irisa.fr/metiss/bsseval/.
[28] Vincent, E.; Gribonval, R.; Plumbley, M. D.: Oracle estimators for the benchmarking of source separation algorithms, Signal processing 87, No. 8, 1933-1950 (2007) · Zbl 1186.94354 · doi:10.1016/j.sigpro.2007.01.016
[29] Benaroya, L.; Bimbot, F.; Gribonval, R.: Audio source separation with a single sensor, IEEE transactions on audio, speech, and language processing 14, No. 1, 191-199 (January 2006)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.