
Underdetermined separation of speech mixture based on sparse Bayesian learning. (English) Zbl 1400.62118

Summary: This paper describes a novel algorithm for the underdetermined speech separation problem based on compressed sensing, an emerging technique for efficient data reconstruction. The proposed algorithm consists of two steps. First, the unknown mixing matrix is estimated from the speech mixtures in the transform domain using the \(K\)-means clustering algorithm. Second, the speech sources are recovered using an autocalibration sparse Bayesian learning algorithm for speech signals. Numerical experiments, including comparisons with other sparse representation approaches, demonstrate the achieved performance improvement.
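
The first step of the summary (estimating the mixing matrix by clustering) can be illustrated with a minimal sketch. It is not the paper's implementation: the toy data, the two-mixture/three-source setup, the energy threshold, the deterministic initialization, and the function name `estimate_mixing_columns` are all assumptions made for illustration. The sketch relies on the usual sparsity argument: when only one source is active at a sample, the mixture vector points along the corresponding column of the mixing matrix, so clustering the normalized mixture vectors recovers the columns up to sign and order.

```python
import math
import random


def normalize(v):
    """Scale a 2-D vector to unit length."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)


def to_half_plane(v):
    """a and -a describe the same mixing direction, so keep the copy with y >= 0."""
    if v[1] < 0 or (v[1] == 0 and v[0] < 0):
        return (-v[0], -v[1])
    return v


def estimate_mixing_columns(mixtures, k, energy_floor=0.1, iters=50):
    """Cluster normalized mixture samples; centroids approximate mixing columns.

    Hypothetical helper for illustration: K-means with centroids renormalized
    to unit length after every update.
    """
    # Discard low-energy samples, whose direction is dominated by noise.
    points = [to_half_plane(normalize(x)) for x in mixtures
              if math.hypot(x[0], x[1]) > energy_floor]
    # Deterministic initialization: directions evenly spaced on the half circle.
    centroids = [(math.cos(math.pi * (i + 0.5) / k),
                  math.sin(math.pi * (i + 0.5) / k)) for i in range(k)]
    for _ in range(iters):
        sums = [[0.0, 0.0, 0] for _ in range(k)]
        for p in points:
            # Assignment step: nearest centroid in Euclidean distance.
            j = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                    + (p[1] - centroids[c][1]) ** 2)
            sums[j][0] += p[0]
            sums[j][1] += p[1]
            sums[j][2] += 1
        # Update step: mean direction of each cluster (empty clusters stay put).
        new = [normalize((s[0], s[1])) if s[2] else centroids[i]
               for i, s in enumerate(sums)]
        if new == centroids:
            break
        centroids = new
    return centroids


# Toy data (assumed): 2 mixtures, 3 sources, exactly one source active per sample.
rng = random.Random(0)
true_angles = [0.4, 1.2, 2.2]
A = [(math.cos(t), math.sin(t)) for t in true_angles]  # unit-norm columns
mixtures = []
for _ in range(600):
    j = rng.randrange(3)
    amp = rng.gauss(0, 1)
    mixtures.append((amp * A[j][0] + rng.gauss(0, 0.01),
                     amp * A[j][1] + rng.gauss(0, 0.01)))

estimated = estimate_mixing_columns(mixtures, k=3)
est_angles = sorted(math.atan2(c[1], c[0]) for c in estimated)
print(est_angles)
```

Real speech mixtures are only approximately sparse, and only after a suitable transform, so a practical version would cluster transform-domain coefficients and would need a more robust initialization than the evenly spaced directions assumed here.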

MSC:

62H12 Estimation in multivariate analysis
62F15 Bayesian inference
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

BSS Eval; PDCO

References:

[1] Candes, E. J.; Romberg, J.; Tao, T., Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, 52, 2, 489-509, (2006) · Zbl 1231.94017 · doi:10.1109/tit.2005.862083
[2] Candes, E. J.; Wakin, M. B., An introduction to compressive sampling, IEEE Signal Processing Magazine, 25, 2, 21-30, (2008) · doi:10.1109/msp.2007.914731
[3] Baraniuk, R. G., Compressive sensing [lecture notes], IEEE Signal Processing Magazine, 24, 4, 118-121, (2007) · doi:10.1109/msp.2007.4286571
[4] Comon, P.; Jutten, C., Handbook of Blind Source Separation: Independent Component Analysis and Applications, (2010), Academic Press
[5] Pedersen, M. S.; Larsen, J.; Kjems, U.; Parra, L. C., A Survey of Convolutive Blind Source Separation Methods, (2007), New York, NY, USA: Springer
[6] Hyvärinen, A.; Karhunen, J.; Oja, E., Independent Component Analysis, (2001), Wiley-Interscience
[7] Yilmaz, O.; Rickard, S., Blind separation of speech mixtures via time-frequency masking, IEEE Transactions on Signal Processing, 52, 7, 1830-1847, (2004) · Zbl 1369.94383 · doi:10.1109/tsp.2004.828896
[8] Araki, S.; Sawada, H.; Mukai, R.; Makino, S., Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors, Signal Processing, 87, 8, 1833-1847, (2007) · Zbl 1186.94042 · doi:10.1016/j.sigpro.2007.02.003
[9] Reju, V. G.; Koh, S. N.; Soon, I. Y., Underdetermined convolutive blind source separation via time-frequency masking, IEEE Transactions on Audio, Speech and Language Processing, 18, 1, 101-116, (2010) · doi:10.1109/tasl.2009.2024380
[10] Sawada, H.; Araki, S.; Makino, S., Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment, IEEE Transactions on Audio, Speech and Language Processing, 19, 3, 516-527, (2011) · doi:10.1109/TASL.2010.2051355
[11] Bao, G.; Ye, Z.; Xu, X.; Zhou, Y., A compressed sensing approach to blind separation of speech mixture based on a two-layer sparsity model, IEEE Transactions on Audio, Speech and Language Processing, 21, 5, 899-906, (2013) · doi:10.1109/TASL.2012.2234110
[12] Xu, T.; Wang, W., A compressed sensing approach for underdetermined blind audio source separation with sparse representation, Proceedings of the IEEE/SP 15th Workshop on Statistical Signal Processing (SSP ’09) · doi:10.1109/ssp.2009.5278532
[13] Xu, T.; Wang, W., A block-based compressed sensing method for underdetermined blind speech separation incorporating binary mask, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’10) · doi:10.1109/icassp.2010.5494935
[14] Wipf, D. P.; Rao, B. D., Sparse Bayesian learning for basis selection, IEEE Transactions on Signal Processing, 52, 8, 2153-2164, (2004) · Zbl 1369.94318 · doi:10.1109/TSP.2004.831016
[15] Zhao, L.; Wang, L.; Bi, G.; Zhang, L.; Zhang, H., Robust frequency-hopping spectrum estimation based on sparse Bayesian method, IEEE Transactions on Wireless Communications, 14, 2, 781-793, (2015) · doi:10.1109/TWC.2014.2360191
[16] Zhao, L.; Wang, L.; Bi, G.; Li, S.; Yang, L.; Zhang, H., Structured sparsity-driven autofocus algorithm for high-resolution radar imagery, Signal Processing, 125, 376-388, (2016) · doi:10.1016/j.sigpro.2016.02.004
[17] Tropp, J. A.; Gilbert, A. C., Signal recovery from random measurements via orthogonal matching pursuit, IEEE Transactions on Information Theory, 53, 12, 4655-4666, (2007) · Zbl 1288.94022 · doi:10.1109/tit.2007.909108
[18] Chen, S. S.; Donoho, D. L.; Saunders, M. A., Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing, 20, 1, 33-61, (1998) · Zbl 0919.94002 · doi:10.1137/s1064827596304010
[19] Chen, S. S.; Donoho, D. L.; Saunders, M. A., Atomic decomposition by basis pursuit, SIAM Review, 43, 1, 129-159, (2001) · Zbl 0979.94010 · doi:10.1137/S003614450037906X
[20] Tipping, M., Sparse Bayesian learning and the relevance vector machine, Journal of Machine Learning Research, 1, 211-244, (2001) · Zbl 0997.68109
[21] Rubinstein, R.; Zibulevsky, M.; Elad, M., Double sparsity: learning sparse dictionaries for sparse signal approximation, IEEE Transactions on Signal Processing, 58, 3, 1553-1564, (2010) · Zbl 1392.94427 · doi:10.1109/tsp.2009.2036477
[22] Jafari, M. G.; Plumbley, M. D., Fast dictionary learning for sparse representations of speech signals, IEEE Journal on Selected Topics in Signal Processing, 5, 5, 1025-1031, (2011) · doi:10.1109/JSTSP.2011.2157892
[23] Aharon, M.; Elad, M.; Bruckstein, A., K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation, IEEE Transactions on Signal Processing, 54, 11, 4311-4322, (2006) · Zbl 1375.94040 · doi:10.1109/tsp.2006.881199
[24] van den Berg, E.; Friedlander, M. P., Probing the Pareto frontier for basis pursuit solutions, UBC Computer Science Tech. Rep, TR-2008-01, (2008) · Zbl 1193.49033
[25] Vincent, E.; Arberet, S.; Gribonval, R.; Adali, T.; Jutten, C.; Romano, J. M. T.; Barros, A. K., Underdetermined instantaneous audio source separation via local Gaussian modeling, Independent Component Analysis and Signal Separation, Lecture Notes in Computer Science, 5441, 775-782, (2009), New York, NY, USA: Springer · doi:10.1007/978-3-642-00599-2_97
[26] Xu, R.; Wunsch, D., Survey of clustering algorithms, IEEE Transactions on Neural Networks, 16, 3, 645-678, (2005) · doi:10.1109/TNN.2005.845141
[27] Xu, T.; Wang, W., Methods for learning adaptive dictionary in underdetermined speech separation, Proceedings of the 21st IEEE International Workshop on Machine Learning for Signal Processing (MLSP ’11) · doi:10.1109/mlsp.2011.6064610
[28] Babacan, S. D.; Molina, R.; Katsaggelos, A. K., Bayesian compressive sensing using Laplace priors, IEEE Transactions on Image Processing, 19, 1, 53-63, (2010) · Zbl 1371.94480 · doi:10.1109/tip.2009.2032894
[29] He, L.; Carin, L., Exploiting structure in wavelet-based Bayesian compressive sensing, IEEE Transactions on Signal Processing, 57, 9, 3488-3497, (2009) · Zbl 1391.94234 · doi:10.1109/tsp.2009.2022003
[30] Zhao, L.; Bi, G.; Wang, L.; Zhang, H., An improved auto-calibration algorithm based on sparse Bayesian learning framework, IEEE Signal Processing Letters, 20, 9, 889-892, (2013) · doi:10.1109/LSP.2013.2272462
[31] Tzikas, D. G.; Likas, A. C.; Galatsanos, N. P., The variational approximation for Bayesian inference: Life after the EM algorithm, IEEE Signal Processing Magazine, 25, 6, 131-146, (2008) · doi:10.1109/MSP.2008.929620
[32] Jørgensen, B., Statistical Properties of the Generalized Inverse Gaussian Distribution, (1982), New York, NY, USA: Springer · Zbl 0486.62022
[33] Araki, S.; Nesta, F.; Vincent, E.; Koldovsky, Z.; Nolte, G.; Ziehe, A.; Benichoux, A., The 2011 signal separation evaluation campaign (SiSEC2011): audio source separation, Proceedings of the International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA ’12), Springer
[34] Vincent, E.; Gribonval, R.; Févotte, C., Performance measurement in blind audio source separation, IEEE Transactions on Audio, Speech and Language Processing, 14, 4, 1462-1469, (2006) · doi:10.1109/TSA.2005.858005
[35] Févotte, C.; Gribonval, R.; Vincent, E., BSS EVAL toolbox user guide, 1706, (2005), Rennes, France: IRISA
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.