A novel fuzzy clustering algorithm using observation weighting and context information for reverberant blind speech separation.

*(English)* Zbl 1177.94060

Summary: Time-frequency masking has emerged as a powerful tool for tackling blind source separation problems. In previous work, mask estimation was performed with well-known standard clustering algorithms: spatial observation vectors, extracted from a set of microphones, were grouped into separate clusters, each representing a particular source. However, most off-the-shelf clustering methods are not very robust to outliers or noise in the data. This lack of robustness often leads to incorrect localization and partitioning results, particularly for reverberant speech mixtures. To address this issue, we investigate the use of observation weights and context information as means to improve clustering performance under reverberant conditions. While the observation weights improve localization accuracy by ignoring noisy observations, context information smooths the cluster membership levels by exploiting the highly structured nature of speech signals in the time-frequency domain. In a number of experiments, we demonstrate the superiority of the proposed method over conventional fuzzy clustering in terms of both localization accuracy and speech separation performance.
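
The summary does not state the update equations, so the following Python/NumPy fragment is only a minimal sketch of the two ideas under stated assumptions, not the authors' algorithm. It assumes a weighted fuzzy c-means objective $J=\sum_n w_n \sum_k u_{nk}^m \lVert x_n - c_k\rVert^2$, in which per-observation weights $w_n$ (taken as given here, since the summary does not say how they are derived) scale each observation's influence on the centroids. It approximates "context information" by averaging memberships over a 3×3 time-frequency neighborhood as a separate post-processing step. The names `weighted_fcm` and `smooth_memberships`, the fuzzifier `m=2`, and the neighborhood size are all hypothetical choices.

```python
import numpy as np


def weighted_fcm(X, weights, n_clusters, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Weighted fuzzy c-means (illustrative sketch, not the paper's algorithm).

    Minimizes J = sum_n w_n sum_k u_nk^m ||x_n - c_k||^2 with sum_k u_nk = 1.
    X: (N, D) observation vectors; weights: (N,) positive reliability weights.
    Returns centroids (C, D) and memberships U (N, C).
    """
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    centroids = None
    for _ in range(n_iter):
        # Centroid update: observation n contributes with weight w_n * u_nk^m,
        # so low-weight (e.g. noisy) observations barely move the centroids.
        Wm = weights[:, None] * U**m
        centroids = (Wm.T @ X) / Wm.sum(axis=0)[:, None]
        # Squared Euclidean distances, floored to avoid division by zero.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)
        # Membership update: w_n cancels for each n, so this is the
        # standard FCM rule u_nk proportional to d_nk^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centroids, U


def smooth_memberships(U_tf):
    """Replace each TF bin's memberships by the mean over its 3x3 neighborhood.

    U_tf: (F, T, C) memberships arranged on the frequency-time grid.
    Borders use edge padding; memberships are renormalized to sum to 1.
    """
    F, T, _ = U_tf.shape
    padded = np.pad(U_tf, ((1, 1), (1, 1), (0, 0)), mode="edge")
    acc = np.zeros_like(U_tf, dtype=float)
    for df in range(3):
        for dt in range(3):
            acc += padded[df:df + F, dt:dt + T, :]
    acc /= 9.0
    return acc / acc.sum(axis=2, keepdims=True)
```

To apply the smoothing, the $N = F \cdot T$ memberships returned by `weighted_fcm` would be reshaped to `(F, T, C)`, assuming the observation vectors were stacked frequency by frequency; the argmax over the last axis then yields one binary mask per source. Because the weights cancel in the membership update, they act only through the centroids, which is why a near-zero weight effectively removes an outlier from the localization step. An isolated bin whose neighbors all favor another cluster is relabeled by the smoothing, which is the intuition behind exploiting time-frequency structure.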

##### MSC:

94A12 Signal theory (characterization, reconstruction, filtering, etc.)

##### Keywords:

blind source separation; fuzzy clustering; reverberation; robustness; time-frequency masking; adaptive beamforming

*M. Kühne* et al., Signal Process. 90, No. 2, 653–669 (2010; Zbl 1177.94060)
