Clustering based on conditional distributions in an auxiliary space. (English) Zbl 1009.62048

Summary: We study the problem of learning groups or categories that are local in the continuous primary space but homogeneous by the distributions of an associated auxiliary random variable over a discrete auxiliary space. Assuming that variation in the auxiliary space is meaningful, categories will emphasize similarly meaningful aspects of the primary space. From a data set consisting of pairs of primary and auxiliary items, the categories are learned by minimizing a Kullback-Leibler divergence-based distortion between (implicitly estimated) distributions of the auxiliary data, conditioned on the primary data. Still, the categories are defined in terms of the primary space. An online algorithm resembling the traditional Hebb-type competitive learning is introduced for learning the categories. Minimizing the distortion criterion turns out to be equivalent to maximizing the mutual information between the categories and the auxiliary data. In addition, connections to density estimation and to the distributional clustering paradigm are outlined. The method is demonstrated by clustering yeast gene expression data from DNA chips, with biological knowledge about the functional classes of the genes as the auxiliary data.
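The stated equivalence between minimizing the distortion and maximizing mutual information can be checked numerically on a toy case. The sketch below is not the authors' algorithm. It assumes a finite primary space with uniform p(x), hard cluster assignments, and prototypes set to the within-cluster average of p(c|x). All names (`cond`, `assign`, `cluster_prototypes`, and so on) are hypothetical. Under these assumptions the average Kullback-Leibler distortion equals I(C;X) - I(C;V), so for fixed data a lower distortion means a higher mutual information between clusters V and auxiliary classes C.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cluster_prototypes(cond, assign, n_clusters):
    """Average p(c|x) over the points of each cluster (uniform p(x) assumed).

    Returns the prototypes and the cluster sizes. Every cluster must be
    non-empty; this sketch does not handle empty clusters.
    """
    n_classes = len(cond[0])
    sums = [[0.0] * n_classes for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for p, j in zip(cond, assign):
        counts[j] += 1
        for c, pc in enumerate(p):
            sums[j][c] += pc
    protos = [[s / counts[j] for s in sums[j]] for j in range(n_clusters)]
    return protos, counts


def distortion(cond, assign, protos):
    """Mean KL divergence between each p(c|x) and the prototype of its cluster."""
    return sum(kl(p, protos[j]) for p, j in zip(cond, assign)) / len(cond)


def mutual_information(rows, weights):
    """I(C; Z), where rows[z] = p(c|z) and weights[z] = p(z)."""
    n_classes = len(rows[0])
    marginal = [sum(w * r[c] for r, w in zip(rows, weights))
                for c in range(n_classes)]
    return sum(w * kl(r, marginal) for r, w in zip(rows, weights))


# Toy data (made up for illustration): four primary points, two auxiliary classes.
cond = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]  # p(c|x) per point
assign = [0, 0, 1, 1]                                    # hard cluster labels

n = len(cond)
protos, counts = cluster_prototypes(cond, assign, 2)
E = distortion(cond, assign, protos)
I_cx = mutual_information(cond, [1.0 / n] * n)
I_cv = mutual_information(protos, [k / n for k in counts])

# The identity E = I(C;X) - I(C;V) should hold up to rounding error.
gap = E - (I_cx - I_cv)
print(f"distortion={E:.6f}  I(C;X)={I_cx:.6f}  I(C;V)={I_cv:.6f}  gap={gap:.2e}")
```

Since I(C;X) does not depend on the clustering, re-running the sketch with a different `assign` changes only the distortion and I(C;V), and the two move in opposite directions by the same amount.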


62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P10 Applications of statistics to biology and medical sciences; meta analysis
68T05 Learning and adaptive systems in artificial intelligence

