×

zbMATH — the first resource for mathematics

Community discovery using nonnegative matrix factorization. (English) Zbl 1235.68034
Summary: Complex networks exist in a wide range of real world systems, such as social networks, technological networks, and biological networks. During the last decades, many researchers have concentrated on exploring some common things contained in those large networks include the small-world property, power-law degree distributions, and network connectivity. In this paper, we will investigate another important issue, community discovery, in network analysis. We choose nonnegative matrix factorization (NMF) as our tool to find the communities because of its powerful interpretability and close relationship between clustering methods. Targeting different types of networks (undirected, directed and compound), we propose three NMF techniques (symmetric NMF, asymmetric NMF and joint NMF). The correctness and convergence properties of those algorithms are also studied. Finally the experiments on real world networks are presented to show the effectiveness of the proposed methods.

MSC:
68M10 Network design and communication in computer systems
90B10 Deterministic network models in operations research
68T05 Learning and adaptive systems in artificial intelligence
PDF BibTeX XML Cite
Full Text: DOI
References:
[1] Albert R, Jeong H, Barabasi A-L (1999) The diameter of the world wide web. Nature 401: 130 · doi:10.1038/43601
[2] Amaral LAN, Scala A, Bartheélémy M, Stanley HE (2000) Classes of small-world networks. Proc Natl Acad Sci USA 97: 11149–11152 · doi:10.1073/pnas.200327197
[3] Barthelemy M, Amaral LAN (1999) Small-world networks: evidence for a crossover picture. Phys Rev Lett 82: 3180 · doi:10.1103/PhysRevLett.82.3180
[4] Brunet JP, Tamayo P, Golub TR, Mesirov JP (2004) Metagenes and molecular pattern discovery using matrix factorization. Proc Natl Acad Sci USA 102(12): 4164–4169 · doi:10.1073/pnas.0308531101
[5] Chen G, Wang F, Zhang C (2007) Collaborative filtering using orthogonal nonnegative matrix tri-factorization. In: ICDM workshops on high performance computing, pp 303–308
[6] Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of SIAM international conference on data mining, pp 606–610
[7] Ding C, Li T, Jordan MI (2006a) Convex and semi-nonnegative matrix factorizations. LBNL Tech Report 60428
[8] Ding C, Li T, Peng W, Park H (2006b) Orthogonal nonnegative matrix tri-factorizations for clustering. In: SIGKDD, pp 126–135
[9] Ding C, Li T, Peng W (2008) On the equivalence between nonnegative matrix factorization and probabilistic latent semantic indexing. Comput Stat Data Anal 52(1): 155–173 · Zbl 1452.62053
[10] Faloutsos M, Faloutsos P, Faloutsos C (1999) On power-law relationships of the internet topology. In: SIGCOMM, pp 251–262 · Zbl 0889.68050
[11] Flake GW, Lawrence S, Giles CL (2000) Efficient identification of web communities. In: SIGKDD, pp 150–160
[12] Girvan M, Newman ME (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12): 7821–7826 · Zbl 1032.91716 · doi:10.1073/pnas.122653799
[13] Hofmann T, Puzicha J (1999) Latent class models for collaborative filtering. In: IJCAI, pp 688–693
[14] Ino H, Kudo M, Nakamura A (2005) Partitioning of web graphs by community topology. In: WWW, pp 661–669
[15] Lee DD, Seung HS (1999) Learning the parts of objects by nonnegative matrix factorization. Nature 401: 788–791 · Zbl 1369.68285 · doi:10.1038/44565
[16] Lee DD, Seung HS (2000) Algorithms for non-negative matrix factorization, pp 556–562
[17] Long B, Zhang Z, Wu X, Yu PS (2007) Relational clustering by symmetric convex coding. In: Proceedings of the 24th international conference on machine learning, pp 569–576
[18] Miao G, Song Y, Zhang D, Bai H (2008) Parallel spectral clustering algorithm for large-scale community data mining. In: The 17th WWW workshop on social web search and mining (SWSM)
[19] Newman MEJ (2004a) Detecting community structure in networks. Eur Phys J B 38: 321–330 · doi:10.1140/epjb/e2004-00124-y
[20] Newman MEJ (2004b) Fast algorithm for detecting community structure in very large networks. Phys Rev E 69
[21] Paatero P, Tapper U (1994) Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics 5: 111–126 · doi:10.1002/env.3170050203
[22] Palla G, Derényi I, Farkas I, Vicsek T. Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043):814–818
[23] Pauca VP, Shahnaz F, Berry MW, Plemmons RJ (2004) Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. In: The 4th SIAM international conference on data mining, pp 452–456
[24] Pennock DM, Horvitz E, Lawrence S, Giles CL (2000) Collaborative filtering by personality diagnosis: a hybrid memory- and model-based approach. In: UAI
[25] Priebe CE, Conroy JM, Marchette DJ, Park Y (2005) Scan statistics on Enron graphs. In: Proceedings of SIAM international conference on data mining · Zbl 1086.68562
[26] Resnick P., Iacovou N, Suchak M, Bergstrom P, Riedl J (1994) Grouplens: an open architecture for collaborative filtering of netnews, pp 175–186
[27] Ruan J, Zhang W (2007) An efficient spectral algorithm for network community discovery and its applications to biological and social networks. In: ICDM
[28] Scott J (2000) Social network analysis: a handbook, 2nd edn. Sage Publications, London
[29] Sharan R et al (2005) From the cover: conserved patterns of protein interaction in multiple species. Proc Natl Acad Sci USA 102(6): 1974–1979 · doi:10.1073/pnas.0409522102
[30] Strehl A, Ghosh J, Cardie C (2002) Cluster ensembles: a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3: 583–617 · Zbl 1084.68759
[31] von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4): 395–416 · doi:10.1007/s11222-007-9033-z
[32] Wang F, Ma S, Yang L, Li T (2006) Recommendation on item graphs. In: ICDM, pp 1119–1123
[33] Wang D, Li T, Zhu S, Ding CHQ (2008a) Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization. In: Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, pp 307–314
[34] Wang F, Li T, Zhang C (2008b) Semi-supervised clustering via matrix factorization. In: The 8th SIAM international conference on data mining
[35] Wasserman S, Faust K (1994) Social network analysis. Cambridge University Press, Cambridge · Zbl 0926.91066
[36] Watts DJ, Strogatz SH (1998) Collective dynamics of ’small-world’ networks. Nature 393: 440–442 · Zbl 1368.05139 · doi:10.1038/30918
[37] Weiss Y (1999) Segmentation using eigenvectors: a unifying view. In: ICCV, pp 975–982
[38] Xie YL, Hopke PK, Paatero P (1999) Positive matrix factorization applied to a curve resolution problem. J Chemometr 12(6): 357–364 · doi:10.1002/(SICI)1099-128X(199811/12)12:6<357::AID-CEM523>3.0.CO;2-S
[39] Yu K, Tresp V (2005) Learning to learn and collaborative filtering. In: NIPS workshop on inductive transfer: 10 years later · Zbl 1112.68453
[40] Zhou D, Huang J, Schölkopf B (2005) Learning from labeled and unlabeled data on a directed graph. In: Proceedings of the 22nd international conference on machine learning, pp 1036–1043
[41] Zhang H, Giles CL, Foley HC, Yen J (2007) Probabilistic community discovery using hierarchical latent gaussian mixture model. In: AAAI, pp 663–668
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.