Spectral clustering and the high-dimensional stochastic blockmodel. (English) Zbl 1227.62042

Summary: Networks or graphs can easily represent a diverse set of data sources that are characterized by interacting units or actors. Social networks, representing people who communicate with each other, are one example. Communities or clusters of highly connected actors form an essential feature in the structure of several empirical networks. Spectral clustering is a popular and computationally feasible method to discover these communities. The stochastic blockmodel [see T. A. B. Snyders and K. Nowicki, J. Classif. 14, No. 1, 75–100 (1997; Zbl 0896.62063)] is a social network model with well-defined communities; each node is a member of one community. For a network generated from the stochastic blockmodel, we bound the number of nodes “misclustered” by spectral clustering. The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional.
In order to study spectral clustering under the stochastic blockmodel, we first show that under the more general latent space model, the eigenvectors of the normalized graph Laplacian asymptotically converge to the eigenvectors of a “population” normalized graph Laplacian. Aside from the implication for spectral clustering, this provides insight into a graph visualization technique. Our method of studying the eigenvectors of random matrices is original.


62H30 Classification and discrimination; cluster analysis (statistical aspects)
91D30 Social networks; opinion dynamics
05C90 Applications of graph theory
65C60 Computational problems in statistics (MSC2010)


Zbl 0896.62063
Full Text: DOI arXiv


[1] Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981-2014. · Zbl 1225.68143
[2] Albert, R. and Barabási, A. L. (2002). Statistical mechanics of complex networks. Rev. Modern Phys. 74 47-97. · Zbl 1205.82086
[3] Barabási, A. L. and Albert, R. (1999). Emergence of scaling in random networks. Science 286 509-512. · Zbl 1226.05223
[4] Belkin, M. (2003). Problems of learning on manifolds. Ph.D. thesis, Univ. Chicago.
[5] Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15 1373-1396. · Zbl 1085.68119
[6] Belkin, M. and Niyogi, P. (2008). Towards a theoretical foundation for Laplacian-based manifold methods. J. Comput. System Sci. 74 1289-1308. · Zbl 1157.68056
[7] Bhatia, R. (1987). Perturbation Bounds for Matrix Eigenvalues . Longman, Harlow. · Zbl 0696.15013
[8] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Nat. Acad. Sci. India Sect. A 106 21068. · Zbl 1359.62411
[9] Bock, R. D. and Husain, S. (1952). Factors of the Tele: A preliminary report. Sociometry 15 206-219.
[10] Bousquet, O., Chapelle, O. and Hein, M. (2004). Measure based regularization. Adv. Neural Inf. Process. Syst. 16 .
[11] Brandes, U., Delling, D., Gaertler, M., Görke, R., Hoefer, M., Nikoloski, Z. and Wagner, D. (2008). On modularity clustering. IEEE Transactions on Knowledge and Data Engineering 20 172-188. · Zbl 1141.68519
[12] Breiger, R. L., Boorman, S. A. and Arabie, P. (1975). An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. J. Math. Psych. 12 328-383.
[13] Champion, M. (1998). How many atoms make up the universe? Available at .
[14] Choi, D., Wolfe, P. and Airoldi, E. (2010). Stochastic blockmodels with growing number of classes. Preprint. Available at . · Zbl 1318.62207
[15] Coifman, R., Lafon, S., Lee, A., Maggioni, M., Nadler, B., Warner, F. and Zucker, S. (2005). Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps. Proc. Nat. Acad. Sci. India Sect. A 102 7426-7431.
[16] Condon, A. and Karp, R. M. (1999). Algorithms for graph partitioning on the planted partition model. In Randomization, Approximation, and Combinatorial Optimization (Berkeley, CA, 1999). Lecture Notes in Comput. Sci. 1671 221-232. Springer, Berlin. · Zbl 0946.05081
[17] Donath, W. E. and Hoffman, A. J. (1973). Lower bounds for the partitioning of graphs. IBM J. Res. Develop. 17 420-425. · Zbl 0259.05112
[18] Erdös, P. and Rényi, A. (1959). On random graphs. Publ. Math. Debrecen 6 290-297. · Zbl 0092.15705
[19] Fiedler, M. (1973). Algebraic connectivity of graphs. Czechoslovak Math. J. 23 298-305. · Zbl 0265.05119
[20] Fjällström, P. (1998). Algorithms for graph partitioning: A survey. Computer and Information Science 3 . Available at .
[21] Fortunato, S. (2010). Community detection in graphs. Phys. Rep. 486 75-174.
[22] Frank, O. and Strauss, D. (1986). Markov graphs. J. Amer. Statist. Assoc. 81 832-842. · Zbl 0607.05057
[23] Freeman, L. C. (2000). Visualizing social networks. Journal of Social Structure 1 (1).
[24] Giné, E. and Koltchinskii, V. (2006). Empirical graph Laplacian approximation of Laplace-Beltrami operators: Large sample results. In High Dimensional Probability. Institute of Mathematical Statistics Lecture Notes-Monograph Series 51 238-259. IMS, Beachwood, OH. · Zbl 1124.60030
[25] Girvan, M. and Newman, M. (2002). Community structure in social and biological networks. Proc. Nat. Acad. Sci. India Sect. A 99 7821-7826. · Zbl 1032.91716
[26] Goldenberg, A., Zheng, A. X., Fienberg, S. E. and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning 2 129-233. · Zbl 1184.68030
[27] Hagen, L. and Kahng, A. B. (1992). New spectral methods for ratio cut partitioning and clustering. IEEE Trans. Computer-Aided Design 11 1074-1085.
[28] Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based clustering for social networks. J. Roy. Statist. Soc. Ser. A 170 301-354. · Zbl 05273954
[29] Hein, M. (2006). Uniform convergence of adaptive graph-based regularization. In Learning Theory. Lecture Notes in Comput. Sci. 4005 50-64. Springer, Berlin. · Zbl 1143.68416
[30] Hein, M., Audibert, J. Y. and von Luxburg, U. (2005). From graphs to manifolds-weak and strong pointwise consistency of graph Laplacians. In Learning Theory. Lecture Notes in Comput. Sci. 3559 470-485. Springer, Berlin. · Zbl 1095.68097
[31] Hendrickson, B. and Leland, R. (1995). An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM J. Sci. Comput. 16 452-469. · Zbl 0816.68093
[32] Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis. J. Amer. Statist. Assoc. 97 1090-1098. · Zbl 1041.62098
[33] Holland, P., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: Some first steps. Social Networks 5 109-137.
[34] Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. J. Amer. Statist. Assoc. 76 33-50. · Zbl 0457.62090
[35] Kallenberg, O. (2005). Probabilistic Symmetries and Invariance Principles . Springer, New York. · Zbl 1084.60003
[36] Klein, D. and Randić, M. (1993). Resistance distance. J. Math. Chem. 12 81-95.
[37] Koren, Y. (2005). Drawing graphs by eigenvectors: Theory and practice. Comput. Math. Appl. 49 1867-1888. · Zbl 1080.05024
[38] Lafon, S. S. (2004). Diffusion maps and geometric harmonics. Ph.D. thesis, Yale Univ.
[39] Leskovec, J., Lang, K. J., Dasgupta, A. and Mahoney, M. W. (2008). Statistical properties of community structure in large social and information networks. In Proceeding of the 17th International Conference on World Wide Web 695-704. ACM, Beijing, China.
[40] Liotta, G., ed. (2004). Graph Drawing. Lecture Notes in Comput. Sci. 2912 . Springer, Berlin. · Zbl 1029.00056
[41] Meilă, M. and Shi, J. (2001). A random walks view of spectral segmentation. In AI and Statistics ( AISTATS ) 2001.
[42] Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E (3) 69 026113. · Zbl 1032.91716
[43] Nowicki, K. and Snijders, T. A. B. (2001). Estimation and prediction for stochastic blockstructures. J. Amer. Statist. Assoc. 96 1077-1087. · Zbl 1072.62542
[44] Pothen, A., Simon, H. D. and Liou, K. P. (1990). Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11 430-452. · Zbl 0711.65034
[45] Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 888-905.
[46] Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J. Classification 14 75-100. · Zbl 0896.62063
[47] Spielman, D. A. and Teng, S. H. (2007). Spectral partitioning works: Planar graphs and finite element meshes. Linear Algebra Appl. 421 284-305. · Zbl 1122.05062
[48] Steinhaus, H. (1956). Sur la division des corp materiels en parties. Bull. Acad. Polon. Sci. Cl. III 4 801-804. · Zbl 0079.16403
[49] Stewart, G. W. and Sun, J. (1990). Matrix Perturbation Theory . Academic Press, San Diego, CA. · Zbl 0706.65013
[50] Traud, A. L., Kelsic, E. D., Mucha, P. J. and Porter, M. A. (2008). Community structure in online collegiate social networks.
[51] Van Driessche, R. and Roose, D. (1995). An improved spectral bisection algorithm and its application to dynamic load balancing. Parallel Comput. 21 29-48. · Zbl 0875.68257
[52] Van Duijn, M. A. J., Snijders, T. A. B. and Zijlstra, B. J. H. (2004). p 2 : A random effects model with covariates for directed graphs. Stat. Neerl. 58 234-254. · Zbl 1050.62117
[53] von Luxburg, U. (2007). A tutorial on spectral clustering. Stat. Comput. 17 395-416.
[54] von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Statist. 36 555-586. · Zbl 1133.62045
[55] Vu, V. (2011). Singular vectors under random perturbation. Random Structures Algorithms . . · Zbl 1242.65069
[56] Wasserman, S. and Faust, K. (1994). Social Network Analysis: Methods and Applications . Cambridge Univ. Press, Cambridge. · Zbl 0926.91066
[57] Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of ‘small world’ networks. Nature 393 440-442. · Zbl 1368.05139
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.