
Minimum spectral connectivity projection pursuit. Divisive clustering using optimal projections for spectral clustering. (English) Zbl 1430.62134

Summary: We study the problem of determining the optimal low-dimensional projection for maximising the separability of a binary partition of an unlabelled dataset, as measured by spectral graph theory. This is achieved by finding projections which minimise the second eigenvalue of the graph Laplacian of the projected data, which corresponds to a non-convex, non-smooth optimisation problem. We show that the optimal univariate projection based on spectral connectivity converges to the vector normal to the maximum margin hyperplane through the data, as the scaling parameter is reduced to zero. This establishes a connection between connectivity as measured by spectral graph theory and maximal Euclidean separation. The computational cost associated with each eigenproblem is quadratic in the number of data points. To mitigate this cost, we propose an approximation method using microclusters with provable approximation error bounds. Combining multiple binary partitions within a divisive hierarchical model allows us to construct clustering solutions admitting clusters with varying scales and lying within different subspaces. We evaluate the performance of the proposed method on a large collection of benchmark datasets and find that it compares favourably with existing methods for projection pursuit and dimension reduction for data clustering. Applying the proposed approach for a decreasing sequence of scaling parameters allows us to obtain large margin clustering solutions, which are found to be competitive with those from dedicated maximum margin clustering algorithms.
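To make the objective concrete, the following minimal sketch (not the authors' implementation) computes the quantity the paper minimises: the second-smallest eigenvalue of the unnormalised graph Laplacian built from a univariate projection of the data, using a Gaussian similarity with scaling parameter `scale`. The function name and the dense eigendecomposition are illustrative choices; the paper's method optimises this value over projection directions and uses microclusters to avoid the quadratic cost shown here.

```python
import numpy as np

def spectral_connectivity(X, v, scale=1.0):
    """Second-smallest eigenvalue (lambda_2) of the unnormalised graph
    Laplacian of the data projected onto direction v. Small lambda_2
    indicates a well-separated binary partition of the projected data."""
    p = X @ (v / np.linalg.norm(v))            # univariate projection
    d2 = (p[:, None] - p[None, :]) ** 2        # pairwise squared distances
    W = np.exp(-d2 / (2.0 * scale ** 2))       # Gaussian similarity matrix
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W             # L = D - W
    eigvals = np.linalg.eigvalsh(L)            # ascending; eigvals[0] ~ 0
    return eigvals[1]
```

For two clusters separated along the first coordinate, projecting onto that axis yields a much smaller lambda_2 than projecting onto an uninformative axis, which is the signal the projection pursuit exploits. The dense eigendecomposition above costs O(n^3); even forming W is quadratic in n, which is the cost the microcluster approximation addresses.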

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
