
A dual reformulation and solution framework for regularized convex clustering problems. (English) Zbl 1487.62072

Summary: Clustering techniques are powerful tools commonly used in statistical learning and data analytics. Most past research formulates clustering tasks as non-convex problems, for which a global optimum often cannot be found. Recent studies show that hierarchical clustering and \(k\)-means clustering can be relaxed and analyzed as convex problems. Moreover, sparse convex clustering algorithms have been proposed to extend the convex clustering framework to high-dimensional spaces by introducing an adaptive group-Lasso penalty term. Because the associated objective functions are non-smooth, efficient, fast-converging algorithms are still lacking for clustering problems even when they are convex. In this paper, we first review the structure of convex clustering problems and prove the differentiability of their dual problems. We then show that the reformulated dual problems can be solved efficiently by accelerated first-order methods with a feasibility projection. Furthermore, we present a general framework for convex clustering with regularization terms and discuss a specific implementation of this framework using the \(L_{1,1}\)-norm. We also derive the dual form of the regularized convex clustering problems and show that it can be solved efficiently by embedding a projection operator and a proximal operator in the accelerated gradient method. Finally, we compare our approach with several other co-clustering algorithms on a number of example clustering problems. Numerical results show that our models and solution methods outperform all the compared algorithms for both convex clustering and convex co-clustering.
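
For orientation, the convex clustering problem referred to in the summary is usually stated in the sum-of-norms form; the sketch below uses standard notation (\(X\), \(U\), \(\gamma\), \(w_{ij}\), \(\lambda\)) and may differ in details such as the weights or the choice of norm from the authors' exact formulation. Given data \(X \in \mathbb{R}^{n \times p}\), one seeks a centroid matrix \(U \in \mathbb{R}^{n \times p}\) solving
\[
\min_{U} \; \tfrac{1}{2}\,\|X - U\|_F^2 \;+\; \gamma \sum_{i < j} w_{ij}\, \|u_i - u_j\|,
\]
where \(u_i\) is the \(i\)-th row of \(U\) and \(w_{ij} \ge 0\) are fusion weights; observations whose estimated centroids coincide are assigned to the same cluster. The regularized framework discussed in the summary adds a term \(\lambda R(U)\) to this objective, instantiated with the entrywise norm \(R(U) = \|U\|_{1,1} = \sum_{i,j} |u_{ij}|\), whose non-smoothness is handled through a proximal operator inside the accelerated gradient scheme.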

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
90C25 Convex programming
90C46 Optimality conditions and duality in mathematical programming
