
Regularized greedy column subset selection. (English) Zbl 1451.68250

Summary: The Column Subset Selection Problem is a hard combinatorial optimization problem that provides a natural framework for unsupervised feature selection, and efficient algorithms exist that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization and is therefore very sensitive to noise when data are scarce. In this paper we propose a regularized formulation of this problem and derive a correct greedy algorithm that is comparable in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed formulations. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient enough for practical use.
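For context, the standard (unregularized) problem asks, given a data matrix \(A \in \mathbb{R}^{m \times n}\) and a target size \(k\), for the subset \(S\) of \(k\) columns whose span best reconstructs \(A\):
\[
\min_{S \subseteq \{1, \dots, n\},\, |S| = k} \|A - A_S A_S^{+} A\|_F^2,
\]
where \(A_S\) is the submatrix of the selected columns and \(A_S^{+}\) its Moore-Penrose pseudoinverse. A ridge-type regularization consistent with the summary's description (the paper's exact objective may differ in detail) also penalizes the reconstruction coefficients:
\[
\min_{|S| = k} \min_{W} \|A - A_S W\|_F^2 + \lambda \|W\|_F^2, \qquad W^{\star} = (A_S^{\top} A_S + \lambda I)^{-1} A_S^{\top} A.
\]
A minimal sketch of a greedy rule for such an objective follows; the function name, the naive per-step re-solve, and the example value of \(\lambda\) are illustrative assumptions, since the paper derives efficient incremental updates rather than solving from scratch at every step:

import numpy as np

def greedy_regularized_css(A, k, lam):
    # Illustrative sketch: greedily add the column that most reduces the
    # ridge-regularized reconstruction objective. Each candidate is scored
    # by re-solving the regularized least-squares problem from scratch,
    # unlike the efficient updates derived in the paper.
    n = A.shape[1]
    selected = []
    for _ in range(k):
        best_j, best_obj = None, np.inf
        for j in range(n):
            if j in selected:
                continue
            S = A[:, selected + [j]]
            # Optimal ridge coefficients W = (S^T S + lam*I)^{-1} S^T A
            G = S.T @ S + lam * np.eye(S.shape[1])
            W = np.linalg.solve(G, S.T @ A)
            obj = np.linalg.norm(A - S @ W) ** 2 + lam * np.linalg.norm(W) ** 2
            if obj < best_obj:
                best_j, best_obj = j, obj
        selected.append(best_j)
    return selected

# Example: select 5 columns of a random 50 x 200 matrix with lam = 0.1.
rng = np.random.default_rng(0)
print(greedy_regularized_css(rng.standard_normal((50, 200)), 5, 0.1))

Note that the regularization term is included when scoring candidates, so the greedy step minimizes the same objective it reports; with \(\lambda = 0\) the sketch reduces to plain greedy column subset selection.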

MSC:

68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
68T05 Learning and adaptive systems in artificial intelligence
90C27 Combinatorial optimization

Software:

COIL-20; PMTK; MNIST

References:

[1] Altschuler, J.; Bhaskara, A.; Fu, G.; Mirrokni, V.; Rostamizadeh, A.; Zadimoghaddam, M., Greedy column subset selection: new bounds and distributed algorithms, International Conference on Machine Learning, 2539-2548 (2016)
[2] Bishop, C. M., Training with noise is equivalent to Tikhonov regularization, Neural Comput., 7, 1, 108-116 (1995)
[3] Boutsidis, C.; Drineas, P.; Magdon-Ismail, M., Near-optimal column-based matrix reconstruction, SIAM J. Comput., 43, 2, 687-717 (2014) · Zbl 1298.65079
[4] Boutsidis, C.; Mahoney, M. W.; Drineas, P., An improved approximation algorithm for the column subset selection problem, Proc. of the 20th Annual ACM-SIAM Symp. on Discrete Algorithms, 968-977 (2009), Soc. for Industrial and Applied Mathematics · Zbl 1420.68235
[5] Cai, D.; Zhang, C.; He, X., Unsupervised feature selection for multi-cluster data, Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 333-342 (2010), ACM
[6] Chan, T. F., Rank revealing QR factorizations, Linear Algebra Appl., 88-89, 67-82 (1987) · Zbl 0624.65025
[7] Çivril, A., Column subset selection problem is UG-hard, J. Comput. Syst. Sci., 80, 4, 849-859 (2014) · Zbl 1285.68052
[8] Çivril, A.; Magdon-Ismail, M., Column subset selection via sparse approximation of SVD, Theor. Comput. Sci., 421, 1-14 (2012) · Zbl 1238.65030
[9] Cortes, C.; Vapnik, V., Support-vector networks, Mach. Learn., 20, 3, 273-297 (1995) · Zbl 0831.68098
[10] Deshpande, A.; Rademacher, L., Efficient volume sampling for row/column subset selection, 2010 51st Annual IEEE Symposium on Foundations of Computer Science (FOCS), 329-338 (2010), IEEE
[11] Du, L.; Shen, Y.-D., Unsupervised feature selection with adaptive structure learning, Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 209-218 (2015), ACM
[12] Dy, J. G.; Brodley, C. E.; Kak, A.; Broderick, L. S.; Aisen, A. M., Unsupervised feature selection applied to content-based retrieval of lung images, IEEE Trans. Pattern Anal. Mach. Intell., 25, 3, 373-378 (2003)
[13] Eckart, C.; Young, G., The approximation of one matrix by another of lower rank, Psychometrika, 1, 3, 211-218 (1936) · JFM 62.1075.02
[14] Fanty, M.; Cole, R., Spoken letter recognition, Advances in Neural Information Processing Systems, 220-226 (1991)
[15] Farahat, A. K.; Ghodsi, A.; Kamel, M. S., An efficient greedy method for unsupervised feature selection, 2011 IEEE 11th International Conference on Data Mining (ICDM), 161-170 (2011), IEEE
[16] Fernandes, K.; Vinagre, P.; Cortez, P., A proactive intelligent decision support system for predicting the popularity of online news, Progress in Artificial Intelligence, 535-546 (2015), Springer
[17] Frieze, A.; Kannan, R.; Vempala, S., Fast Monte-Carlo algorithms for finding low-rank approximations, J. ACM, 51, 6, 1025-1041 (2004) · Zbl 1125.65005
[18] Georghiades, A. S.; Belhumeur, P. N.; Kriegman, D. J., From few to many: illumination cone models for face recognition under variable lighting and pose, IEEE Trans. Pattern Anal. Mach. Intell., 23, 6, 643-660 (2001)
[19] Golub, G. H.; Hansen, P. C.; O’Leary, D. P., Tikhonov regularization and total least squares, SIAM J. Matrix Anal. Appl., 21, 1, 185-194 (1999) · Zbl 0945.65042
[20] Gu, M.; Eisenstat, S. C., Efficient algorithms for computing a strong rank-revealing QR factorization, SIAM J. Sci. Comput., 17, 4, 848-869 (1996) · Zbl 0858.65044
[21] Guruswami, V.; Sinop, A. K., Optimal column-based low-rank matrix reconstruction, Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, 1207-1214 (2012), SIAM · Zbl 1421.68227
[22] Guyon, I.; Elisseeff, A., An introduction to variable and feature selection, J. Mach. Learn. Res., 3, 1157-1182 (2003) · Zbl 1102.68556
[23] He, R.; Tan, T.; Wang, L.; Zheng, W.-S., \(\ell_{2,1}\) regularized correntropy for robust feature selection, 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2504-2511 (2012), IEEE
[24] He, X.; Cai, D.; Niyogi, P., Laplacian score for feature selection, Advances in Neural Information Processing Systems, 507-514 (2005)
[25] Hoerl, A. E.; Kennard, R. W., Ridge regression: biased estimation for nonorthogonal problems, Technometrics, 12, 1, 55-67 (1970) · Zbl 0202.17205
[26] Hou, C.; Nie, F.; Li, X.; Yi, D.; Wu, Y., Joint embedding learning and sparse regression: a framework for unsupervised feature selection, IEEE Trans. Cybern., 44, 6, 793-804 (2014)
[27] Hou, C.; Nie, F.; Yi, D.; Wu, Y., Feature selection via joint embedding learning and sparse regression, IJCAI Proceedings-International Joint Conference on Artificial Intelligence, 22, 1324 (2011)
[28] Krogh, A.; Hertz, J. A., A simple weight decay can improve generalization, Advances in Neural Information Processing Systems, 950-957 (1992)
[29] LeCun, Y.; Cortes, C.; Burges, C. J., MNIST handwritten digit database, AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist (2010)
[30] Li, Z.; Yang, Y.; Liu, J.; Zhou, X.; Lu, H., Unsupervised feature selection using nonnegative spectral analysis, AAAI, 1026-1032 (2012)
[31] Lütkepohl, H., Handbook of Matrices, Comput. Stat. Data Anal., 2, 25, 243 (1997)
[32] Mahoney, M. W.; Drineas, P., CUR matrix decompositions for improved data analysis, Proc. Natl. Acad. Sci., 106, 3, 697-702 (2009) · Zbl 1202.68480
[33] Mitra, P.; Murthy, C.; Pal, S. K., Unsupervised feature selection using feature similarity, IEEE Trans. Pattern Anal. Mach. Intell., 24, 3, 301-312 (2002)
[34] Murphy, K. P., Machine Learning: A Probabilistic Perspective (2012), The MIT Press, https://dl.acm.org/citation.cfm?id=2380985 · Zbl 1295.68003
[35] Nene, S. A.; Nayar, S. K.; Murase, H., Columbia Object Image Library (COIL-20), Technical Report (1996), Columbia University
[36] Ordozgoiti, B.; Canaval, S. G.; Mozo, A., A fast iterative algorithm for improved unsupervised feature selection, 2016 IEEE 16th International Conference on Data Mining (ICDM), 390-399 (2016), IEEE
[37] Ordozgoiti, B.; Canaval, S. G.; Mozo, A., Iterative column subset selection, Knowl. Inf. Syst., 1-30 (2017)
[38] Qian, M.; Zhai, C., Robust unsupervised feature selection, IJCAI, 1621-1627 (2013)
[39] Samaria, F. S.; Harter, A. C., Parameterisation of a stochastic model for human face identification, Proceedings of the Second IEEE Workshop on Applications of Computer Vision, 138-142 (1994), IEEE
[40] Shitov, Y., Column subset selection is NP-complete, arXiv preprint arXiv:1701.02764 (2017)
[41] Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R., Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res., 15, 1, 1929-1958 (2014) · Zbl 1318.68153
[42] Thompson, R. C., Principal submatrices IX: interlacing inequalities for singular values of submatrices, Linear Algebra Appl., 5, 1, 1-12 (1972) · Zbl 0252.15009
[43] Tibshirani, R., Regression shrinkage and selection via the lasso, J. R. Stat. Soc., Ser. B, 58, 1, 267-288 (1996) · Zbl 0850.62538
[44] Wang, S.; Pedrycz, W.; Zhu, Q.; Zhu, W., Unsupervised feature selection via maximum projection and minimum redundancy, Knowl. Based Syst., 75, 19-29 (2015)
[45] Wang, S.; Tang, J.; Liu, H., Embedded unsupervised feature selection, AAAI, 470-476 (2015)
[46] Xu, Z.; King, I.; Lyu, M. R.-T.; Jin, R., Discriminative semi-supervised feature selection via manifold regularization, IEEE Trans. Neural Netw., 21, 7, 1033-1047 (2010)
[47] Yang, Y.; Shen, H. T.; Ma, Z.; Huang, Z.; Zhou, X., \(\ell_{2,1}\)-norm regularized discriminative feature selection for unsupervised learning, IJCAI Proceedings-International Joint Conference on Artificial Intelligence, 22, 1589 (2011)
[48] Zhao, Z.; Liu, H., Spectral feature selection for supervised and unsupervised learning, Proceedings of the 24th International Conference on Machine Learning, 1151-1157 (2007), ACM
[49] Zhao, Z.; Wang, L.; Liu, H.; Ye, J., On similarity preserving feature selection, IEEE Trans. Knowl. Data Eng., 25, 3, 619-632 (2013)
[50] Zhao, Z.; Wang, L.; Liu, H., Efficient spectral feature selection with minimum redundancy, AAAI, 673-678 (2010)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.