Label distribution learning by regularized sample self-representation. (English) Zbl 1426.62193

Summary: Multilabel learning, which marks each label as either related or unrelated to a given instance, can resolve many ambiguity problems. Label distribution learning (LDL) goes further by quantifying how important each related label is to an instance, and thus offers a more general learning framework than multilabel learning. However, current LDL algorithms ignore the linear relationship between the label distribution and the features. In this paper, we propose a regularized sample self-representation (RSSR) approach for LDL. First, the label distribution problem is formalized via sample self-representation, whereby each label distribution is represented as a linear combination of the instance's features. Second, the LDL problem is solved by \(L_2\)-norm and \(L_{2,1}\)-norm least-squares methods to reduce the effects of outliers and overfitting; the corresponding algorithms are named RSSR-LDL2 and RSSR-LDL21. Third, the proposed algorithms are compared with four state-of-the-art LDL algorithms on 12 public datasets under five evaluation metrics. The results demonstrate that the proposed algorithms effectively identify the predictive label distribution and perform well on both distance and similarity evaluations.
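To make the formulation concrete, the Python sketch below illustrates the two regularized fits described in the summary, assuming the objectives take the standard regularized least-squares forms \(\min_W \|XW - D\|_F^2 + \lambda \|W\|_F^2\) (RSSR-LDL2) and \(\min_W \|XW - D\|_{2,1} + \lambda \|W\|_{2,1}\) (RSSR-LDL21), where \(X\) is the \(n \times d\) feature matrix and \(D\) the \(n \times c\) label-distribution matrix. This is a minimal sketch under those assumptions, not the authors' reference implementation: the exact objectives and solver in the paper may differ, and all function names here are illustrative. The \(L_{2,1}\) variant uses the standard iteratively reweighted least-squares scheme for joint \(L_{2,1}\) minimization.

```python
import numpy as np

def rssr_ldl2(X, D, lam=1.0):
    """RSSR-LDL2-style fit: minimize ||X W - D||_F^2 + lam * ||W||_F^2.
    X: (n, d) features, D: (n, c) label distributions.
    Closed-form ridge-regression solution."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ D)

def rssr_ldl21(X, D, lam=1.0, n_iter=50, eps=1e-8):
    """RSSR-LDL21-style fit: minimize ||X W - D||_{2,1} + lam * ||W||_{2,1}
    via iteratively reweighted least squares; the row-wise L2,1 loss
    down-weights outlying samples."""
    W = rssr_ldl2(X, D, lam)  # ridge warm start
    for _ in range(n_iter):
        # Diagonal reweighting terms from the current residual and weights.
        R = X @ W - D
        u = 1.0 / (2.0 * np.maximum(np.linalg.norm(R, axis=1), eps))  # per sample
        v = 1.0 / (2.0 * np.maximum(np.linalg.norm(W, axis=1), eps))  # per feature
        XtU = X.T * u                      # X^T diag(u), via broadcasting
        W = np.linalg.solve(XtU @ X + lam * np.diag(v), XtU @ D)
    return W

def predict_distribution(X, W):
    """Turn raw scores X @ W into distributions: clip negatives, renormalize."""
    S = np.clip(X @ W, 0.0, None)
    return S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    D = rng.dirichlet(np.ones(4), size=100)   # rows sum to 1
    W = rssr_ldl21(X, D, lam=0.5)
    print(predict_distribution(X[:3], W))
```

The final clip-and-renormalize step is one simple way to map raw linear scores back onto the probability simplex; the paper's actual normalization scheme may differ.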

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence

Software:

SparseLOGREG
Full Text: DOI
