Cost-sensitive label embedding for multi-label classification. (English) Zbl 1440.62245

Summary: Label embedding (LE) is an important family of multi-label classification algorithms that digest the label information jointly for better performance. Different real-world applications evaluate performance by different cost functions of interest. Current LE algorithms often aim to optimize one specific cost function, but they can suffer from poor performance with respect to other cost functions. In this paper, we resolve the performance issue by proposing a novel cost-sensitive LE algorithm that takes the cost function of interest into account. The proposed algorithm, cost-sensitive label embedding with multidimensional scaling (CLEMS), approximates the cost information with the distances between the embedded vectors, using the classic multidimensional scaling approach for manifold learning. CLEMS is able to deal with both symmetric and asymmetric cost functions, and effectively makes cost-sensitive decisions by nearest-neighbor decoding among the embedded vectors. We derive theoretical results that justify how CLEMS achieves the desired cost-sensitivity. Furthermore, extensive experimental results demonstrate that CLEMS is significantly better than a wide spectrum of existing LE algorithms and state-of-the-art cost-sensitive algorithms across different cost functions.
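As a rough illustration of the embed-then-decode pipeline described in the summary, the following Python sketch mimics a CLEMS-style workflow with scikit-learn's metric MDS. It is a minimal sketch under several assumptions: the cost is symmetric (the paper additionally handles asymmetric costs), the candidate label vectors are simply those observed in training, and a random forest stands in for whatever internal multi-output regressor one prefers. The helper names (fit_clems_like, predict_clems_like, hamming_cost) are hypothetical and not from the paper.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.manifold import MDS

def hamming_cost(y_a, y_b):
    # Illustrative symmetric cost: fraction of disagreeing labels.
    return np.mean(y_a != y_b)

def fit_clems_like(X_train, Y_train, embed_dim=16, seed=0):
    # Candidate label vectors: the distinct rows observed in training.
    candidates = np.unique(Y_train, axis=0)
    n = len(candidates)
    cost = np.array([[hamming_cost(candidates[i], candidates[j])
                      for j in range(n)] for i in range(n)])
    # Embed sqrt(cost) with metric MDS so that squared distances between
    # embedded vectors roughly track the costs (an assumed simplification
    # of the paper's distance-cost relation).
    mds = MDS(n_components=embed_dim, dissimilarity="precomputed",
              random_state=seed)
    Z = mds.fit_transform(np.sqrt(cost))
    # Regress from features to the embedded vector of each training label set.
    index = {tuple(c): k for k, c in enumerate(candidates)}
    targets = np.array([Z[index[tuple(y)]] for y in Y_train])
    reg = RandomForestRegressor(random_state=seed).fit(X_train, targets)
    return reg, Z, candidates

def predict_clems_like(reg, Z, candidates, X_test):
    # Nearest-neighbor decoding: return the candidate label vector whose
    # embedding is closest to the regressor's output, i.e. the candidate
    # with the lowest estimated cost.
    z_hat = reg.predict(X_test)
    nearest = np.argmin(((z_hat[:, None, :] - Z[None, :, :]) ** 2).sum(-1),
                        axis=1)
    return candidates[nearest]

Cost-sensitivity enters only through hamming_cost: swapping in a different cost function (e.g., one derived from the F1 measure) changes the embedding, and hence the decisions, without changing any other part of the pipeline.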

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62C25 Compound decision problems in statistical decision theory

Software:

MULAN; scikit-learn; MEKA

References:

[1] Balasubramanian, K., & Lebanon, G. (2012). The landmark selection method for multiple output prediction. In ICML.
[2] Barutçuoglu, Z., Schapire, R. E., & Troyanskaya, O. G. (2006). Hierarchical multi-label prediction of gene function. Bioinformatics, 22(7), 830-836. · doi:10.1093/bioinformatics/btk048
[3] Bhatia, K., Jain, H., Kar, P., Varma, M., & Jain, P. (2015). Sparse local embeddings for extreme multi-label classification. In NIPS (pp. 730-738).
[4] Bi, W., & Kwok, J. T. (2013). Efficient multi-label classification with many labels. In ICML (pp. 405-413).
[5] Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32. · Zbl 1007.68152 · doi:10.1023/A:1010933404324
[6] Carneiro, G., Chan, A. B., Moreno, P. J., & Vasconcelos, N. (2007). Supervised learning of semantic classes for image annotation and retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3), 394-410. · doi:10.1109/TPAMI.2007.61
[7] Chen, Y. N., & Lin, H. T. (2012). Feature-aware label space dimension reduction for multi-label classification. In NIPS (pp. 1538-1546).
[8] De Leeuw, J. (1977). Applications of convex analysis to multidimensional scaling. In Recent Developments in Statistics (pp. 133-145).
[9] Dembczynski, K., Cheng, W., & Hüllermeier, E. (2010). Bayes optimal multilabel classification via probabilistic classifier chains. In ICML (pp. 279-286).
[10] Dembczynski, K., Waegeman, W., Cheng, W., & Hüllermeier, E. (2011). An exact algorithm for F-measure maximization. In NIPS (pp. 1404-1412).
[11] Ferng, C. S., & Lin, H. T. (2013). Multilabel classification using error-correcting codes of hard or soft bits. IEEE Transactions on Neural Networks and Learning Systems, 24(11), 1888-1900. · doi:10.1109/TNNLS.2013.2269615
[12] Hsu, D., Kakade, S., Langford, J., & Zhang, T. (2009). Multi-label prediction via compressed sensing. In NIPS (pp. 772-780).
[13] Kapoor, A., Viswanathan, R., & Jain, P. (2012). Multilabel classification using bayesian compressed sensing. In NIPS (pp. 2654-2662).
[14] Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika, 29(1), 1-27. · Zbl 0123.36803 · doi:10.1007/BF02289565
[15] Li, C. L., & Lin, H. T. (2014). Condensed filter tree for cost-sensitive multi-label classification. In ICML (pp. 423-431).
[16] Lin, Z., Ding, G., Hu, M., & Wang, J. (2014). Multi-label classification via feature-aware implicit label space encoding. In ICML (pp. 325-333).
[17] Lo, H. Y., Lin, S. D., & Wang, H. M. (2014). Generalized k-labelsets ensemble for multi-label and cost-sensitive classification. IEEE Transactions on Knowledge and Data Engineering, 26(7), 1679-1691. · doi:10.1109/TKDE.2013.112
[18] Lo, H. Y., Wang, J. C., Wang, H. M., & Lin, S. D. (2011). Cost-sensitive multi-label learning for audio tag annotation and retrieval. IEEE Transactions on Multimedia, 13(3), 518-529. · doi:10.1109/TMM.2011.2129498
[19] Madjarov, G., Kocev, D., Gjorgjevikj, D., & Dzeroski, S. (2012). An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45(9), 3084-3104. · doi:10.1016/j.patcog.2012.03.004
[20] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830. · Zbl 1280.68189
[21] Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2011). Classifier chains for multi-label classification. Machine Learning, 85(3), 333-359. · doi:10.1007/s10994-011-5256-5
[22] Read, J., Reutemann, P., Pfahringer, B., & Holmes, G. (2016). MEKA: a multi-label/multi-target extension to Weka. Journal of Machine Learning Research, 17(21), 1-5. · Zbl 1360.68708
[23] Schölkopf, B., Smola, A., & Müller, K. (1998). Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), 1299-1319. · doi:10.1162/089976698300017467
[24] Sun, L., Ji, S., & Ye, J. (2011). Canonical correlation analysis for multilabel classification: a least-squares formulation, extensions, and analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 194-200. · doi:10.1109/TPAMI.2010.160
[25] Tai, F., & Lin, H. T. (2012). Multilabel classification with principal label space transformation. Neural Computation, 24(9), 2508-2542. · Zbl 1269.68084 · doi:10.1162/NECO_a_00320
[26] Trohidis, K., Tsoumakas, G., Kalliris, G., & Vlahavas, I. P. (2008). Multi-label classification of music into emotions. In ISMIR (pp. 325-330).
[27] Tsoumakas, G., & Katakis, I. (2007). Multi-label classification: an overview. International Journal of Data Warehousing and Mining, 3(3), 1-13. · doi:10.4018/jdwm.2007070101
[28] Tsoumakas, G., Katakis, I., & Vlahavas, I. P. (2010). Mining multi-label data. In Data Mining and Knowledge Discovery Handbook (pp. 667-685).
[29] Tsoumakas, G., Katakis, I., & Vlahavas, I. P. (2011a). Random k-labelsets for multilabel classification. IEEE Transactions on Knowledge and Data Engineering, 23(7), 1079-1089. · doi:10.1109/TKDE.2010.164
[30] Tsoumakas, G., Spyromitros-Xioufis, E., Vilcek, J., & Vlahavas, I. P. (2011b). MULAN: a java library for multi-label learning. Journal of Machine Learning Research, 12, 2411-2414. · Zbl 1280.68207
[31] Weston, J., Chapelle, O., Vapnik, V., Elisseeff, A., & Schölkopf, B. (2002). Kernel dependency estimation. In NIPS (pp. 873-880).
[32] Yeh, C. K., Wu, W. C., Ko, W. J., & Wang, Y. C. F. (2017). Learning deep latent space for multi-label classification. In AAAI (pp. 2838-2844).
[33] Yu, H. F., Jain, P., Kar, P., & Dhillon, I. S. (2014). Large-scale multi-label learning with missing labels. In ICML (pp. 593-601).
[34] Zhang, Y., & Schneider, J. G. (2011). Multi-label output codes using canonical correlation analysis. In AISTATS (pp. 873-882).