zbMATH — the first resource for mathematics

Compressed labeling on distilled labelsets for multi-label learning. (English) Zbl 1243.68259
Summary: Directly applying single-label classification methods to the multi-label learning problems substantially limits both the performance and speed due to the imbalance, dependence and high dimensionality of the given label matrix. Existing methods either ignore these three problems or reduce one with the price of aggravating another. In this paper, we propose a \(\{0,1\}\) label matrix compression and recovery method termed “compressed labeling (CL)” to simultaneously solve or at least reduce these three problems. CL first compresses the original label matrix to improve balance and independence by preserving the signs of its Gaussian random projections. Afterward, we directly utilize popular binary classification methods (e.g., support vector machines) for each new label. A fast recovery algorithm is developed to recover the original labels from the predicted new labels. In the recovery algorithm, a “labelset distilling method” is designed to extract distilled labelsets (DLs), i.e., the frequently appeared label subsets from the original labels via recursive clustering and subtraction. Given a distilled and an original label vector, we discover that the signs of their random projections have an explicit joint distribution that can be quickly computed from a geometric inference. Based on this observation, the original label vector is exactly determined after performing a series of Kullback-Leibler divergence based hypothesis tests on the distribution about the new labels. CL significantly improves the balance of the training samples and reduces the dependence between different labels. Moreover, it accelerates the learning process by training fewer binary classifiers for compressed labels, and makes use of label dependence via DLs based tests. Theoretically, we prove the recovery bounds of CL which verifies the effectiveness of CL for label compression and multi-label classification performance improvement brought by label correlations preserved in DLs. We show the effectiveness, efficiency and robustness of CL via 5 groups of experiments on 21 datasets from text classification, image annotation, scene classification, music categorization, genomics and web page classification.

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
Full Text: DOI
[1] Bian, W.; Tao, D., MAX-MIN distance analysis by using sequential sdp relaxation for dimension reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence, 33, 1037-1050, (2011)
[2] Bianchi, N.C.; Gentile, C.; Zaniboni, L., Incremental algorithms for hierarchical classification, Journal of Machine Learning Research, 7, 31-54, (2006) · Zbl 1222.68161
[3] Boutell, M.R.; Luo, J.; Shen, X.; Brown, C.M., Learning multi-label scene classification, Pattern Recognition, 37, 1757-1771, (2004)
[4] Breiman, L.; Friedman, J.H., Predicting multivariate responses in multiple linear regression (with discussion), The Journal of the Royal Statistical Society Series B, 54, 5-54, (1997) · Zbl 0897.62068
[5] Candès, E.J.; Romberg, J.K.; Tao, T., Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, 52, 489-509, (2006) · Zbl 1231.94017
[6] Chang, C.-C., & Lin, C.-J. (2001). LIBSVM: a library for support vector machines. Software available athttp://www.csie.ntu.edu.tw/ cjlin/libsvm. · Zbl 1111.68629
[7] Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P., Smote: synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16, 321-357, (2002) · Zbl 0994.68128
[8] Chen, J.; Liu, J.; Ye, J., Learning incoherent sparse and low-rank patterns from multiple tasks, (2010)
[9] Cheng, W.; Hüllermeier, E., Combining instance-based learning and logistic regression for multilabel classification, Machine Learning, 76, 211-225, (2009)
[10] Clare, A.; King, R. D., Knowledge discovery in multi-label phenotype data, 42-53, (2001), London · Zbl 1009.68730
[11] Clarkson, K.L., Tighter bounds for random projections of manifolds, 39-48, (2008) · Zbl 1221.68261
[12] Crammer, K.; Singer, Y., A family of additive online algorithms for category ranking, Journal of Machine Learning Research, 3, 1025-1058, (2003) · Zbl 1061.68543
[13] Dasgupta, S., Experiments with random projection, San Francisco, CA, USA
[14] Dasgupta, S.; Freund, Y., Random projection trees and low dimensional manifolds, 537-546, (2008) · Zbl 1231.68114
[15] Dembczyński, K.; Cheng, W.; Hüllermeier, E., Bayes optimal multilabel classification via probabilistic classifier chains, (2010)
[16] Dembczyński, K.; Waegeman, W.; Cheng, W.; Hüllermeier, E., On label dependence in multi-label classification, 5-13, (2010) · Zbl 1243.68237
[17] Dietterich, T.G.; Bakiri, G., Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research, 2, 263-282, (1995) · Zbl 0900.68358
[18] Diplaris, S.; Tsoumakas, G.; Mitkas, P.A.; Vlahavas, I., Protein classification with multiple algorithms, 448-456, (2005)
[19] Donoho, D.L., Compressed sensing, IEEE Transactions on Information Theory, 52, 1289-1306, (2006) · Zbl 1288.94016
[20] Duygulu, P.; Barnard, K.; Freitas, N.; Forsyth, D., Object recognition as machine translation: learning a lexicon for a fixed image vocabulary, 97-112, (2002), London · Zbl 1039.68623
[21] Efron, B.; Hastie, T.; Johnstone, L.; Tibshirani, R., Least angle regression, Annals of Statistics, 32, 407-499, (2002) · Zbl 1091.62054
[22] Escalera, S.; Pujol, O.; Radeva, P., On the decoding process in ternary error-correcting output codes, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 120-134, (2010)
[23] Evgeniou, T.; Pontil, M., Regularized multi-task learning, 109-117, (2004)
[24] Fürnkranz, J.; Hüllermeier, E.; Mencía, E.L.; Brinker, K., Multilabel classification via calibrated label ranking, Machine Learning, 73, 133-153, (2008)
[25] Ghamrawi, N.; McCallum, A., Collective multi-label classification, 195-200, (2005)
[26] Goemans, M.X.; Williamson, D.P., Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, Journal of the ACM, 42, 1115-1145, (1995) · Zbl 0885.68088
[27] Gomez, J., Boiy, E., & Moens, M.-F. (2011). Highly discriminative statistical features for email classification. Knowledge and Information Systems. doi:10.1007/s10115-011-0403-7. · Zbl 1168.62354
[28] Gretton, A.; Bousquet, O.; Smola, E.; Schölkopf, B., Measuring statistical dependence with Hilbert-Schmidt norms, 63-77, (2005), Berlin · Zbl 1168.62354
[29] Guan, N.; Tao, D.; Luo, Z.; Yuan, B., Non-negative patch alignment framework, IEEE Transactions on Neural Networks, 22, 1218-1230, (2011)
[30] Gupta, A.; Nowak, R.; Recht, B., Sample complexity for 1-bit compressed sensing and sparse classification, (2010)
[31] Halko, N., Martinsson, P.-G., & Tropp, J.A. (2009). Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions. arXiv:0909.4061. · Zbl 1269.65043
[32] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction (2nd ed.). Springer series in statistics. Berlin: Springer. Corr. 3rd printing edition. · Zbl 1273.62005
[33] Hsu, D.; Kakade, S.M.; Langford, J.; Zhang, T., Multi-label prediction via compressed sensing, (2009)
[34] Hüllermeier, E.; Fürnkranz, J.; Cheng, W.; Brinker, K., Label ranking by learning pairwise preferences, Artificial Intelligence, 172, 1897-1916, (2008) · Zbl 1184.68403
[35] Indyk, P.; Motwani, R., Approximate nearest neighbors: towards removing the curse of dimensionality, 604-613, (1998) · Zbl 1029.68541
[36] Jia, S.; Ye, J., An accelerated gradient method for trace norm minimization, 457-464, (2009)
[37] Jia, S., Tang, L., Yu, S., & Ye, J. (2010). A shared-subspace learning framework for multi-label classification. ACM Transactions on Knowledge Discovery from Data, 2(1). · Zbl 1288.94016
[38] Johnson, W.; Lindenstrauss, J., Extensions of Lipschitz mappings into a Hilbert space, New Haven, Conn., 1982, Providence · Zbl 0539.46017
[39] Katakis, I.; Tsoumakas, G.; Vlahavas, I., Multilabel text classification for automated tag suggestion, (2008)
[40] Kong, X., & Yu, P. (2011). gmlc: a multi-label feature selection framework for graph classification. Knowledge and Information Systems. doi:10.1007/s10115-011-0407-3.
[41] Koufakou, A.; Secretan, J.; Georgiopoulos, M., Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data, Knowledge and Information Systems, 29, 697-725, (2011)
[42] Kullback, S.; Leibler, R.A., On information and sufficiency, The Annals of Mathematical Statistics, 22, 79-86, (1951) · Zbl 0042.38403
[43] Langford, J.; Beygelzimer, A., Sensitive error correcting output codes, No. 3559, 158-172, (2005) · Zbl 1137.68551
[44] Li, P., Estimators and tail bounds for dimension reduction in \(ℓ\)_{\(α\)}(0<\(α\)≤2) using stable random projections, 10-19, (2008), Philadelphia · Zbl 1193.62050
[45] Li, P., Approximating higher-order distances using random projections, (2010)
[46] Liu, L.; Liang, Q., A high-performing comprehensive learning algorithm for text classification without pre-labeled training set, Knowledge and Information Systems, 29, 727-738, (2011)
[47] Luxburg, U., A tutorial on spectral clustering, Statistics and Computing, 17, 395-416, (2007)
[48] MacQueen, J.B., Some methods for classification and analysis of multivariate observations, No. 1, 281-297, (1967)
[49] Masud, M., Woolam, C., Gao, J., Khan, L., Han, J., Hamlen, K., & Oza, N. (2011). Facing the reality of data stream classification: coping with scarcity of labeled data. Knowledge and Information Systems. doi:10.1007/s10115-011-0447-8.
[50] Mencía, L.; Fürnkranz, J., Pairwise learning of multilabel classifications with perceptrons, 995-1000, (2008)
[51] Ng, A.Y.; Jordan, M.I.; Weiss, Y., On spectral clustering: analysis and an algorithm, No. 2, 849-856, (2001) · Zbl 1024.34045
[52] Osuna, E., Freund, R., & Girosi, F. (1997). Support vector machines: Training and applications (Technical report). Massachusetts Institute of Technology. · Zbl 1221.68261
[53] Raginsky, M.; Lazebnik, S., Locality-sensitive binary codes from shift-invariant kernels, (2009)
[54] Read, J. (2010). Meka softwares.
[55] Read, J.; Pfahringer, B.; Holmes, G., Multi-label classification using ensembles of pruned sets, Washington, DC, USA
[56] Read, J.; Pfahringer, B.; Holmes, G.; Frank, E., Classifier chains for multi-label classification, Machine Learning, 85, 333-359, (2009)
[57] Schapire, R.E.; Singer, Y., Boostexter: a boosting-based system for text categorization, Machine Learning, 39, 135-168, (2000) · Zbl 0951.68561
[58] Si, S.; Tao, D.; Geng, B., Bregman divergence-based regularization for transfer subspace learning, IEEE Transactions on Knowledge and Data Engineering, 22, 929-942, (2010)
[59] Snoek, C. G. M.; Worring, M.; Gemert, J. C.; Geusebroek, J. M.; Smeulders, A. W. M., The challenge problem for automated detection of 101 semantic concepts in multimedia, 421-430, (2006), New York
[60] Tao, D.; Li, X.; Wu, X.; Maybank, S.J., Geometric Mean for subspace selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 260-274, (2009)
[61] Trohidis, K.; Tsoumakas, G.; Kalliris, G.; Vlahavas, I., Multilabel classification of music into emotions, Philadelphia, PA, USA
[62] Tsoumakas, G. (2010). Mulan: A java library for multi-label learning. http://mulan.sourceforge.net/. · Zbl 1280.68207
[63] Tsoumakas, G.; Katakis, I., Multi-label classification: an overview, International Journal of Data Warehousing and Mining, 3, 1-13, (2007)
[64] Tsoumakas, G.; Katakis, I.; Vlahavas, I., Effective and efficient multilabel classification in domains with large number of labels, (2008)
[65] Tsoumakas, G.; Katakis, I.; Vlahavas, I., Mining multi-label data, (2010)
[66] Tsoumakas, G.; Vlahavas, I., Random \(k\)-labelsets: an ensemble method for multilabel classification, Warsaw, Poland, 2007
[67] Ueda, N.; Saito, K., Parametric mixture models for multi-labeled text, (2002)
[68] Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer. · Zbl 0833.62008
[69] Vempala, S.S. (2004). The random projection method. DIMACS Series in Discrete Mathematics and Theoretical Computer Science: Vol.65. Providence: Am. Math. Soc. · Zbl 1058.68063
[70] Yates, F., Contingency tables involving small numbers and the \(χ\)\^{}{2} test, Journal of the Royal Statistical Society, 1, 217-235, (1934) · JFM 63.1093.01
[71] Zhang, M.; Wang, Z., Mimlrbf: RBF neural networks for multi-instance multi-label learning, Neurocomputing, 72, 3951-3956, (2009)
[72] Zhang, M.; Zhou, Z., Multi-label neural networks with applications to functional genomics and text categorization, IEEE Transactions on Knowledge and Data Engineering, 18, 1338-1351, (2006)
[73] Zhang, M.; Zhou, Z., Ml-knn: A lazy learning approach to multi-label learning, Pattern Recognition, 40, 2038-2048, (2007) · Zbl 1111.68629
[74] Zhang, Y.; Zhou, Z., Multi-label dimensionality reduction via dependence maximization, 1503-1505, (2008)
[75] Zhou, T.; Tao, D.; Wu, X., Nesvm: A fast gradient method for support vector machines, 679-688, (2010)
[76] Zhou, T.; Tao, D.; Wu, X., Manifold elastic net: a unified framework for sparse dimension reduction, Data Mining and Knowledge Discovery (Springer), 22, 340-371, (2011) · Zbl 1235.62097
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.