# zbMATH — the first resource for mathematics

Consensus and complementarity based maximum entropy discrimination for multi-view classification. (English) Zbl 1428.68232
Summary: Maximum entropy discrimination (MED) is a general framework for discriminative estimation based on the maximum entropy and large margin principles, but it uses only a single view of the data. Multi-view maximum entropy discrimination (MVMED) takes the multi-view information into account, yet it respects only the consensus principle. Multi-view learning (MVL) rests on two common principles: consensus and complementarity. We aim to exploit the multi-view information of the data fully for classification by respecting both principles simultaneously. In this paper, we propose a new method called consensus and complementarity based MED (MED-2C) for multi-view classification. We first transform the data from the two views into a common subspace and require the transformed representations to be identical, which enforces the consensus principle. We then augment the transformed data with the original features to take the complementarity principle into account. Built on the augmented features, and by relaxing the consensus constraint, the objective function of MED-2C arises naturally; its inner optimization recovers the traditional MED framework. We provide an instantiation of the MED-2C method and derive the corresponding solution. Experimental results on synthetic and real-world data show the effectiveness of the proposed MED-2C: it not only performs better than three single-view classification methods but also generally outperforms three multi-view classification methods, namely canonical correlation analysis (CCA), ensemble MED (EMED), and SVM-2K, the multi-view variant of the classical support vector machine (SVM). In addition, MED-2C performs better than or on par with the state-of-the-art MVMED on all data sets.
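The two-stage construction the summary describes — project both views into a common subspace (consensus), then concatenate the projection with the original view-specific features (complementarity) — can be sketched with a plain linear CCA as the subspace step. This is an illustrative reading only: the paper's MED optimization is not reproduced here, the `cca_maps`/`augment` names are invented for this sketch, and a margin classifier such as MED or an SVM would be trained on the resulting features.

```python
import numpy as np

def cca_maps(X1, X2, d, reg=1e-3):
    """Linear CCA: projection matrices A (d1 x d), B (d2 x d) whose
    projections X1 @ A and X2 @ B are maximally correlated."""
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    C11 = X1.T @ X1 / n + reg * np.eye(X1.shape[1])  # regularized covariances
    C22 = X2.T @ X2 / n + reg * np.eye(X2.shape[1])
    C12 = X1.T @ X2 / n
    L1 = np.linalg.cholesky(C11)            # C11 = L1 L1^T
    L2 = np.linalg.cholesky(C22)
    T = np.linalg.solve(L1, C12)            # L1^{-1} C12
    T = np.linalg.solve(L2, T.T).T          # whitened cross-covariance L1^{-1} C12 L2^{-T}
    U, s, Vt = np.linalg.svd(T)             # singular values = canonical correlations
    A = np.linalg.solve(L1.T, U[:, :d])     # map back from whitened coordinates
    B = np.linalg.solve(L2.T, Vt.T[:, :d])
    return A, B

def augment(X, P):
    """Complementarity step: shared-subspace projection concatenated
    with the original view-specific features."""
    Xc = X - X.mean(axis=0)
    return np.hstack([Xc @ P, Xc])

# Toy two-view data sharing a 2-d latent signal.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
X1 = Z @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))
X2 = Z @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))
A, B = cca_maps(X1, X2, d=2)
F1, F2 = augment(X1, A), augment(X2, B)  # features a margin classifier would consume
```

Because the two views share the latent `Z`, the projected coordinates of the two views are nearly identical on such data, which is the "make the transformed data identical" consensus idea; the concatenated original features preserve what is specific to each view.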

##### MSC:
 68T05 Learning and adaptive systems in artificial intelligence
 62H30 Classification and discrimination; cluster analysis (statistical aspects)
##### References:
 [1] Blum, A.; Mitchell, T., Combining labeled and unlabeled data with co-training, Proceedings of the 11th Annual Conference on Computational Learning Theory, 92-100 (1998)
 [2] Chatzis, S. P., Infinite Markov-switching maximum entropy discrimination machines, Proceedings of the 30th International Conference on Machine Learning, 729-737 (2013)
 [3] Chen, N.; Zhu, J.; Sun, F.; Xing, E. P., Large-margin predictive latent subspace learning for multiview data analysis, IEEE Trans. Patt. Anal. Mach. Intell., 34, 2365-2378 (2012)
 [4] Daumé III, H., Frustratingly easy domain adaptation, Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, 256-263 (2007)
 [5] Duan, L.; Xu, D.; Tsang, I. W., Learning with augmented features for heterogeneous domain adaptation, Proceedings of the 29th International Conference on Machine Learning, 711-718 (2012)
 [6] Farquhar, J.; Hardoon, D.; Meng, H.; Shawe-Taylor, J.; Szedmak, S., Two view learning: SVM-2K, theory and practice, Adv. Neural Inf. Process. Syst., 18, 355-362 (2005)
 [7] Feldman, D.; Schmidt, M.; Sohler, C., Turning big data into tiny data: constant-size coresets for k-means, PCA and projective clustering, Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, 1434-1453 (2013) · Zbl 1421.68219
 [8] Feng, J.; Xu, H.; Yan, S., Online robust PCA via stochastic optimization, Advances in Neural Information Processing Systems 26, 404-412 (2013)
 [9] Finkel, J. R.; Manning, C. D., Hierarchical Bayesian domain adaptation, Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 602-610 (2009)
 [10] García, S.; Fernández, A.; Luengo, J.; Herrera, F., Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power, Inf. Sci., 180, 2044-2064 (2010)
 [11] Hardoon, D. R.; Szedmak, S.; Shawe-Taylor, J., Canonical correlation analysis: an overview with application to learning methods, Neural Comput., 16, 2639-2664 (2004) · Zbl 1062.68134
 [12] Hong, C.; Yu, J.; You, J.; Chen, X.; Tao, D., Multi-view ensemble manifold regularization for 3D object recognition, Inf. Sci., 320, 395-405 (2015)
 [13] Hotelling, H., Relations between two sets of variates, Biometrika, 28, 321-377 (1936) · Zbl 0015.40705
 [14] Jaakkola, T.; Meila, M.; Jebara, T., Maximum entropy discrimination, Adv. Neural Inf. Process. Syst., 12, 470-476 (1999)
 [15] Demšar, J., Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7, 1-30 (2006) · Zbl 1222.68184
 [16] Jebara, T., Multi-task feature and kernel selection for SVMs, Proceedings of the 21st International Conference on Machine Learning, 55-62 (2004)
 [17] Jebara, T., Multitask sparsity via maximum entropy discrimination, J. Mach. Learn. Res., 12, 75-110 (2011) · Zbl 1280.62077
 [18] Jebara, T.; Jaakkola, T., Feature selection and dualities in maximum entropy discrimination, Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, 291-300 (2000)
 [19] Jégou, H.; Chum, O., Negative evidences and co-occurrences in image retrieval: the benefit of PCA and whitening, 12th European Conference on Computer Vision, 774-787 (2012)
 [20] Derrac, J.; García, S.; Molina, D.; Herrera, F., A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms, Swarm Evol. Comput., 1, 3-18 (2011)
 [21] Kloft, M.; Brefeld, U.; Sonnenburg, S.; Zien, A., $$\ell_p$$-norm multiple kernel learning, J. Mach. Learn. Res., 12, 953-997 (2011) · Zbl 1280.68173
 [22] Kulis, B.; Saenko, K.; Darrell, T., What you saw is not what you get: domain adaptation using asymmetric kernel transforms, Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 1785-1792 (2011)
 [23] Nigam, K.; Ghani, R., Analyzing the effectiveness and applicability of co-training, Proceedings of the 9th International Conference on Information and Knowledge Management, 86-93 (2000)
 [24] Pan, S. J.; Ni, X.; Sun, J. T.; Yang, Q.; Chen, Z., Cross-domain sentiment classification via spectral feature alignment, Proceedings of the 19th International Conference on World Wide Web, 751-760 (2010)
 [25] Rafailidis, D.; Manolopoulou, S.; Daras, P., A unified framework for multimodal retrieval, Patt. Recog., 46, 3358-3370 (2013)
 [26] Rosenberg, D. S.; Bartlett, P. L., The Rademacher complexity of co-regularized kernel classes, Proceedings of the 11th International Conference on Artificial Intelligence and Statistics, 396-403 (2007)
 [27] Sindhwani, V.; Niyogi, P.; Belkin, M., A co-regularization approach to semi-supervised learning with multiple views, Proceedings of the 22nd International Conference on Machine Learning Workshop on Learning with Multiple Views, 74-79 (2005)
 [28] Sindhwani, V.; Rosenberg, D. S., An RKHS for multi-view learning and manifold co-regularization, Proceedings of the 25th International Conference on Machine Learning, 976-983 (2008)
 [29] Sun, S., A survey of multi-view machine learning, Neural Comput. Appl., 23, 2031-2038 (2013)
 [30] Sun, S.; Chao, G., Multi-view maximum entropy discrimination, Proceedings of the 23rd International Joint Conference on Artificial Intelligence, 1706-1712 (2013)
 [31] Sun, S.; Shawe-Taylor, J., Sparse semi-supervised learning using conjugate functions, J. Mach. Learn. Res., 11, 2423-2455 (2010) · Zbl 1242.68251
 [32] Wang, R.; Kwon, S.; Wang, X.-Z.; Jiang, Q.-S., Segment based decision tree induction with continuous valued attributes, IEEE Trans. Cybernet., 45, 1262-1275 (2015)
 [33] Wang, W.; Zhou, Z. H., A new analysis of co-training, Proceedings of the 27th International Conference on Machine Learning, 1135-1142 (2010)
 [34] Wang, X.-Z.; Aamir, R.; Fu, A.-M., Fuzziness based sample categorization for classifier performance improvement, J. Intell. Fuzzy Syst., 29, 1185-1196 (2015)
 [35] Wang, X.-Z.; Dong, C.-R., Improving generalization of fuzzy if-then rules by maximizing fuzzy entropy, IEEE Trans. Fuzzy Syst., 17, 556-567 (2009)
 [36] Wang, X.-Z.; Xing, H.-J.; Li, Y.; Hua, Q.; Dong, C.-R.; Pedrycz, W., A study on relationship between generalization abilities and fuzziness of base classifiers in ensemble learning, IEEE Trans. Fuzzy Syst., 23, 1638-1654 (2015)
 [37] Xu, C.; Tao, D.; Xu, C., A survey on multi-view learning, CoRR (2013)
 [38] Yu, S.; Krishnapuram, B.; Rosales, R.; Rao, R. B., Bayesian co-training, J. Mach. Learn. Res., 12, 2649-2680 (2011) · Zbl 1280.68219
 [39] Zhang, Q.; Sun, S., Multiple-view multiple-learner active learning, Patt. Recog., 43, 3113-3119 (2010) · Zbl 1213.68512
 [40] Zhu, J.; Xing, E. P., Maximum entropy discrimination Markov networks, J. Mach. Learn. Res., 10, 2531-2569 (2009) · Zbl 1235.62007
 [41] Zhu, J.; Xing, E. P.; Zhang, B., Laplace maximum margin Markov networks, Proceedings of the 25th International Conference on Machine Learning, 1256-1263 (2008)
 [42] Zhuang, F.; Karypis, G.; Ning, X.; He, Q.; Shi, Z., Multi-view learning via probabilistic latent semantic analysis, Inf. Sci., 199, 20-30 (2012)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.