zbMATH — the first resource for mathematics

Feature selection with dynamic mutual information. (English) Zbl 1183.68540
Summary: Feature selection plays an important role in data mining and pattern recognition, especially for large-scale data. Over the past years, various metrics have been proposed to measure the relevance between features. Since mutual information is nonlinear and can effectively represent the dependencies among features, it is one of the most widely used measures in feature selection, and many promising feature selection algorithms based on mutual information with different parameters have been developed. In this paper, a general criterion function for mutual information in feature selectors is first introduced, which subsumes most of the information measures used in previous algorithms. In traditional selectors, mutual information is estimated on the whole sample space; this, however, cannot exactly represent the relevance among features. To cope with this problem, the second contribution of this paper is a new feature selection algorithm based on dynamic mutual information, which is estimated only on the unlabeled instances, i.e., those not yet distinguished by the features selected so far. To verify the effectiveness of the method, experiments are carried out on sixteen UCI datasets using four typical classifiers. The experimental results indicate that the proposed algorithm achieves better results than other methods in most cases.
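The dynamic estimation described in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' exact algorithm: the function names are invented, and the rule for removing instances (drop those falling in class-pure partitions of the chosen feature, then re-estimate mutual information on the remainder) is an assumption about what "unlabeled instances" means here.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X; Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with the n's folded in
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def dynamic_mi_selection(data, labels, k):
    """Greedy selection of up to k features (hypothetical sketch).

    After each pick, mutual information is re-estimated only on the
    instances the selected features cannot yet classify: instances whose
    value of the chosen feature puts them in a class-pure group are
    treated as 'recognized' and dropped.
    """
    remaining = list(range(len(labels)))          # instance indices still in play
    unused = set(range(len(data[0])))             # candidate feature indices
    selected = []
    while unused and remaining and len(selected) < k:
        best = max(unused, key=lambda f: mutual_information(
            [data[i][f] for i in remaining],
            [labels[i] for i in remaining]))
        selected.append(best)
        unused.remove(best)
        # Partition remaining instances by the chosen feature's value;
        # keep only instances lying in partitions with mixed class labels.
        groups = {}
        for i in remaining:
            groups.setdefault(data[i][best], []).append(i)
        remaining = [i for g in groups.values()
                     if len({labels[i] for i in g}) > 1 for i in g]
    return selected
```

With a toy dataset where feature 0 determines the class and feature 1 is noise, the sketch picks feature 0 and then stops, since every remaining instance is already recognized.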

68T10 Pattern recognition, speech recognition
C4.5; UCI-ml
Full Text: DOI
[1] Fayyad, U.; Piatetsky-Shapiro, G.; Smyth, P., From data mining to knowledge discovery in databases, AI magazine, 17, 37-54, (1996)
[2] Lindenbaum, M.; Markovitch, S.; Rusakov, D., Selective sampling for nearest neighbor classifiers, Machine learning, 54, 125-152, (2004) · Zbl 1057.68087
[3] Schein, A.I.; Ungar, L.H., Active learning for logistic regression: an evaluation, Machine learning, 68, 235-265, (2007)
[4] M.A. Hall, Correlation-based feature subset selection for machine learning, Ph.D. Dissertation, Department of Computer Science, University of Waikato, Hamilton, New Zealand, 1999
[5] I.K. Fodor, A survey of dimension reduction techniques, Technical Report UCRL-ID-148494, Lawrence Livermore National Laboratory, US Department of Energy, 2002.
[6] Bellman, R., Adaptive control processes: A guided tour, (1961), Princeton University Press Princeton · Zbl 0103.12901
[7] Kohavi, R.; John, G.H., Wrappers for feature subset selection, Artificial intelligence, 97, 1-2, 273-324, (1997) · Zbl 0904.68143
[8] Forman, G., An extensive empirical study of feature selection metrics for text classification, Journal of machine learning research, 3, 1289-1305, (2003) · Zbl 1102.68553
[9] Dy, J.G.; Brodley, C.E.; Kak, A.C.; Broderick, L.S.; Aisen, A.M., Unsupervised feature selection applied to content-based retrieval of lung images, IEEE transactions on pattern analysis and machine intelligence, 25, 3, 373-378, (2003)
[10] Swets, D.L.; Weng, J.J., Efficient content-based image retrieval using automatic feature selection, (), 85-90
[11] Saeys, Y.; Inza, I.; Larrañaga, P., A review of feature selection techniques in bioinformatics, Bioinformatics, 23, 19, 2507-2517, (2007)
[12] Xing, E.; Jordan, M.; Karp, R., Feature selection for high-dimensional genomic microarray data, (), 601-608
[13] Lee, W.; Stolfo, S.J.; Mok, K.W., Adaptive intrusion detection: a data mining approach, AI review, 14, 6, 533-567, (2000) · Zbl 0984.68545
[14] Guyon, I.; Elisseeff, A., An introduction to variable and feature selection, Journal of machine learning research, 3, 1157-1182, (2003) · Zbl 1102.68556
[15] Blum, A.L.; Langley, P., Selection of relevant features and examples in machine learning, Artificial intelligence, 97, 245-271, (1997) · Zbl 0904.68142
[16] Liu, H.; Yu, L., Toward integrating feature selection algorithms for classification and clustering, IEEE transactions on knowledge and data engineering, 17, 4, 491-502, (2005)
[17] Quinlan, R., C4.5: programs for machine learning, (1993), Morgan Kaufmann Publishers San Mateo, CA
[18] K. Kira, L. Rendell, A practical approach to feature selection, in: Proceedings of the 9th International Conference on Machine Learning, Morgan Kaufmann, Los Altos, CA, 1992, pp. 249-256.
[19] Dash, M.; Liu, H., Feature selection for classification, Intelligent data analysis: an international journal, 1, 3, 131-156, (1997)
[20] Qu, G.; Hariri, S.; Yousif, M., A new dependency and correlation analysis for features, IEEE transactions on knowledge and data engineering, 17, 9, 1199-1207, (2005)
[21] Dash, M.; Liu, H., Consistency-based search in feature selection, Artificial intelligence, 151, 155-176, (2003) · Zbl 1082.68791
[22] Kononenko, I., Estimating attributes: analysis and extensions of RELIEF, (), 171-182
[23] Robnik-Sikonja, M.; Kononenko, I., Theoretical and empirical analysis of ReliefF and RReliefF, Machine learning, 53, 23-69, (2003) · Zbl 1076.68065
[24] Liu, H.; Motoda, H.; Yu, A., A selective sampling approach to active feature selection, Artificial intelligence, 159, 1-2, 49-74, (2004) · Zbl 1086.68572
[25] Liang, J.; Yang, S.; Winstanley, A., Invariant optimal feature selection: a distance discriminant and feature ranking based solution, Pattern recognition, 41, 5, 1429-1439, (2008) · Zbl 1140.68470
[26] Devijver, P.A.; Kittler, J., Pattern recognition—A statistical approach, (1982), Prentice-Hall London · Zbl 0458.62047
[27] Mitra, P.; Murthy, C.A.; Pal, S.K., Unsupervised feature selection using feature similarity, IEEE transactions on pattern analysis and machine intelligence, 24, 3, 301-312, (2002)
[28] Bishop, C.M., Neural networks for pattern recognition, (1995), Oxford University Press Oxford
[29] Zhang, D.; Chen, S.; Zhou, Z.-H., Constraint score: a new filter method for feature selection with pairwise constraints, Pattern recognition, 41, 5, 1440-1451, (2008) · Zbl 1140.68490
[30] Pawlak, Z., Rough sets: theoretical aspects of reasoning about data, (1991), Kluwer Academic Publishers Dordrecht · Zbl 0758.68054
[31] Bazan, J.G., A comparison of dynamic and non-dynamic rough set methods for extracting laws from decision tables, (), 321-365 · Zbl 1067.68711
[32] Yu, L.; Liu, H., Efficient feature selection via analysis of relevance and redundancy, Journal of machine learning research, 5, 1205-1224, (2004) · Zbl 1222.68340
[33] Amaldi, E.; Kann, V., On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems, Theoretical computer science, 209, 1-2, 237-260, (1998) · Zbl 0915.68072
[34] Bell, D.A.; Wang, H., A formalism for relevance and its application in feature subset selection, Machine learning, 41, 175-195, (2000) · Zbl 0968.68072
[35] Somol, P.; Pudil, P.; Kittler, J., Fast branch & bound algorithms for optimal feature selection, IEEE transactions on pattern analysis and machine intelligence, 26, 7, 900-912, (2004)
[36] Huang, J.; Cai, Y.; Xu, X., A hybrid genetic algorithm for feature selection wrapper based on mutual information, Pattern recognition letters, 28, 1825-1844, (2007)
[37] Neumann, J.; Schnörr, C.; Steidl, G., Combined SVM-based feature selection and classification, Machine learning, 61, 129-150, (2005) · Zbl 1137.90643
[38] Dasgupta, A.; Drineas, P.; Harb, B.; Josifovski, V.; Mahoney, M.W., Feature selection methods for text classification, (), 230-239
[39] Gadat, S.; Younes, L., A stochastic algorithm for feature selection in pattern recognition, Journal of machine learning research, 8, 509-547, (2007) · Zbl 1222.68349
[40] Dy, J.G.; Brodley, C.E., Feature selection for unsupervised learning, Journal of machine learning research, 5, 845-889, (2004) · Zbl 1222.68187
[41] Z. Zhao, H. Liu, Spectral feature selection for supervised and unsupervised learning, in: Proceedings of the 24th international Conference on Machine Learning, Corvalis, Oregon, 2007, pp. 1151-1157.
[42] S. Wu, P.A. Flach, Feature selection with labelled and unlabelled data, in: ECML/PKDD’02 workshop on Integration and Collaboration Aspects of Data Mining, Decision Support and Meta-Learning, 2002, pp. 156-167.
[43] X. Zhu, Semi-supervised learning literature survey, Technical Report 1530, Department of Computer Sciences, University of Wisconsin, Madison, WI, 2007 ⟨http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf⟩.
[44] Song, Y.; Nie, F.; Zhang, C.; Xiang, S., A unified framework for semi-supervised dimensionality reduction, Pattern recognition, 41, 9, 2789-2799, (2008) · Zbl 1154.68501
[45] C. Traina, Jr., A. Traina, L. Wu, C. Faloutsos, Fast feature selection using the fractal dimension, in: Proceedings of the XV Brazilian Symposium on Databases (SBBD), Paraiba, Brazil, 2000, pp. 158-171.
[46] Bhavani, S.D.; Rani, T.S.; Bapi, R.S., Feature selection using correlation fractal dimension: issues and applications in binary classification problems, Applied soft computing, 8, 555-563, (2007)
[47] Sousa, E.P.; Traina, C., Jr.; Traina, A.J.; Wu, L.; Faloutsos, C., A fast and effective method to find correlations among attributes in databases, Data mining and knowledge discovery, 14, 367-407, (2007)
[48] Cover, T.M.; Thomas, J.A., Elements of information theory, (1991), Wiley New York · Zbl 0762.94001
[49] John, G.H.; Kohavi, R.; Pfleger, K., Irrelevant features and the subset selection problem, (), 121-129
[50] Fleuret, F., Fast binary feature selection with conditional mutual information, Journal of machine learning research, 5, 1531-1555, (2004) · Zbl 1222.68200
[51] Wang, G.; Lochovsky, F.H.; Yang, Q., Feature selection with conditional mutual information maximin in text categorization, (), 342-349
[52] Al-Ani, A.; Deriche, M.; Chebil, J., A new mutual information based measure for feature selection, Intelligent data analysis, 7, 1, 43-57, (2003)
[53] Battiti, R., Using mutual information for selecting features in supervised neural net learning, IEEE transactions on neural networks, 5, 4, 537-550, (1994)
[54] Peng, H.; Long, F.; Ding, C., Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE transactions on pattern analysis and machine intelligence, 27, 8, 1226-1238, (2005)
[55] Kwak, N.; Choi, C.-H., Input feature selection by mutual information based on Parzen window, IEEE transactions on pattern analysis and machine intelligence, 24, 12, 1667-1671, (2002)
[56] Huang, D.; Chow, T.W.S., Effective feature selection scheme using mutual information, Neurocomputing, 63, 325-343, (2005)
[57] Hall, M.A.; Holmes, G., Benchmarking attribute selection techniques for discrete class data mining, IEEE transactions on knowledge and data engineering, 15, 6, 1437-1447, (2003)
[58] C.L. Blake, C.J. Merz, UCI repository of machine learning databases, Available from: ⟨http://www.ics.uci.edu/~mlearn/MLRepository.html⟩, Department of Information and Computer Science, University of California, Irvine, 1998.
[59] Fayyad, U.M.; Irani, K.B., Multi-interval discretization of continuous valued attributes for classification learning, (), 1022-1027
[60] John, G.H.; Langley, P., Estimating continuous distributions in Bayesian classifiers, (), 338-345
[61] Aha, D.; Kibler, D., Instance-based learning algorithms, Machine learning, 6, 37-66, (1991)
[62] Cohen, W.W., Fast effective rule induction, (), 115-123
[63] Witten, I.H.; Frank, E., Data mining—practical machine learning tools and techniques with Java implementations, (2005), Morgan Kaufmann Publishers Los Altos, CA
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.