Wrappers for feature subset selection. (English) Zbl 0904.68143
Summary: In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. Our wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We study the strengths and weaknesses of the wrapper approach and show a series of improved designs. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Significant improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes.
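To make the wrapper idea concrete, the sketch below illustrates its core mechanism: candidate feature subsets are scored by cross-validating the induction algorithm itself, and a greedy forward search keeps the subset with the best estimated accuracy. This is only an illustrative approximation of the approach described in the summary, not the authors' code; the use of scikit-learn, the GaussianNB stand-in for the paper's Naive-Bayes learner, and the function name wrapper_forward_select are assumptions made for the example.

```python
# Illustrative wrapper-style forward feature selection (not the authors' code):
# each candidate subset is evaluated by running the learning algorithm itself
# under cross-validation, which is the defining trait of the wrapper approach.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score


def wrapper_forward_select(X, y, estimator=None, cv=5):
    """Greedy forward search over feature subsets, scored by CV accuracy."""
    estimator = estimator or GaussianNB()   # stand-in for the paper's Naive-Bayes
    n_features = X.shape[1]
    selected, best_score = [], -np.inf
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Score every one-feature extension with the learner itself (the "wrapper").
        scores = {
            f: cross_val_score(estimator, X[:, selected + [f]], y, cv=cv).mean()
            for f in candidates
        }
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:    # stop when no extension improves accuracy
            break
        selected.append(f_best)
        best_score = scores[f_best]
    return selected, best_score
```

For instance, `selected, acc = wrapper_forward_select(X_train, y_train)` would return the chosen feature indices together with the cross-validated accuracy of the learner restricted to them; a filter method such as Relief would instead rank features by a criterion computed independently of the learner.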

MSC:
68T05 Learning and adaptive systems in artificial intelligence
References:
[1] Aha, D.W., Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms, Internat. J. man-machine studies, 36, 267-287, (1992)
[2] Aha, D.W.; Bankert, R.L., Feature selection for case-based classification of cloud types: an empirical comparison, (), 106-112
[3] Aha, D.W.; Bankert, R.L., A comparative evaluation of sequential feature selection algorithms, (), 1-7
[4] Aha, D.W.; Kibler, D.; Albert, M.K., Instance-based learning algorithms, Machine learning, 6, 37-66, (1991)
[5] Almuallim, H.; Dietterich, T.G., Learning with many irrelevant features, (), 547-552
[6] Almuallim, H.; Dietterich, T.G., Learning Boolean concepts in the presence of many irrelevant features, Artificial intelligence, 69, 279-306, (1994) · Zbl 0942.68657
[7] Anderson, J.R.; Matessa, M., Explorations of an incremental, Bayesian algorithm for categorization, Machine learning, 9, 275-308, (1992)
[8] Atkeson, C.G., Using locally weighted regression for robot learning, (), 958-963
[9] Bala, J.; De Jong, K.A.; Huang, J.; Vafaie, H.; Wechsler, H., Hybrid learning using genetic algorithms and decision trees for pattern classification, (), 719-724
[10] Ben-Bassat, M., Use of distance measures, information measures and error bounds in feature evaluation, (), 773-791
[11] Berliner, H., The B^∗ tree search algorithm: a best-first proof procedure, (), 12, 79-87, (1979), reprinted in:
[12] Blum, A.L.; Rivest, R.L., Training a 3-node neural network is NP-complete, Neural networks, 5, 117-127, (1992)
[13] Boddy, M.; Dean, T., Solving time-dependent planning problems, (), 979-984
[14] Brazdil, P.; Gama, J.; Henery, B., Characterizing the applicability of classification algorithms using meta-level learning, ()
[15] Breiman, L., Bagging predictors, Machine learning, 24, 123-140, (1996) · Zbl 0858.68080
[16] Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J., ()
[17] Buntine, W., Learning classification trees, Statist. and comput., 2, 63-73, (1992)
[18] Cardie, C., Using decision trees to improve case-based learning, (), 25-32
[19] Caruana, R.; Freitag, D., Greedy attribute selection, (), 28-36
[20] Cestnik, B., Estimating probabilities: a crucial task in machine learning, (), 147-149
[21] Cover, T.M.; Campenhout, J.M.V., On the possible orderings in the measurement selection problem, IEEE trans. systems man cybernet., 7, 657-661, (1977) · Zbl 0371.62036
[22] Dasarathy, B.V., ()
[23] De Mántaras, R.L., A distance-based attribute selection measure for decision tree induction, Machine learning, 6, 81-92, (1991)
[24] Devijver, P.A.; Kittler, J., ()
[25] Doak, J., An evaluation of feature selection methods and their application to computer security, ()
[26] Domingos, P.; Pazzani, M., Beyond independence: conditions for the optimality of the simple Bayesian classifier, (), 105-112
[27] Dougherty, J.; Kohavi, R.; Sahami, M., Supervised and unsupervised discretization of continuous features, (), 194-202
[28] Draper, N.R.; Smith, H., ()
[29] Duda, R.; Hart, P., ()
[30] Fayyad, U.M., On the induction of decision trees for multiple concept learning, ()
[31] Fayyad, U.M.; Irani, K.B., The attribute selection problem in decision tree generation, (), 104-110
[32] Fong, P.W.L., A quantitative study of hypothesis selection, (), 226-234
[33] Freund, Y., Boosting a weak learning algorithm by majority, (), 202-216
[34] also: Inform. and Comput., to appear.
[35] Freund, Y.; Schapire, R.E., A decision-theoretic generalization of on-line learning and an application to boosting, (), 23-37
[36] Furnival, G.M.; Wilson, R.W., Regression by leaps and bounds, Technometrics, 16, 499-511, (1974) · Zbl 0294.62079
[37] Geman, S.; Bienenstock, E.; Doursat, R., Neural networks and the bias/variance dilemma, Neural comput., 1-48, (1992)
[38] Gennari, J.H.; Langley, P.; Fisher, D., Models of incremental concept formation, Artificial intelligence, 40, 11-61, (1989)
[39] Ginsberg, M.L., ()
[40] Goldberg, D.E., ()
[41] Good, I.J., ()
[42] Greiner, R., Probabilistic hill climbing: theory and applications, (), 60-67
[43] Hancock, T.R., On the difficulty of finding small consistent decision trees, (1989), Harvard University Cambridge, MA, Unpublished manuscript
[44] Hoeffding, W., Probability inequalities for sums of bounded random variables, J. amer. statist. assoc., 58, 13-30, (1963) · Zbl 0127.10602
[45] Holland, J.H., ()
[46] Hyafil, L.; Rivest, R.L., Constructing optimal binary decision trees is NP-complete, Inform. process. lett., 5, 15-17, (1976) · Zbl 0333.68029
[47] John, G.H., Enhancements to the data mining process, ()
[48] John, G.; Kohavi, R.; Pfleger, K., Irrelevant features and the subset selection problem, (), 121-129
[49] Judd, S., On the complexity of loading shallow neural networks, J. complexity, 4, 177-192, (1988) · Zbl 0648.68086
[50] Kaelbling, L.P., ()
[51] Kira, K.; Rendell, L.A., The feature selection problem: traditional methods and a new algorithm, (), 129-134
[52] Kira, K.; Rendell, L.A., A practical approach to feature selection, ()
[53] Kittler, J., Une généralisation de quelques algorithmes sous-optimaux de recherche d’ensembles d’attributs, ()
[54] Kittler, J., (), 59-83, Chapter 3
[55] Kohavi, R., Feature subset selection as search with probabilistic estimates, (), 122-126
[56] Kohavi, R., The power of decision tables, (), 174-189
[57] Kohavi, R., A study of cross-validation and bootstrap for accuracy estimation and model selection, (), 1137-1143
[58] Kohavi, R., Wrappers for performance enhancement and oblivious decision graphs, ()
[59] Kohavi, R.; Frasca, B., Useful feature subsets and rough set reducts, (), 310-317
[60] also: in: Soft Computing by Lin and Wildberger.
[61] Kohavi, R.; John, G., Automatic parameter selection by minimizing estimated error, (), 304-312
[62] Kohavi, R.; Sommerfield, D., Feature subset selection using the wrapper model: overfitting and dynamic search space topology, (), 192-197
[63] Kohavi, R.; Wolpert, D.H., Bias plus variance decomposition for zero-one loss functions, (), 275-283, available at:
[64] Kohavi, R.; Sommerfield, D.; Dougherty, J., Data mining using MLC++: A machine learning library in C++, (), 234-245
[65] Kononenko, I.; De Raedt, L., Estimating attributes: analysis and extensions of relief, ()
[66] Kononenko, I., On biases in estimating multi-valued attributes, (), 1034-1040
[67] Koza, J., ()
[68] Krogh, A.; Vedelsby, J., Neural network ensembles, cross validation, and active learning, ()
[69] Kwok, S.W.; Carter, C., Multiple decision trees, (), 327-335
[70] Laarhoven, P.; Aarts, E., ()
[71] Langley, P., Selection of relevant features in machine learning, (), 140-144
[72] Langley, P.; Sage, S., Induction of selective Bayesian classifiers, (), 399-406
[73] Langley, P.; Sage, S., Oblivious decision trees and abstract cases, (), 113-117
[74] Langley, P.; Iba, W.; Thompson, K., An analysis of Bayesian classifiers, (), 223-228
[75] Linhart, H.; Zucchini, W., ()
[76] Littlestone, N.; Warmuth, M.K., The weighted majority algorithm, Inform. and comput., 108, 212-261, (1994) · Zbl 0804.68121
[77] Mallows, C.L., Some comments on Cp, Technometrics, 15, 661-675, (1973) · Zbl 0269.62061
[78] Marill, T.; Green, D.M., On the effectiveness of receptors in recognition systems, IEEE trans. inform. theory, 9, 11-17, (1963)
[79] Maron, O.; Moore, A.W., Hoeffding races: accelerating model selection search for classification and function approximation, ()
[80] Merz, C.J.; Murphy, P.M., UCI repository of machine learning databases, (1996)
[81] Miller, A.J., Selection of subsets of regression variables, J. roy. statist. soc. A, 147, 389-425, (1984) · Zbl 0584.62106
[82] Miller, A.J., ()
[83] Minsky, M.L.; Papert, S., (), expanded ed.
[84] Mladenić, D., Automated model selection, ()
[85] Modrzejewski, M., Feature selection using rough sets theory, (), 213-226
[86] Moore, A.W.; Lee, M.S., Efficient algorithms for minimizing cross validation error, ()
[87] Moret, B.M.E., Decision trees and diagrams, ACM comput. surveys, 14, 593-623, (1982)
[88] Murthy, S.; Salzberg, S., Lookahead and pathology in decision tree induction, (), 1025-1031
[89] Narendra, P.M.; Fukunaga, K., A branch and bound algorithm for feature subset selection, IEEE trans. comput., 26, 917-922, (1977) · Zbl 0363.68059
[90] Neter, J.; Wasserman, W.; Kutner, M.H., ()
[91] Pawlak, Z., ()
[92] Pawlak, Z., Rough sets: present state and the future, Found. comput. decision sci., 18, 157-166, (1993) · Zbl 0803.04007
[93] Pazzani, M.J., Searching for dependencies in Bayesian classifiers, () · Zbl 0728.68098
[94] Perrone, M., Improving regression estimation: averaging methods for variance reduction with extensions to general convex measure optimization, ()
[95] Provan, G.M.; Singh, M., Learning Bayesian networks using feature selection, (), 450-456
[96] Provost, F.J., Policies for the selection of bias in inductive machine learning, ()
[97] Provost, F.J.; Buchanan, B.G., Inductive policy: the pragmatics of bias selection, Machine learning, 20, 35-61, (1995)
[98] Quinlan, J.R., Induction of decision trees, (), 1, 81-106, (1986), reprinted in
[99] Quinlan, J.R., ()
[100] Quinlan, J.R., Oversearching and layered search in empirical learning, (), 1019-1024
[101] Rendell, L.; Seshu, R., Learning hard concepts through constructive induction: framework and rationale, Comput. intell., 6, 247-270, (1990)
[102] Rosenblatt, F., The perceptron: a probabilistic model for information storage and organization in the brain, Psychological review, 65, 386-408, (1958)
[103] Russell, S.J.; Norvig, P., ()
[104] Schaffer, C., Selecting a classification method by cross-validation, Machine learning, 13, 135-143, (1993)
[105] Schapire, R.E., The strength of weak learnability, Machine learning, 5, 197-227, (1990)
[106] Siedlecki, W.; Sklansky, J., On automatic feature selection, Internat. J. pattern recognition and artificial intelligence, 2, 197-220, (1988)
[107] Singh, M.; Provan, G.M., A comparison of induction algorithms for selective and non-selective Bayesian classifiers, (), 497-505
[108] Skalak, D.B., Prototype and feature selection by sampling and random mutation hill climbing algorithms, ()
[109] Street, W.N.; Mangasarian, O.L.; Wolberg, W.H., An inductive learning approach to prognostic prediction, ()
[110] Taylor, C.; Michie, D.; Spiegelhalter, D., ()
[111] Thrun, S.B., The Monk’s problems: a performance comparison of different learning algorithms, ()
[112] Turney, P.D., Exploiting context when learning to classify, (), 402-407
[113] Turney, P.D., The identification of context-sensitive features, a formal definition of context for concept learning, (), 53-59, also available as: National Research Council of Canada Tech. Rept. #39222
[114] Utgoff, P.E., An improved algorithm for incremental induction of decision trees, (), 318-325
[115] Utgoff, P.E., Decision tree induction based on efficient tree restructuring, () · Zbl 0886.68105
[116] Vafaie, H.; De Jong, K., Genetic algorithms as a tool for feature selection in machine learning, (), 200-203
[117] Vafaie, H.; De Jong, K., Robust feature selection algorithms, (), 356-363
[118] Wolpert, D.H., On the connection between in-sample testing and generalization error, Complex systems, 6, 47-94, (1992) · Zbl 0792.68144
[119] Wolpert, D.H., Stacked generalization, Neural networks, 5, 241-259, (1992)
[120] Xu, L.; Yan, P.; Chang, T., Best first strategy for feature selection, (), 706-708
[121] Yan, D.; Mukai, H., Stochastic discrete optimization, SIAM J. control and optimization, 30, 594-612, (1992) · Zbl 0764.90066
[122] Yu, B.; Yuan, B., A more efficient branch and bound algorithm for feature selection, Pattern recognition, 26, 883-889, (1993)
[123] Ziarko, W., The discovery, analysis and representation of data dependencies in databases, ()
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.