zbMATH — the first resource for mathematics

A pool-based pattern generation algorithm for logical analysis of data with automatic fine-tuning. (English) Zbl 1347.62107
Summary: We address the binary classification problem, in which one is given a set of observations, characterized by a number of (binary and non-binary) attributes and wants to determine which class each observation belongs to. The proposed classification algorithm is based on the Logical Analysis of Data (LAD) technique and belongs to the class of supervised learning algorithms. We introduce a novel metaheuristic-based approach for pattern generation within LAD. The key idea relies on the generation of a pool of patterns for each given observation of the training set. Such a pool is built with one or more criteria in mind (e.g., diversity, homogeneity, coverage, etc.), and is paramount in the achievement of high classification accuracy, as shown by the computational results we obtained. In addition, we address one of the major concerns of many data mining algorithms, i.e., the fine-tuning and calibration of parameters. We employ here a novel technique, called biased Random-Key Genetic Algorithm that allows the calibration of all the parameters of the algorithm in an automatic fashion, hence reducing the fine-tuning effort required and enhancing the performance of the algorithm itself. We tested the proposed approach on 10 benchmark instances from the UCI repository and we proved that the algorithm is competitive, both in terms of classification accuracy and running time.

62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI
[1] Alexe, G.; Alexe, S.; Axelrod, D. E.; Hammer, P. L.; Weissmann, D., Logical analysis of diffuse large B-cell lymphomas, Artificial Intelligence in Medicine, 34, 3, 235-267, (2005)
[2] Alexe, G.; Alexe, S.; Hammer, P. L., Pattern-based clustering and attribute analysis, Soft Computing, 10, 5, 442-452, (2006)
[3] Alexe, G.; Hammer, P. L., Spanned patterns for the logical analysis of data, Discrete Applied Mathematics, 154, 7, 1039-1049, (2006) · Zbl 1090.68094
[4] Bazan, J.; Nguyen, H.; Nguyen, S.; Synak, P.; Wróblewski, J., Rough set algorithms in classification problem, (Polkowski, L.; Tsumoto, S.; Lin, T. Y., Rough set methods and applications, (2000), Physica-Verlag Heidelberg, Germany), 49-88 · Zbl 0992.68197
[5] Blake, C. L. & Merz, C. J. (1998). UCI Repository of Machine Learning Databases.
[6] Bonates, T. O.; Hammer, P. L.; Kogan, A., Maximum patterns in datasets, Discrete Applied Mathematics, 156, 6, 846-861, (2008) · Zbl 1140.68457
[7] Boros, E.; Crama, Y.; Hammer, P. L.; Ibaraki, T.; Kogan, A.; Makino, K., Logical analysis of data: classification with justification, Annals of Operations Research, 188, 1, 33-61, (2011) · Zbl 1230.68165
[8] Boros, E.; Hammer, P. L.; Ibaraki, T.; Kogan, A., Logical analysis of numerical data, Mathematical Programming, 79, 1-3, 163-190, (1997) · Zbl 0887.90179
[9] Boros, E.; Hammer, P. L.; Ibaraki, T.; Kogan, A.; Mayoraz, E.; Muchnik, I., An implementation of logical analysis of data, Knowledge and Data Engineering, IEEE Transactions on, 12, 2, 292-306, (2000)
[10] Caserta, M.; Quiñonez, E., A cross entropy-Lagrangean hybrid algorithm for the multi-item capacitated lot-sizing problem with setup times, Computers & Operations Research, 36, 2, 530-548, (2009) · Zbl 1163.90452
[11] Chikalov, I.; Lozin, V.; Lozina, I.; Moshkov, M.; Nguyen, H.; Skowron, A.; Zielosko, B., Logical analysis of data: theory, methodology and applications, Three approaches to data analysis, Intelligent Systems Reference Library, Vol. 41, 147-192, (2013), Springer Berlin, Heidelberg
[12] Cohen, W.; Singer, Y., A simple, fast, and effective rule learner, Proceedings of the sixteenth national conference on artificial intelligence, 335-342, (1999), AAAI Press
[13] Cohen, W. W., Fast effective rule induction, Proceedings of the twelfth international conference on machine learning, 115-123, (1995), Morgan Kaufmann
[14] De Boer, P.; Kroese, D. P.; Mannor, S.; Rubinstein, R. Y., A tutorial on the cross-entropy method, Annals of Operations Research, 134, 1, 19-67, (2005) · Zbl 1075.90066
[15] Dembczynski, K.; Kotlowski, W.; Slowinski, R., ENDER: A statistical framework for boosting decision rules, Data Mining and Knowledge Discovery, 21, 1, 52-90, (2010)
[16] Demsar, J., Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research, 7, 1-30, (2006) · Zbl 1222.68184
[17] Dietterich, T., Overfitting and undercomputing in machine learning, ACM Computing Surveys, 27, 3, 326-327, (1995)
[18] Friedman, J.; Popescu, B. E., Predictive learning via rule ensemble, The Annals of Applied Statistics, 2, 3, 916-954, (2008) · Zbl 1149.62051
[19] Fürnkranz, J., Separate-and-conquer rule learning, Artificial Intellingence Review, 13, 1, 3-54, (1999) · Zbl 0922.68030
[20] Ghasemi, A.; Esameili, S., Optimal condition-based maintenance replacement based on logical analysis of data (LAD), Lecture Notes in Engineering and Computer Science, 2203, 1, 830-833, (2013)
[21] Gonçalves, J. F.; Resende, M., Biased random-key genetic algorithms for combinatorial optimization, Journal of Heuristics, 17, 5, 487-525, (2011)
[22] Greco, S.; Matarazzo, B.; Slowinski, R., Rough sets theory for multicriteria decision analysis, European Journal of Operational Research, 129, 1, 1-47, (2001) · Zbl 1008.91016
[23] Hammer, P. L.; Bonates, T., Logical analysis of data - an overview: from combinatorial optimization to medical applications, Annals of Operations Research, 148, 1, 203-225, (2006) · Zbl 1104.92034
[24] Hammer, P. L.; Kogan, A.; Lejeune, M. A., Modeling country risk ratings using partial orders, European Journal of Operational Research, 175, 2, 836-859, (2006) · Zbl 1142.91724
[25] Hammer, P. L.; Kogan, A.; Simeone, B.; Szedmk, S., Pareto-optimal patterns in logical analysis of data, Discrete Applied Mathematics, 144, 12, 79-102, (2004) · Zbl 1078.62504
[26] Kotlowski, W.; Slowinski, R., Rule learning with monotonicity constraints, Proceedings of the 26th international conference on machine learning, 537-544, (2009)
[27] Malioutov, D.; Varshney, K., Exact rule learning via Boolean compressed sensing, (JMLR:W&CP, Proceedings of the 30th international conference on machine learning, Vol. 28, (2013)), 765-773
[28] Marchand, M.; Shawe-Taylor, J., The set covering machine, Journal of Machine Learning Research, 3, 723-746, (2002) · Zbl 1112.68392
[29] Morán-Mirabal, L. F.; González-Velarde, J. L.; Resende, M., Automatic tuning of GRASP with evolutionary path-relinking, (Blesa, M. J.; Blum, C.; Festa, P.; Roli, A.; Sampels, M., Hybrid metaheuristics, Lecture Notes in Computer Science, Vol. 7919, (2013), Springer), 62-77
[30] Mortada, M. A.; Yacout, S.; Lakis, A., Fault diagnosis in power transformers using multi-class logical analysis of data, Journal of Intelligent Manufacturing, 1-11, (2013)
[31] Pawlak, Z., Rough sets, International Journal of Information and Computer Sciences, 11, 145-172, (1982)
[32] Pawlak, Z., Rough sets - Theoretical aspects of reasoning about data, (1991), Kluwer · Zbl 0758.68054
[33] Pawlak, Z., Rough set theory and its applications to data analysis, Cybernetics and Systems, 29, 7, 661-688, (1998) · Zbl 1008.03526
[34] Quinland, J. R., C45: Programs for machine learning, (1993), Morgan Kaufmann
[35] Resende, M.; Ribeiro, C., GRASP with path-relinking: recent advances and applications, (Ibaraki, T.; Nonobe, K.; Yagiura, M., Metaheuristics: Progress as real problem solvers, (2005), Kluwer Academic Publishers), 29-63
[36] Rubinstein, R. Y.; Kroese, D. P., The cross-entropy method: A unified approach to combinatorial optimization, monte carlo simulation, and machine learning, (2004), Springer-Verlag · Zbl 1140.90005
[37] Sörensen, K., Distance measures based on the edit distance for permutation-type representations, Journal of Heuristics, 13, 1, 35-47, (2007)
[38] Toso, R. F.; Resende, M., A C++ application programming interface for biased random-key genetic algorithms, Technical Report, (2012), AT&T Labs Research Florham Park, NJ
[39] Witten, I.; Frank, E., Data mining: Practical machine learning tools and techniques, (2005), Morgan Kaufmann · Zbl 1076.68555
[40] Wnek, J.; Michalski, R., Hypothesis-driven constructive induction in AQ17-HCI: A method and experiments, Machine Learning, 14, 139-168, (1994) · Zbl 0804.68125
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.