Logical analysis of numerical data. (English) Zbl 0887.90179

Summary: “Logical analysis of data” (LAD) is a methodology developed since the late eighties, aimed at discovering hidden structural information in data sets. LAD was originally developed for analyzing binary data by using the theory of partially defined Boolean functions. An extension of LAD for the analysis of numerical data sets is achieved through the process of “binarization” consisting in the replacement of each numerical variable by binary “indicator” variables, each showing whether the value of the original variable is above or below a certain level. Binarization was successfully applied to the analysis of a variety of real life data sets. This paper develops the theoretical foundations of the binarization process studying the combinatorial optimization problems related to the minimization of the number of binary variables. To provide an algorithmic framework for the practical solution of such problems, we construct compact linear integer programming formulations of them. We develop polynomial time algorithms for some of these minimization problems, and prove NP-hardness of others.


90C90 Applications of mathematical programming
68P99 Theory of data
90C60 Abstract computational complexity for mathematical programming problems
90C27 Combinatorial optimization


Full Text: DOI


[1] D. Angluin, Queries and concept learning,Machine Learning 2 (1988) 319–342.
[2] E. Balas and A. Ho, Set covering algorithms using cutting planes, heuristics and subgradient optimization: A computational study,Mathematical Programming 12 (1980) 37–60. · Zbl 0435.90074
[3] R. Bellman,Introduction to Matrix Analysis (McGraw-Hill, New York, 1960). · Zbl 0124.01001
[4] J.C. Bioch and T. Ibaraki, Complexity of identification and dualization of positive Boolean functions,Information and Computation 123 (1995) 50–63. · Zbl 1096.68633
[5] E. Boros, V. Gurvich, P.L. Hammer, T. Ibaraki and A. Kogan, Decompositions of partially defined Boolean functions,Discrete Applied Mathematics 62 (1995) 51–75. · Zbl 0833.68090
[6] E. Boros, P.L. Hammer, T. Ibaraki, A. Kogan, Logical analysis of numerical data, RUTCOR Research Report RRR 04-97, RUTCOR, Rutgers University, 1997. · Zbl 0887.90179
[7] E. Boros, P.L. Hammer, T. Ibaraki, A. Kogan, E. Mayoraz and I. Muchnik, An implementation of logical analysis of data, RUTCOR Research Report RRR 22-96, RUTCOR, Rutgers University, 1996.
[8] E. Boros, T. Ibaraki and K. Makino, Error-free and best-fit extensions of partially defined Boolean functions, RUTCOR Research Report RRR 14-95, RUTCOR, Rutgers University, 1995. · Zbl 0892.68091
[9] E. Boros, T. Ibaraki and K. Makino, Extensions of partially defined Boolean functions with missing data, RUTCOR Research Report RRR 06-96, RUTCOR, Rutgers University, 1996. · Zbl 1010.94568
[10] P. Clark and T. Niblett, The CN2 induction algorithm,Machine Learning 3 (1989) 261–283.
[11] Y. Crama, P.L. Hammer and T. Ibaraki, Cause-effect relationships and partially defined Boolean functions,Annals of Operations Research 16 (1988) 299–326. · Zbl 0709.03533
[12] R. Dechter and J. Pearl, Structure identification in relational data,Artificial Intelligence 58 (1992) 237–270. · Zbl 0782.68095
[13] O. Ekin, P.L. Hammer and A. Kogan, On connected Boolean functions, RUTCOR Research Report RRR 36-96, RUTCOR, Rutgers University, 1996. · Zbl 0937.06014
[14] M.R. Garey and D.S. Johnson,Computers and Intractability (Freeman, New York, 1979). · Zbl 0411.68039
[15] P.L. Hammer, Partially defined Boolean functions and cause-effect relationships, in:Proceedings of the International Conference on Multi-Attribute Decision Making Via OR-Based Expert Systems, University of Passau, Passau, Germany, 1986.
[16] A.B. Hammer, P.L. Hammer and I. Muchnik, Logical analysis of Chinese productivity patterns, RUTCOR Report RR 1-96, RUTCOR, Rutgers University, 1996.
[17] S.J. Hong, R-MINI: An iterative approach for generating minimal rules from examples,IEEE Transactions on Knowledge and Data Engineering, to appear.
[18] O.L. Mangasarian, Mathematical programming in machine learning, in: G. Di Pillo and F. Giannessi, eds.,Nonlinear Optimization and Applications (Plenum Publishing, New York, 1996) 283–295. · Zbl 1052.90637
[19] I. Muchnik and L. Yampolsky, A pilot study of the concurrent validity of logical analysis of data, as applied to two Beck inventories, RUTCOR Technical Report RTR 02-95, RUTCOR, Rutgers University, 1995.
[20] S. Muroga,Threshold Logic and Its Applications (Wiley-Interscience, New York, 1971). · Zbl 0243.94014
[21] P.M. Murphy and D.W. Aha, UCI repository of machine learning databases: Machine readable data repository, Department of Computer Science, University of California, Irvine, 1994.
[22] J.R. Quinlan, Induction of decision trees,Machine Learning 1 (1986) 81–106.
[23] J.R. Quinlan and R. Rivest, Inferring decision trees using minimum description length principle,Information and Computation 80 (1989) 227–248. · Zbl 0664.94015
[24] L.G. Valiant, A theory of the learnable,Communications of the ACM 27 (1984) 1134–1142. · Zbl 0587.68077
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.