×

A global MINLP approach to symbolic regression. (English) Zbl 1402.90092

Summary: Symbolic regression methods generate expression trees that simultaneously define the functional form of a regression model and the regression parameter values. As a result, the regression problem can search many nonlinear functional forms using only the specification of simple mathematical operators such as addition, subtraction, multiplication, and division, among others. Currently, state-of-the-art symbolic regression methods leverage genetic algorithms and adaptive programming techniques. Genetic algorithms lack optimality certifications and are typically stochastic in nature. In contrast, we propose an optimization formulation for the rigorous deterministic optimization of the symbolic regression problem. We present a mixed-integer nonlinear programming (MINLP) formulation to solve the symbolic regression problem as well as several alternative models to eliminate redundancies and symmetries. We demonstrate this symbolic regression technique using an array of experiments based upon literature instances. We then use a set of 24 MINLPs from symbolic regression to compare the performance of five local and five global MINLP solvers. Finally, we use larger instances to demonstrate that a portfolio of models provides an effective solution mechanism for problems of the size typically addressed in the symbolic regression literature.

MSC:

90C10 Integer programming
90C26 Nonconvex programming, global optimization
62J02 General nonlinear regression
68T05 Learning and adaptive systems in artificial intelligence
68Q99 Theory of computing
65K05 Numerical mathematical programming methods
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Akaike, H, A new look at the statistical model identification, IEEE Trans. Autom. Control, 19, 716-723, (1974) · Zbl 0314.62039 · doi:10.1109/TAC.1974.1100705
[2] Astarabadi, SSM; Ebadzadeh, MM, A decomposition method for symbolic regression problems, Appl. Soft Comput., 62, 514-523, (2018) · doi:10.1016/j.asoc.2017.10.041
[3] Austel, V., Dash, S., Gunluk, O., Horesh, L., Liberti, L., Nannicini, G., Schieber, B.: Globally optimal symbolic regression. https://arxiv.org/abs/1710.10720 (2017) · Zbl 1035.90051
[4] Balasubramaniam, P; Kumar, AVA, Solution of matrix Riccati differential equation for nonlinear singular system using genetic programming, Genet. Program. Evol. Mach., 10, 71-89, (2008) · doi:10.1007/s10710-008-9072-z
[5] Belotti, P; Lee, J; Liberti, L; Margot, F; Wächter, A, Branching and bounds tightening techniques for non-convex MINLP, Optim. Methods Softw., 24, 597-634, (2009) · Zbl 1179.90237 · doi:10.1080/10556780903087124
[6] Berthold, T., Gamrath, G., Hendel, G., Heinz, S., Koch, T., Pfetsch, M., Vigerske, S., Waniek, R., Winkler, M., Wolter, K.: SCIP 3.2, User’s Manual. Zuse Institute, Berlin, Germany (2016) · Zbl 0850.62538
[7] Bettenhausen, K.D., Marenbach, P., Freyer, S., Rettenmaier, H., Nieken, U.: Self-organizing structured modelling of a biotechnological fed-batch fermentation by means of genetic programming. In: First International Conference on (Conf. Publ. No. 414) Genetic Algorithms in Engineering Systems: Innovations and Applications, 1995. GALESIA, pp. 481-486 (1995) · Zbl 1099.90047
[8] Bonami, P; Biegler, LT; Conn, AR; Cornuejols, G; Grossmann, IE; Laird, CD; Lee, J; Lodi, A; Margot, F; Sawaya, N; Wächter, A, An algorithmic framework for convex mixed integer nonlinear programs, Discrete Optim., 5, 186-204, (2008) · Zbl 1151.90028 · doi:10.1016/j.disopt.2006.10.011
[9] Byrd, RH; Nocedal, J; Waltz, RA; Pillo, G (ed.); Roma, M (ed.), KNITRO: an integrated package for nonlinear optimization, 35-59, (2006), Boston · Zbl 1108.90004 · doi:10.1007/0-387-30065-1_4
[10] Chen, C; Luoa, C; Jiang, Z, Block building programming for symbolic regression, Neurocomputing, 275, 1973-1980, (2018) · doi:10.1016/j.neucom.2017.10.047
[11] Chen, S.H.: Genetic Algorithms and Genetic Programming in Computational Finance. Springer, New York, NY (2002) · doi:10.1007/978-1-4615-0835-9
[12] Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA (2009) · Zbl 1163.49001 · doi:10.1137/1.9780898718768
[13] Cozad, A.: Data- and theory-driven techniques for surrogate-based optimization. Ph.D. thesis, Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA (2014)
[14] Cozad, A; Sahinidis, NV; Miller, DC, Learning surrogate models for simulation-based optimization, AIChE J., 60, 2211-2227, (2014) · doi:10.1002/aic.14418
[15] Dubčáková, R, Eureqa: software review, Genet. Program. Evol. Mach., 12, 173-178, (2011) · doi:10.1007/s10710-010-9124-z
[16] Duran, MA; Grossmann, IE, An outer-approximation algorithm for a class of mixed-integer nonlinear programs, Math. Program., 36, 307-339, (1986) · Zbl 0619.90052 · doi:10.1007/BF02592064
[17] GAMS/SBB. User’s Manual. https://www.gams.com/latest/docs/S_SBB.html. Accessed 8 May 2018
[18] Grossmann, IE, Review of nonlinear mixed-integer and disjunctive programming techniques, Optim. Eng., 3, 227-252, (2002) · Zbl 1035.90050 · doi:10.1023/A:1021039126272
[19] Keane, MA; Koza, JR; Rice, JP, Finding an impulse response function using genetic programming, IEEE Am. Control Conf., 1, 2345-2350, (1993)
[20] Keijzer, M.: Improving symbolic regression with interval arithmetic and linear scaling. In: Ryan, C., Soule, T., Keijzer, M., Tsang, E., Poli, R., Costa, E. (eds.) Genetic Programming, pp. 70-82. Springer, Berlin (2003) · Zbl 1033.68630
[21] Kishore, JK; Patnaik, LM; Mani, V; Agrawal, VK, Application of genetic programming for multicategory pattern classification, IEEE Trans. Evolut. Comput., 4, 242-258, (2000) · doi:10.1109/4235.873235
[22] Korns, M.F.: Accuracy in symbolic regression. In: Riolo, R., Vladislavleva, E., Moore, J.H. (eds.) Genetic Programming Theory and Practice IX, pp. 129-151. Springer, Berlin (2011)
[23] Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA (1992) · Zbl 0850.68161
[24] Koza, J.R.: Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, MA (1994) · Zbl 0850.68160
[25] Lin, Y; Schrage, L, The global solver in the LINDO API, Optim. Methods Softw., 24, 657-668, (2009) · Zbl 1177.90325 · doi:10.1080/10556780902753221
[26] McDermott, J., O’Reilly, U.-M., Luke, S., White, D.: Problem Classification. http://www.gpbenchmarks.org/wiki/ (2014). Accessed 8 May 2018
[27] McKay, B; Willis, M; Barton, G, Steady-state modelling of chemical process systems using genetic programming, Comput. Chem. Eng., 21, 981-996, (1997) · doi:10.1016/S0098-1354(96)00329-8
[28] McKay, B; Willis, M; Searson, D; Montague, G, Non-linear continuum regression using genetic programming, GECCO, 2, 1106-1111, (1999)
[29] Misener, R; Floudas, ChA, ANTIGONE: algorithms for continuous/integer global optimization of nonlinear equations, J. Glob. Optim., 59, 503-526, (2014) · Zbl 1301.90063 · doi:10.1007/s10898-014-0166-2
[30] Schmidt, M; Lipson, H, Distilling free-form natural laws from experimental data, Science, 324, 81-85, (2009) · doi:10.1126/science.1165893
[31] Smits, G.F., Kotanchek, M.: Pareto-front exploitation in symbolic regression. In: O’Reilly, U.-M., Yu, T., Riolo, R., Worzel, B. (eds.) Genetic Programming Theory and Practice II, pp. 283-299. Springer, Berlin (2005) · Zbl 1108.90004
[32] Stoica, P; Selén, Y, Model-order selection: a review of information criterion rules, IEEE Signal Process. Mag., 21, 36-47, (2004) · doi:10.1109/MSP.2004.1311138
[33] Symbolic regression problems. http://minlp.com/nlp-and-minlp-test-problems. Accessed 8 May 2018
[34] Tawarmalani, M; Sahinidis, NV, A polyhedral branch-and-cut approach to global optimization, Math. Program., 103, 225-249, (2005) · Zbl 1099.90047 · doi:10.1007/s10107-005-0581-8
[35] Tibshirani, R, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B (Methodol.), 58, 267-288, (1996) · Zbl 0850.62538
[36] Uy, NQ; Hoai, NX; O’Neill, M; McKay, RI; Galván-López, E, Semantically-based crossover in genetic programming: application to real-valued symbolic regression, Genet. Program. Evol. Mach., 12, 91-119, (2011) · doi:10.1007/s10710-010-9121-2
[37] Watson, A.H., Parmee, I.C.: Identification of fluid systems using genetic programming. In: Proceedings of the Second Online Workshop on Evolutionary Computation, pp. 45-48 (1996)
[38] Westerlund, T; Pörn, R, Solving pseudo-convex mixed integer optimization problems by cutting plane techniques, Optim. Eng., 3, 253-280, (2002) · Zbl 1035.90051 · doi:10.1023/A:1021091110342
[39] White, DR; McDermott, J; Castelli, M; Manzoni, L; Goldman, BW; Kronberger, G; Jaśkowski, W; O’Reilly, U-M; Luke, S, Better GP benchmarks: community survey results and proposals, Genet. Program. Evol. Mach., 14, 3-29, (2013) · doi:10.1007/s10710-012-9177-2
[40] Willis, MJ; Hiden, HG; Marenbach, P; McKay, B; Montague, GA, Genetic programming: an introduction and survey of applications, IEEE Conf. Publ., 1, 314-319, (1997)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.