zbMATH — the first resource for mathematics

Learning to steer nonlinear interior-point methods. (English) Zbl 1437.90148
Summary: Interior-point or barrier methods handle nonlinear programs by sequentially solving barrier subprograms with a decreasing sequence of barrier parameters. The specific barrier update rule strongly influences the theoretical convergence properties as well as the practical efficiency. While many global and local convergence analyses consider a monotone update that decreases the barrier parameter for every approximately solved subprogram, computational studies show a superior performance of more adaptive strategies. In this paper we interpret the adaptive barrier update as a reinforcement learning task. A deep Q-learning agent is trained by both imitation and random action selection. Numerical results based on an implementation within the nonlinear programming solver WORHP show that the agent successfully learns to steer the barrier parameter and additionally improves WORHP’s performance on the CUTEst test set.
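The paper's deep Q-learning setup is not reproduced here, but the core formulation — treating the multiplicative barrier reduction as an action and learning which update reaches a target barrier value in the fewest subproblem solves — can be illustrated with a tabular Q-learning sketch on a toy problem. The candidate reduction factors, the -1-per-iteration reward, and the order-of-magnitude state discretisation are all assumptions for illustration, not the paper's design; the real method uses a deep network and a far richer state built from solver internals.

```python
import math
import random

# Toy barrier setting: for min x s.t. x >= 0, the log-barrier subproblem
# min_x  x - mu*log(x) has the closed-form solution x = mu, so "solving"
# a subproblem is trivial and we can focus purely on the mu-update rule.

FACTORS = [0.9, 0.5, 0.1]   # candidate multiplicative reductions (assumed)
MU_TOL = 1e-6               # stop once the barrier parameter is this small
GAMMA, ALPHA, EPS = 0.95, 0.5, 0.3


def state(mu):
    """Discretise mu into a small state space via its order of magnitude."""
    return min(6, max(0, round(-math.log10(mu))))


def train(episodes=2000, seed=0):
    """Tabular Q-learning (Watkins-Dayan) for the barrier-update choice.

    Reward is -1 per outer iteration, so the greedy policy that emerges
    is the update reaching mu <= MU_TOL in the fewest subproblem solves.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(7) for a in range(len(FACTORS))}
    for _ in range(episodes):
        mu = 1.0
        while mu > MU_TOL:
            s = state(mu)
            if rng.random() < EPS:                 # epsilon-greedy exploration
                a = rng.randrange(len(FACTORS))
            else:
                a = max(range(len(FACTORS)), key=lambda b: q[(s, b)])
            mu *= FACTORS[a]                       # "solve" subproblem, shrink mu
            target = -1.0                          # one outer iteration spent
            if mu > MU_TOL:                        # non-terminal: bootstrap
                s2 = state(mu)
                target += GAMMA * max(q[(s2, b)] for b in range(len(FACTORS)))
            q[(s, a)] += ALPHA * (target - q[(s, a)])
    return q


q = train()
greedy = max(range(len(FACTORS)), key=lambda a: q[(state(1.0), a)])
```

In this toy there is no penalty for aggressive reduction, so the learned greedy policy simply picks the factor 0.1; the interesting trade-off in the actual solver arises because overly aggressive updates can make the barrier subproblems hard or unsolvable, which is exactly what the trained agent must learn to avoid.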
90C30 Nonlinear programming
68T05 Learning and adaptive systems in artificial intelligence
60J20 Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.)
90C51 Interior-point methods
Full Text: DOI
[1] Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; etal., TensorFlow: a system for large-scale machine learning, OSDI, 16, 265-283 (2016)
[2] Armand, P.; Benoist, J., A local convergence property of primal-dual methods for nonlinear programming, Math Program, 115, 199-222 (2008) · Zbl 1167.65031
[3] Armand, P.; Benoist, J.; Orban, D., Dynamic updates of the barrier parameter in primal-dual methods for nonlinear programming, Comput Optim Appl, 41, 1-25 (2008) · Zbl 1180.90305
[4] Armand P, Orban D, Benoist J (2008b) Global convergence of primal-dual methods for nonlinear programming. Technical report, Laboratoire XLIM et Université de Limoges · Zbl 1180.90305
[5] Auer, P.; Cesa-Bianchi, N.; Fischer, P., Finite-time analysis of the multiarmed bandit problem, Mach Learn, 47, 235-256 (2002) · Zbl 1012.68093
[6] Balcan MF, Dick T, Sandholm T, Vitercik E (2018) Learning to branch. arXiv preprint arXiv:1803.10150
[7] Byrd, RH; Liu, G.; Nocedal, J., On the local behavior of an interior point method for nonlinear programming, Numer Anal, 1997, 37-56 (1997)
[8] Büskens, Christof; Wassel, Dennis, The ESA NLP Solver WORHP, 85-110 (2012), New York, NY · Zbl 1365.90007
[9] Chen SY, Yu Y, Da Q, Tan J, Huang HK, Tang HH (2018) Stabilizing reinforcement learning in dynamic environment with application to online recommendation. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’18, pp 1187-1196. ACM, New York, NY, USA. https://doi.org/10.1145/3219819.3220122
[10] Curtis, FE, A penalty-interior-point algorithm for nonlinear constrained optimization, Math Program Comput, 4, 181-209 (2012) · Zbl 1269.49045
[11] Dolan, ED; Moré, JJ, Benchmarking optimization software with performance profiles, Math Program, 91, 201-213 (2002) · Zbl 1049.90004
[12] El-Bakry, AS; Tapia, RA; Tsuchiya, T.; Zhang, Y., On the formulation and theory of the Newton interior-point method for nonlinear programming, J Optim Theory Appl, 89, 507-541 (1996) · Zbl 0851.90115
[13] Fiacco AV, McCormick GP (1990) Nonlinear programming: sequential unconstrained minimization techniques, vol 4. SIAM · Zbl 0713.90043
[14] Forsgren, A.; Gill, PE; Wright, MH, Interior methods for nonlinear optimization, SIAM Rev, 44, 525-597 (2002) · Zbl 1028.90060
[15] Geffken S, Büskens C (2016) WORHP multi-core interface, parallelisation approaches for an NLP solver. In: Proceedings of the 6th international conference on astrodynamics tools and techniques, Darmstadt, Germany
[16] Gertz, EM; Wright, SJ, Object-oriented software for quadratic programming, ACM Trans Math Softw, 29, 58-81 (2003) · Zbl 1068.90586
[17] Gondzio, J.; Grothey, A., A new unblocking technique to warmstart interior point methods based on sensitivity analysis, SIAM J Optim, 19, 1184-1210 (2008) · Zbl 1177.90411
[18] Gould, NIM; Orban, D.; Toint, PL, CUTEst: a constrained and unconstrained testing environment with safe threads for mathematical optimization, Comput Optim Appl, 60, 545-557 (2015) · Zbl 1325.90004
[19] Gould NIM, Toint PL (2006) Global convergence of a non-monotone trust-region filter algorithm for nonlinear programming, pp 125-150. Springer US, Boston, MA. https://doi.org/10.1007/0-387-29550-X_5 · Zbl 1130.90399
[20] Hausknecht M, Stone P (2015) Deep recurrent Q-learning for partially observable MDPs. In: AAAI fall symposium on sequential decision making for intelligent agents
[21] Hendel G, Miltenberger M, Witzig J (2018) Adaptive algorithmic behavior for solving mixed integer programs using bandit algorithms. Technical report 18-36, ZIB
[22] Hoos, Holger H., Automated Algorithm Configuration and Parameter Tuning, 37-71 (2011), Berlin, Heidelberg
[23] Hutter, F.; Hoos, HH; Leyton-Brown, K.; Lodi, A. (ed.); Milano, M. (ed.); Toth, P. (ed.), Automated configuration of mixed integer programming solvers, 186-202 (2010), Berlin
[24] Kadioglu S, Malitsky Y, Sellmann M, Tierney K (2010) ISAC -instance-specific algorithm configuration. In: Proceedings of the 2010 conference on ECAI 2010: 19th European conference on artificial intelligence, pp 751-756. IOS Press, Amsterdam
[25] Kaelbling, LP; Littman, ML; Cassandra, AR, Planning and acting in partially observable stochastic domains, Artif Intell, 101, 99-134 (1998) · Zbl 0908.68165
[26] Khalil EB, Le Bodic P, Song L, Nemhauser G, Dilkina B (2016) Learning to branch in mixed integer programming. In: 30th AAAI conference on artificial intelligence
[27] Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980
[28] Kruber, Markus; Lübbecke, Marco E.; Parmentier, Axel, Learning When to Use a Decomposition, 202-210 (2017), Cham · Zbl 06756586
[29] Kuhlmann R (2018) A primal-dual augmented Lagrangian penalty-interior-point algorithm for nonlinear programming. PhD thesis, Universität Bremen · Zbl 1394.49025
[30] Kuhlmann, R.; Büskens, C., A primal-dual augmented Lagrangian penalty-interior-point filter line search algorithm, Math Methods Oper Res, 87, 451-483 (2018) · Zbl 1394.49025
[31] Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971
[32] Lodi, A.; Zarpellon, G., On learning and branching: a survey, TOP, 25, 207-236 (2017) · Zbl 1372.90003
[33] Mehrotra, S., On the implementation of a primal-dual interior point method, SIAM J Optim, 2, 575-601 (1992) · Zbl 0773.90047
[34] Mittelmann H. Benchmarks for optimization software. http://plato.asu.edu/ftp/ampl-nlp.html. Accessed 15 April 2019
[35] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602
[36] Morales, José Luis; Nocedal, Jorge; Waltz, Richard A.; Liu, Guanghui; Goux, Jean-Pierre, Assessing the Potential of Interior Methods for Nonlinear Optimization, 167-183 (2003), Berlin, Heidelberg · Zbl 1062.65063
[37] Nocedal, J.; Wächter, A.; Waltz, RA, Adaptive barrier update strategies for nonlinear interior methods, SIAM J Optim, 19, 1674-1693 (2009) · Zbl 1176.49036
[38] Baltean-Lugojan R, Bonami P, Misener R, Tramontani A (2018) Selecting cutting planes for quadratic semidefinite outer-approximation via trained neural networks. Technical report, Imperial College London
[39] Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
[40] Shen, C.; Leyffer, S.; Fletcher, R., A nonmonotone filter method for nonlinear optimization, Comput Optim Appl, 52, 583-607 (2011) · Zbl 1259.90140
[41] Sutton RS, Barto AG (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
[42] Tits, AL; Wächter, A.; Bakhtiari, S.; Urban, TJ; Lawrence, CT, A primal-dual interior-point method for nonlinear programming with strong global and local convergence properties, SIAM J Optim, 14, 173-199 (2003) · Zbl 1075.90078
[43] Ulbrich, M.; Ulbrich, S.; Vicente, LN, A globally convergent primal-dual interior-point filter method for nonlinear programming, Math Program, 100, 379-410 (2004) · Zbl 1070.90110
[44] Vanderbei, RJ; Shanno, DF, An interior-point algorithm for nonconvex nonlinear programming, Comput Optim Appl, 13, 231-252 (1999) · Zbl 1040.90564
[45] Waltz, R.; Morales, J.; Nocedal, J.; Orban, D., An interior algorithm for nonlinear optimization that combines line search and trust region steps, Math Program, 107, 391-408 (2006) · Zbl 1134.90053
[46] Wang Z, Schaul T, Hessel M, Van Hasselt H, Lanctot M, De Freitas N (2015) Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581
[47] Watkins, CJCH; Dayan, P., Q-learning, Mach Learn, 8, 279-292 (1992) · Zbl 0773.68062
[48] Wächter, A.; Biegler, LT, On the implementation of a primal-dual interior point filter line search algorithm for large-scale nonlinear programming, Math Program, 106, 25-57 (2006) · Zbl 1134.90542
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.