A matrix-free line-search algorithm for nonconvex optimization.

*(English)* Zbl 1416.90023

Summary: In this paper, we have developed a new algorithm for solving nonconvex large-scale problems. The new algorithm performs explicit matrix modifications adaptively, mimicking the implicit modifications used by trust-region methods. Thus, it shares the theoretical strength of trust-region approaches without needing to accommodate an explicit step-size constraint. We show that the algorithm is well suited for solving very large-scale nonconvex problems whenever Hessian-vector products are available. The numerical results on the CUTEr problems demonstrate the effectiveness of this approach in the context of a line-search method for large-scale unconstrained nonconvex optimization. Moreover, applications to deep-learning problems further illustrate the usefulness of this algorithm: it avoids the prohibitive traits that popular matrix-free algorithms, such as truncated conjugate gradient (CG), exhibit on difficult deep-learning problems. Thus the proposed algorithm serves to bridge the gap between the needs of the data-mining community and existing state-of-the-art approaches embraced foremost by the optimization community. Moreover, the proposed approach can be realized with minimal modification to the CG algorithm itself, with negligible storage and computational overhead.
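To make the summary concrete, the following is a minimal sketch (not the authors' exact method) of the general idea it describes: running CG on the Newton system using only Hessian-vector products, and, when negative curvature reveals that the Hessian is indefinite, adaptively growing an explicit diagonal shift `sigma` and restarting, so that CG effectively solves a modified system `(H + sigma*I) p = -g`. The callable `hess_vec`, the restart-on-negative-curvature rule, and the doubling schedule for `sigma` are all illustrative assumptions.

```python
import numpy as np

def shifted_cg(hess_vec, g, sigma=0.0, tol=1e-8, max_iter=100):
    """Sketch: solve (H + sigma*I) p = -g by CG, where H is accessed only
    through Hessian-vector products. If a direction of nonpositive
    curvature is found, grow the explicit shift sigma and restart, so the
    modified matrix eventually becomes positive definite (illustrative
    rule, not the paper's exact modification strategy)."""
    n = g.shape[0]
    while True:
        p = np.zeros(n)
        r = -g.copy()          # residual of (H + sigma*I) p = -g at p = 0
        d = r.copy()           # first CG search direction
        rs = r @ r
        restarted = False
        for _ in range(max_iter):
            Hd = hess_vec(d) + sigma * d
            curv = d @ Hd
            if curv <= 1e-12 * (d @ d):       # negative/zero curvature
                sigma = max(2.0 * sigma, 1e-4)  # grow shift and restart
                restarted = True
                break
            alpha = rs / curv
            p += alpha * d
            r -= alpha * Hd
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            d = r + (rs_new / rs) * d
            rs = rs_new
        if not restarted:
            return p, sigma    # p is then used in a standard line search
```

Because `H` appears only inside `hess_vec`, the method is matrix-free, and the shift logic adds only a few scalar operations and no extra vectors beyond plain CG, which is the "negligible storage and computational overhead" the summary refers to.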

##### MSC:

90C06 | Large-scale problems in mathematical programming |

90C26 | Nonconvex programming, global optimization |

90C30 | Nonlinear programming |

65K05 | Numerical mathematical programming methods |

68T01 | General topics in artificial intelligence |

##### Keywords:

nonlinear programming; nonconvex large-scale problems; trust-region methods; conjugate gradient method; Hessian-free methods; machine learning

\textit{W. Zhou} et al., Optim. Methods Softw. 34, No. 1, 1--24 (2019; Zbl 1416.90023)


##### References:

[1] | Computing a search direction for large-scale linearly-constrained nonlinear optimization calculations, Rutherford Appleton Laboratory, 1993 |

[2] | An implicit trust-region method on Riemannian manifolds, Tech. Rep. FSU-SCS-2007-449, Florida State University, 2007. Available at |

[3] | CUTE: Constrained and unconstrained testing environment, Report 93/10, Département de Mathématique, Facultés Universitaires de Namur, 1993 |

[4] | Branch, M. A.; Coleman, T. F.; Li, Y., A subspace, interior, and conjugate gradient method for large-scale bound-constrained minimization problems, SIAM J. Sci. Comput., 21, 1-23, (1999) · Zbl 0940.65065 |

[5] | Conn, A. R.; Gould, N. I.M.; Toint, Ph. L., Trust-Region Methods, (2000), Society for Industrial and Applied Mathematics (SIAM): Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA |

[6] | Dolan, E. D.; Moré, J. J., Benchmarking optimization software with performance profiles, Math. Program., 91, 201-213, (2002) · Zbl 1049.90004 |

[7] | An interior method for computing a trust-region step, Numerical Analysis Report 08-1, Department of Mathematics, University of California San Diego, La Jolla, CA, 2008 |

[8] | Erway, J. B.; Gill, P. E., A subspace minimization method for the trust-region step, SIAM J. Optim., 20, 1439-1461, (2009) · Zbl 1195.49042 |

[9] | Iterative methods for finding a trust-region step, Numerical Analysis Report 07-2, Department of Mathematics, University of California San Diego, La Jolla, CA, 2007 |

[10] | Erway, J. B.; Gill, P. E.; Griffin, J. D., Iterative methods for finding a trust-region step, SIAM J. Optim., 20, 1110-1131, (2009) · Zbl 1189.49049 |

[11] | Fletcher, R., Practical Methods of Optimization, (1987), John Wiley and Sons: John Wiley and Sons, Chichester · Zbl 0905.65002 |

[12] | Fletcher, R., Practical Methods of Optimization, (2001), Wiley-Interscience [John Wiley & Sons]: Wiley-Interscience [John Wiley & Sons], New York · Zbl 0988.65043 |

[13] | Combination trust-region line-search methods for unconstrained optimization, Ph.D. diss., Department of Mathematics, University of California San Diego, San Diego, 1999 |

[14] | Gertz, E. M.; Gill, P. E., A primal-dual trust-region algorithm for nonlinear programming, Math. Program. Ser. B, 100, 49-94, (2004) · Zbl 1146.90514 |

[15] | Trust-search algorithms for unconstrained optimization, Numerical Analysis Report NA 04-1, University of California San Diego, San Diego, 2004 |

[16] | Gill, P. E.; Murray, W.; Wright, M. H., Practical Optimization, (1981), Academic Press: Academic Press, London · Zbl 0503.90062 |

[17] | Gould, N. I.M.; Lucidi, S.; Roma, M.; Toint, Ph. L., Solving the trust-region subproblem using the Lanczos method, SIAM J. Optim., 9, 504-525, (1999) · Zbl 1047.90510 |

[18] | Gould, N. I.M.; Orban, D.; Toint, Ph. L., GALAHAD, a library of thread-safe Fortran 90 packages for large-scale nonlinear optimization, ACM Trans. Math. Softw., 29, 353-372, (2003) · Zbl 1068.90525 |

[19] | Interior-point methods for large-scale nonconvex optimization, Ph.D. diss., Department of Mathematics, University of California San Diego, San Diego, 2005 |

[20] | Hager, W. W., Minimizing a quadratic over a sphere, SIAM J. Optim., 12, 188-208, (2001) · Zbl 1058.90045 |

[21] | Trust-search algorithms for unconstrained optimization, Ph.D. diss., Department of Mathematics, University of California San Diego, San Diego, 2004 |

[22] | A shifted Steihaug–Toint method for computing a trust-region step, Tech. Rep. V914-04, Institute of Computer Science, Academy of Sciences of the Czech Republic, 2004 |

[23] | Deep Learning Via Hessian-free Optimization, Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 735–742 |

[24] | Learning Recurrent Neural Networks with Hessian-free Optimization, Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 1033–1040 |

[25] | Training deep and recurrent networks with Hessian-free optimization, in Neural Networks: Tricks of the Trade, Springer, Berlin, 2012, pp. 479–535 |

[26] | Mizutani, E.; Dreyfus, S. E., Second-order stagewise backpropagation for Hessian-matrix analyses and investigation of negative curvature, Neural Netw., 21, 193-203, (2008) · Zbl 1254.68214 |

[27] | Nash, S. G., Newton-type minimization via the Lanczos method, SIAM J. Numer. Anal., 21, 770-788, (1984) · Zbl 0558.65041 |

[28] | Nocedal, J.; Wright, S. J., Numerical Optimization, (1999), Springer-Verlag: Springer-Verlag, New York · Zbl 0930.65067 |

[29] | Combining trust region and line search techniques, in Advances in Nonlinear Programming. Proceedings of the 96 International Conference on Nonlinear Programming, Springer US, 1998, pp. 153–175 · Zbl 0909.90243 |

[30] | O’Leary, D. P., A discrete Newton algorithm for minimizing a function of many variables, Math. Program., 23, 20-33, (1982) · Zbl 0477.90055 |

[31] | Pearlmutter, B. A., Fast exact multiplication by the Hessian, Neural Comput., 6, 147-160, (1994) |

[32] | Rojas, M.; Santos, S. A.; Sorensen, D. C., A new matrix-free algorithm for the large-scale trust-region subproblem, SIAM J. Optim., 11, 611-646, (2000) · Zbl 0994.65067 |

[33] | Schraudolph, N. N., Fast curvature matrix–vector products for second-order gradient descent, Neural Comput., 14, 1723-1738, (2002) · Zbl 1037.68119 |

[34] | Steihaug, T., The conjugate gradient method and trust regions in large scale optimization, SIAM J. Numer. Anal., 20, 626-637, (1983) · Zbl 0518.65042 |

[35] | Towards an efficient sparsity exploiting Newton method for minimization, in Sparse Matrices and Their Uses, I.S. Duff, ed., Academic Press, London, 1981, pp. 57–88 |

[36] | Investigations on Hessian-free optimization for cross-entropy training of deep neural networks, Proc. Interspeech, Lyon, France, Aug. 2013, pp. 3317–3321 |

[37] | Yuan, Y., On the truncated conjugate gradient method, Math. Program., 87, 561-573, (2000) · Zbl 0955.65039 |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.