Training multi-layered neural network with a trust-region based algorithm. (English) Zbl 0707.90097

Summary: We first show how the problem of training a neural network is modelized as an optimization problem and describe the generally used training algorithm. Then we propose a new algorithm based on a trust-region technique which is very efficient for non-convex optimization problems. Experimental results show that the new algorithm is much faster and robust compared with GBP. It makes the design of neural net architecture much less problem-dependent.


90C90 Applications of mathematical programming
92B20 Neural networks for/in biological studies, artificial life and related topics
90-08 Computational methods for problems pertaining to operations research and mathematical programming
90C26 Nonconvex programming, global optimization


Full Text: DOI EuDML


[1] A. AUSLENDER (1976), Optimisation, méthodes numériques. Masson, Paris. Zbl0326.90057 MR441204 · Zbl 0326.90057
[2] J. CEA (1971), Optimisation : Théories et algorithmes. Dunod. Zbl0211.17402 MR298892 · Zbl 0211.17402
[3] A. R. CONN, N. GOULD & Ph. TOINT (1986), Testing a class of methods for solving minimization problems with simple bounds on the variables. Report n^\circ 86-3, University of Waterloo. Zbl0645.65033 · Zbl 0645.65033
[4] J. E. DENNIS, R. B. SCHNABEL (1983, Numerical methods for unconstrained optimization and nonlinear equations. Printice-Hall. Zbl0579.65058 MR702023 · Zbl 0579.65058
[5] I. S. DUFF, J. NOCEDAL & J.K. REID (1987), The use linear programming for solutions of sparse sets of nonlinear equations. SIAM J. Sci. Stat. Comput. vol. 8, N^\circ 2, pp. 99-108. Zbl0636.65053 MR879405 · Zbl 0636.65053
[6] I. EKELAND & R. TEMAM (1974), Analyse Convexe et problèmes variationnels. Dunod, Gauthier-Villars. Zbl0281.49001 MR463993 · Zbl 0281.49001
[7] R. FLETCHER (1980), Practical Methods of Optimization, vol. 1, John Wiley, New York. Zbl0439.93001 MR585160 · Zbl 0439.93001
[8] F. FOGELMAN-SOULIE, P. GALLINARI, Y. LE CUN, S. THIRIA, (1987), Automata networks and artificial intelligence. In F. Fogelman-Soulie, Y. Robert, M. Tchuente (Eds.), Computing on automata networks, Manchester University Press. MR942907
[9] N. GASTINEL (1966), Analyse numérique linéaire. Hermann, Paris. Zbl0151.21202 MR201053 · Zbl 0151.21202
[10] D. M. GAY (1981), Computing optimal constrained steps. SIAM J. Sci. Stat. Comput. 2, pp. 186-197. Zbl0467.65027 MR622715 · Zbl 0467.65027
[11] P. E. GILL & W. MURRAY & (1972), Quasi-Newton methods for unconstrained optimization, The Journal of the Institute of Mathematics and its Applications, vol, 9, pp. 91-108. Zbl0264.49026 MR300410 · Zbl 0264.49026
[12] P. E. GILL, W. MURRAY & M. H. WRIGHT (1981), Practical Optimization. Academie Press. MR634376 · Zbl 0503.90062
[13] M. D. HEBDEN (1973), An algorithm for minimization using exact second derivatives. Atomic Energy Research Establishment report T.P. 515, Harwell, England.
[14] S. KANIEL & A. DAX (1979), A modified Newtons method for unconstrained minimization. SIAM J. Num. Anal., pp. 324-331. Zbl0403.65027 MR526493 · Zbl 0403.65027
[15] P. LANCASTER (1969), Theory of Matrix. Academie Press, NewYork and London. MR245579 · Zbl 0186.05301
[16] P. J. LAURENT (1972), Approximation et Optimisation. Hermann, Paris. Zbl0238.90058 MR467080 · Zbl 0238.90058
[17] Y. LE CUN (1987), Modèles connectionnistes de l’apprentissage. Thèse de doctora, Université de Paris VI.
[18] MINOUX (1983), Programmation Mathématique. Tomel, Dunod. Zbl0546.90056 · Zbl 0546.90056
[19] M. MINSKY & S. PAPERT (1969), Perceptrons. Cambridge, MA : MIT Press.
[20] J. J. MORÉ (1978), The Levenberg-Marquart algorithm : implementation and theory. Lecture Notes in Mathematics 630, G. A. Waston, ed., Springer-Verlag, Berlin-Heidelberg-New York, pp. 105-116. Zbl0372.65022 MR483445 · Zbl 0372.65022
[21] J. J. MORÉ (1983), Recent developments in algorithm and software for Trust Region Methods. Mathematical Programming, The State of the Art, Springer, Berlin, pp. 258-287. Zbl0546.90077 MR717404 · Zbl 0546.90077
[22] J. J. MORÉ& D. C. SORENSEN (1979), On the use of directions of negative curvature in a modified Newton method. Math. Prog. 16, pp. 1-20. Zbl0394.90093 MR517757 · Zbl 0394.90093
[23] J. J. MORÉ & D. C. SORENSEN (1981), Computing a trust region step. Argonne National Laboratory report, Argonne, Illinois. · Zbl 0551.65042
[24] H. MUKAI& E. POLAK (1978), A second order method for unconstrained optimization. J.O.T.A. vol. 26, pp. 501-513. Zbl0373.90068 MR526650 · Zbl 0373.90068
[25] J. P. PENOT & A. ROGER, Updating the spectrum of a real matrix. Mathematics of Computation.
[26] M. J. D. POWELL (1975), Convergence properties of a class of minimization algorithms. O. L. Mangazarian, R. R. Meyer, S. M. Robinson Editors, Nonlinear prograrnming 2 pp. 1-27, Academic press, New York. Zbl0321.90045 MR386270 · Zbl 0321.90045
[27] REINSCH (1967), Smoothing by spline functions. Numer. Math. 10, 177-183. Zbl0161.36203 MR295532 · Zbl 0161.36203
[28] REINSCH (1971), Smoothing by spline functions II. Numer. Math. 16, 451-454. MR1553981 · Zbl 1248.65020
[29] D. E. RHUMELHART & J. C. MCCLELLAND (1986) (Eds.), Parallel Distributed Processing. Cambridge, MA : MIT Press.
[30] F. ROBERT & S. WANG (1988), Implementation of a Neural Network on a Hypercube F.P.S. T20. Proceeding of IF1P WG 10.3 Working Conference on Parallel Processing. Pisa : Italy, 25-27 April. North-Holland.
[31] R. T. ROCKAFELLAR (1970), Convex Analysis. Princeton University Press, Princeton, New Jersey. Zbl0193.18401 MR274683 · Zbl 0193.18401
[32] A. ROGER (1987), Mise à jour du spectre d’une matrice symétrique, Rapport de recherche SNEA (P), n^\circ AR/87-970.
[33] S. ROUSSET, A. SCHREIBER & S. WANG (1988), Modélisation et simulation connexionniste de l’identification des visages en contexte. Le système FACENET RR 742 -M-. IMAG Grenoble.
[34] G. A. SHULTZ, R. B. SCHNABEL & R. H. BYRD (1985), A family of trust-regionbased algorithms for unconstrained minimization with strong global convergence properties. SIAM Journal on Numerical Analysis 22, pp. 47-67. Zbl0574.65061 MR772882 · Zbl 0574.65061
[35] G. A. SHULTZ, R. B. SCHNABEL & R. H. BYRD (1988), Approximate solution of the trust region problem by minimization over two-dimensional subspaces Mathematical Programming. Vol. 40, pp. 247-263, North-Holland. Zbl0652.90082 MR941311 · Zbl 0652.90082
[36] D. C. SORENSEN (1982), Newton’s method with a model trust region modification. SIAM J. Numer. Anal. vol. 19, n^\circ 2, pp. 409-426. Zbl0483.65039 MR650060 · Zbl 0483.65039
[37] G. W. STEWART (1973), Introduction to matrix computation. Academic Press, New York. Zbl0302.65021 MR458818 · Zbl 0302.65021
[38] S. WANG (1988), Implementation of threshold automata networks with multilayers on a Hypercube F.P.S. T20. RR 725 -M-. IMAG, Grenoble.
[39] S. WANG, H. YÉ & F. ROBERT (1988), A PNML neural network for isolated words recognition. Proceedings of nEuro ’88. First european conference on neural network, 6-9 Juin 1988 : Paris.
[40] Y. YUAN (1984), An example of only linear convergence of trust region algorithms for nonsmooth optimization. IMA Journal of Numerical Analysis 4, pp. 327-335. Zbl0555.65037 MR752609 · Zbl 0555.65037
[41] Y. YUAN (1985), On the superlinear convergence of a trust region algorithm for nonsmooth optimization. Mathematical Programming, vol. 3, pp. 269-285. North-Holland. Zbl0577.90066 MR783392 · Zbl 0577.90066
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.