
zbMATH — the first resource for mathematics

Global optimization issues in deep network regression: an overview. (English) Zbl 1421.90154
Summary: The paper presents an overview of global issues in optimization methods for training feedforward neural networks (FNN) in a regression setting. We first recall the learning optimization paradigm for FNN and briefly discuss global schemes for the joint choice of the network topology and of the network parameters. The main part of the paper focuses on the core subproblem, namely the continuous unconstrained (regularized) weight optimization problem, with the aim of reviewing global methods specifically arising both in multilayer perceptron/deep networks and in radial basis function networks. We review some recent results on the existence of non-global stationary points of the unconstrained nonlinear problem and on the role of determining a global solution in a supervised learning paradigm. Local algorithms that are widely used to solve the continuous unconstrained problems are addressed, with a focus on possible improvements that exploit the global properties. Hybrid global methods specifically devised for FNN training optimization problems, which embed local algorithms, are also discussed.
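For context, the core subproblem mentioned in the summary can be written, in standard notation not given in this entry (the symbols \(x^p, t^p, y, w, \rho\) are assumed here for illustration), as regularized least-squares minimization over the network weights: given training pairs \((x^p, t^p)\), \(p = 1, \dots, P\), and a network output \(y(x; w)\), one seeks
\[
\min_{w \in \mathbb{R}^n} \; E(w) \;=\; \sum_{p=1}^{P} \bigl(y(x^p; w) - t^p\bigr)^2 \;+\; \rho \,\|w\|^2, \qquad \rho \ge 0,
\]
a smooth but generally nonconvex problem; this nonconvexity is what gives rise to the questions about non-global stationary points and global solution methods surveyed in the paper.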

MSC:
90C35 Programming involving graphs or networks
90C26 Nonconvex programming, global optimization
Software:
CRIO; SemiPar
References:
[1] Abraham, A., Meta learning evolutionary artificial neural networks, Neurocomputing, 56, 1-38, (2004)
[2] Adam, S.; Magoulas, G.; Karras, D.; Vrahatis, M., Bounding the search space for global optimization of neural networks learning error: an interval analysis approach, J. Mach. Learn. Res., 17, 1-40, (2016) · Zbl 1392.68335
[3] Adamu, A., Maul, T., Bargiela, A.: On training neural networks with transfer function diversity. In: International Conference on Computational Intelligence and Information Technology (CIIT 2013), Elsevier (2013)
[4] Amato, S.; Apolloni, B.; Caporali, G.; Madesani, U.; Zanaboni, A., Simulated annealing approach in backpropagation, Neurocomputing, 3, 207-220, (1991)
[5] An, G., The effects of adding noise during backpropagation training on a generalization performance, Neural Comput., 8, 643-674, (1996)
[6] Bagirov, A.; Rubinov, A.; Soukhoroukova, N.; Yearwood, J., Unsupervised and supervised data classification via nonsmooth and global optimization, Top, 11, 1-75, (2003) · Zbl 1048.65059
[7] Baldi, P.; Hornik, K., Neural networks and principal component analysis: learning from examples without local minima, Neural Netw., 2, 53-58, (1989)
[8] Baldi, P.; Lu, Z., Complex-valued autoencoders, Neural Netw., 33, 136-147, (2012) · Zbl 1258.68111
[9] Baldi, P.; Sadowski, P., The dropout learning algorithm, Artif. Intell., 210, 78-122, (2014) · Zbl 1333.68225
[10] Barhen, J.; Protopopescu, V.; Reister, D., TRUST: a deterministic algorithm for global optimization, Science, 276, 1094-1097, (1997) · Zbl 1226.90073
[11] Bates, D.M., Watts, D.G.: Nonlinear Regression Analysis and Its Applications. Wiley Series in Probability and Statistics. Wiley, Hoboken (2007)
[12] Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: Proceedings of the 26th annual international conference on machine learning, pp. 41-48. ACM (2009)
[13] Bergstra, J.; Bengio, Y., Random search for hyper-parameter optimization, J. Mach. Learn. Res., 13, 281-305, (2012) · Zbl 1283.68282
[14] Bertsekas, D.P.: Nonlinear Programming. Athena Scientific, Belmont (1999) · Zbl 1015.90077
[15] Bertsekas, DP, Incremental gradient, subgradient, and proximal methods for convex optimization: a survey, Optim. Mach. Learn., 2010, 3, (2011)
[16] Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation: Numerical Methods. Prentice-Hall, Englewood Cliffs (1989) · Zbl 0743.65107
[17] Bertsekas, DP; Tsitsiklis, JN, Gradient convergence in gradient methods with errors, SIAM J. Optim., 10, 627-642, (2000) · Zbl 1049.90130
[18] Bertsimas, D.; Dunn, J., Optimal classification trees, Mach. Learn., 106, 1039-1082, (2017) · Zbl 06841416
[19] Bertsimas, D.; Shioda, R., Classification and regression via integer optimization, Oper. Res., 55, 252-271, (2007) · Zbl 1167.90593
[20] Bianchini, M.; Frasconi, P.; Gori, M., Learning without local minima in radial basis function networks, IEEE Trans. Neural Netw., 6, 749-756, (1995)
[21] Bishop, C., Improving the generalization properties of radial basis function neural networks, Neural Comput., 3, 579-588, (1991)
[22] Bishop, C.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, New York (2006; corrected 2nd printing 2007)
[23] Blum, A., Rivest, R.L.: Training a 3-node neural network is NP-complete. In: Proceedings of the 1st International Conference on Neural Information Processing Systems, pp. 494-501. MIT Press (1988)
[24] Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural networks (2015). arXiv preprint arXiv:1505.05424
[25] Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS’07, pp. 161-168. Curran Associates Inc., USA (2007). http://dl.acm.org/citation.cfm?id=2981562.2981583
[26] Bottou, L.; Curtis, FE; Nocedal, J., Optimization methods for large-scale machine learning, SIAM Rev., 60, 223-311, (2018) · Zbl 1397.65085
[27] Boubezoul, A.; Paris, S., Application of global optimization methods to model and feature selection, Pattern Recognit., 45, 3676-3686, (2012) · Zbl 1242.68207
[28] Branke, J.: Evolutionary algorithms for neural network design and training. In: Proceedings of the First Nordic Workshop on Genetic Algorithms and its Applications, pp. 145-163 (1995)
[29] Bravi, L.; Piccialli, V.; Sciandrone, M., An optimization-based method for feature ranking in nonlinear regression problems, IEEE Trans. Neural Netw. Learn. Syst., 28, 1005-1010, (2017)
[30] Bray, AJ; Dean, DS, Statistics of critical points of Gaussian fields on large-dimensional spaces, Phys. Rev. Lett., 98, 150201, (2007)
[31] Breuel, T.M.: On the convergence of SGD training of neural networks (2015). arXiv preprint arXiv:1508.02790
[32] Buchtala, O.; Klimek, M.; Sick, B., Evolutionary optimization of radial basis function classifiers for data mining applications, IEEE Trans. Syst. Man Cybern. Part B (Cybern.), 35, 928-947, (2005)
[33] Burges, CJ, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discov., 2, 121-167, (1998)
[34] Buzzi, C.; Grippo, L.; Sciandrone, M., Convergent decomposition techniques for training RBF neural networks, Neural Comput., 13, 1891-1920, (2001) · Zbl 0986.68109
[35] Carrizosa, E.; Martín-Barragán, B.; Morales, DR, A nested heuristic for parameter tuning in support vector machines, Comput. Oper. Res., 43, 328-334, (2014) · Zbl 1349.62260
[36] Carrizosa, E.; Morales, DR, Supervised classification and mathematical optimization, Comput. Oper. Res., 40, 150-165, (2013) · Zbl 1349.68135
[37] Cetin, B.; Barhen, J.; Burdick, J., Terminal repeller unconstrained subenergy tunneling (TRUST) for fast global optimization, J. Optim. Theory Appl., 77, 97-126, (1993) · Zbl 0801.49001
[38] Cetin, B.C., Burdick, J.W., Barhen, J.: Global descent replaces gradient descent to avoid local minima problem in learning with artificial neural networks. In: IEEE International Conference on Neural Networks, 1993, pp. 836-842. IEEE (1993)
[39] Chandrashekar, G.; Sahin, F., A survey on feature selection methods, Comput. Electr. Eng., 40, 16-28, (2014)
[40] Chao, J., Hoshino, M., Kitamura, T., Masuda, T.: A multilayer RBF network and its supervised learning. In: International Joint Conference on Neural Networks, 2001 (IJCNN’01), Proceedings, vol. 3, pp. 1995-2000. IEEE (2001)
[41] Chapelle, O.; Sindhwani, V.; Keerthi, SS, Optimization techniques for semi-supervised support vector machines, J. Mach. Learn. Res., 9, 203-233, (2008) · Zbl 1225.68158
[42] Chen, S.; Wu, Y.; Luk, B., Combined genetic algorithm optimization and regularized orthogonal least squares learning for radial basis function networks, IEEE Trans. Neural Netw., 10, 1239-1243, (1999)
[43] Chiang, H.D., Reddy, C.K.: TRUST-TECH based neural network training. In: International Joint Conference on Neural Networks, 2007 (IJCNN 2007), pp. 90-95. IEEE (2007)
[44] Cho, SY; Chow, TW, Training multilayer neural networks using fast global learning algorithm—least-squares and penalized optimization methods, Neurocomputing, 25, 115-131, (1999) · Zbl 0941.68110
[45] Choromanska, A., Henaff, M., Mathieu, M., Arous, G.B., LeCun, Y.: The loss surfaces of multilayer networks. In: AISTATS (2015)
[46] Choromanska, A., LeCun, Y., Arous, G.B.: Open problem: the landscape of the loss surfaces of multilayer networks. In: COLT, pp. 1756-1760 (2015)
[47] Cohen, S., Intrator, N.: Global optimization of RBF networks (2000). http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.5955
[48] Cohen, S.; Intrator, N., A hybrid projection-based and radial basis function architecture: initial values and global optimisation, Pattern Anal. Appl., 5, 113-120, (2002) · Zbl 1024.68091
[49] Dai, Q.; Ma, Z.; Xie, Q., A two-phased and ensemble scheme integrated backpropagation algorithm, Appl. Soft Comput., 24, 1124-1135, (2014)
[50] Dauphin, Y.N., Pascanu, R., Gulcehre, C., Cho, K., Ganguli, S., Bengio, Y.: Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In: Advances in neural information processing systems, pp. 2933-2941 (2014)
[51] David, O.E., Greental, I.: Genetic algorithms for evolving deep neural networks. In: Proceedings of the Companion Publication of the 2014 Annual Conference on Genetic and Evolutionary Computation, pp. 1451-1452. ACM (2014)
[52] Dietterich, T.G.: Ensemble methods in machine learning. In: International workshop on multiple classifier systems, pp. 1-15. Springer (2000)
[53] Duarte Silva, AP, Optimization approaches to supervised classification, Eur. J. Oper. Res., 261, 772-788, (2017) · Zbl 1403.62114
[54] Duch, W.; Jankowski, N., New neural transfer functions, Appl. Math. Comput. Sci., 7, 639-658, (1997) · Zbl 0902.68168
[55] Duch, W.; Jankowski, N., Survey of neural transfer functions, Neural Comput. Surv., 2, 163-212, (1999)
[56] Duch, W.; Korczak, J., Optimization and global minimization methods suitable for neural networks, Neural Comput. Surv., 2, 163-212, (1998)
[57] Feng-wen, H., Ai-ping, J.: An improved method of wavelet neural network optimization based on filled function method. In: 16th International Conference on Industrial Engineering and Engineering Management, 2009 (IE&EM’09), pp. 1694-1697. IEEE (2009)
[58] Fischetti, M., Fast training of support vector machines with Gaussian kernel, Discrete Optim., 22, 183-194, (2016) · Zbl 1387.68197
[59] Floudas, C.A.: Deterministic Global Optimization: Theory, Methods and Applications, vol. 37. Springer, Berlin (2013)
[60] Fukumizu, K.; Amari, Si, Local minima and plateaus in hierarchical structures of multilayer perceptrons, Neural Netw., 13, 317-327, (2000)
[61] Ge, R., A filled function method for finding a global minimizer of a function of several variables, Math. Program., 46, 191-204, (1990) · Zbl 0694.90083
[62] González, J.; Rojas, I.; Ortega, J.; Pomares, H.; Fernandez, FJ; Díaz, AF, Multiobjective evolutionary optimization of the size, shape, and position parameters of radial basis function networks for function approximation, IEEE Trans. Neural Netw., 14, 1478-1495, (2003)
[63] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016) · Zbl 1373.68009
[64] Goodfellow, I.J., Vinyals, O.: Qualitatively characterizing neural network optimization problems. CoRR (2014). http://arxiv.org/abs/1412.6544
[65] Gori, M.; Tesi, A., On the problem of local minima in backpropagation, IEEE Trans. Pattern Anal. Mach. Intell., 14, 76-86, (1992)
[66] Gorse, D., Shepherd, A.J., Taylor, J.G.: Avoiding local minima by a classical range expansion algorithm. In: ICANN94, pp. 525-528. Springer, London (1994)
[67] Gorse, D., Shepherd, A.J., Taylor, J.G.: A classical algorithm for avoiding local minima. In: Proceedings of the World Congress on Neural Networks, pp. 364-369. Citeseer (1994)
[68] Gorse, D.; Shepherd, AJ; Taylor, JG, The new ERA in supervised learning, Neural Netw., 10, 343-352, (1997)
[69] Graves, A.: Practical variational inference for neural networks. In: Advances in Neural Information Processing Systems, pp. 2348-2356 (2011)
[70] Grippo, L., Convergent on-line algorithms for supervised learning in neural networks, IEEE Trans. Neural Netw., 11, 1284-1299, (2000)
[71] Grippo, L.; Manno, A.; Sciandrone, M., Decomposition techniques for multilayer perceptron training, IEEE Trans. Neural Netw. Learn. Syst., 27, 2146-2159, (2016)
[72] Grippo, L.; Sciandrone, M., Globally convergent block-coordinate techniques for unconstrained optimization, Optim. Methods Softw., 10, 587-637, (1999) · Zbl 0940.65070
[73] Grippo, L.; Sciandrone, M., Nonmonotone globalization techniques for the Barzilai-Borwein gradient method, Comput. Optim. Appl., 23, 143-169, (2002) · Zbl 1028.90061
[74] Györfi, L., Kohler, M., Krzyzak, A., Walk, H.: A Distribution-free Theory of Nonparametric Regression. Springer, Berlin (2006) · Zbl 1021.62024
[75] Hamey, LG, XOR has no local minima: a case study in neural network error surface analysis, Neural Netw., 11, 669-681, (1998)
[76] Hamm, L.; Brorsen, BW; Hagan, MT, Comparison of stochastic global optimization methods to estimate neural network weights, Neural Process. Lett., 26, 145-158, (2007)
[77] Haykin, S.: Neural Networks and Learning Machines, vol. 3. Pearson, Upper Saddle River (2009)
[78] Hochreiter, S.; Schmidhuber, J., Flat minima, Neural Comput., 9, 1-42, (1997) · Zbl 0872.68150
[79] Horst, R., Tuy, H.: Global Optimization: Deterministic Approaches. Springer, Berlin (2013) · Zbl 0704.90057
[80] Huang, G.; Huang, GB; Song, S.; You, K., Trends in extreme learning machines: a review, Neural Netw., 61, 32-48, (2015) · Zbl 1325.68190
[81] Huang, G.B., Zhu, Q.Y., Siew, C.K.: Extreme learning machine: a new learning scheme of feedforward neural networks. In: IEEE International Joint Conference on Neural Networks, 2004, Proceedings, vol. 2, pp. 985-990. IEEE (2004)
[82] Hui, LCK; Lam, KY; Chea, CW, Global optimisation in neural network training, Neural Comput. Appl., 5, 58-64, (1997)
[83] Jin, Y.; Sendhoff, B., Pareto-based multiobjective machine learning: an overview and case studies, IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.), 38, 397-415, (2008)
[84] Kawaguchi, K.: Deep learning without poor local minima. In: Advances In Neural Information Processing Systems, pp. 586-594 (2016)
[85] Keskar, N.S., Mudigere, D., Nocedal, J., Smelyanskiy, M., Tang, P.T.P.: On large-batch training for deep learning: generalization gap and sharp minima. In: ICLR 2017 (2016)
[86] Lang, K.: Learning to tell two spirals apart. In: Proceedings of the 1988 Connectionist Models Summer School, pp. 52-59 (1989)
[87] Laurent, T., von Brecht, J.: The multilinear structure of ReLU networks (2017). arXiv preprint arXiv:1712.10132
[88] LeCun, Y.; Bengio, Y.; Hinton, G., Deep learning, Nature, 521, 436-444, (2015)
[89] LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.R.: Efficient backprop. In: Neural networks: Tricks of the trade, pp. 9-48. Springer (2012)
[90] Lee, J.D., Simchowitz, M., Jordan, M.I., Recht, B.: Gradient descent only converges to minimizers. In: Conference on Learning Theory, pp. 1246-1257 (2016)
[91] Lee, JS; Park, CH, Global optimization of radial basis function networks by hybrid simulated annealing, Neural Netw. World, 20, 519, (2010)
[92] Li, HR; Li, HL, A global optimization algorithm based on filled-function for neural networks, J. Northeast. Univ. Nat. Sci., 28, 1247, (2007) · Zbl 1150.68413
[93] Lin, SW; Tseng, TY; Chou, SY; Chen, SC, A simulated-annealing-based approach for simultaneous parameter optimization and feature selection of back-propagation networks, Expert Syst. Appl., 34, 1491-1499, (2008)
[94] Lisboa, P.; Perantonis, S., Complete solution of the local minima in the XOR problem, Network: Comput. Neural Syst., 2, 119-124, (1991) · Zbl 0719.94512
[95] Liu, H.; Wang, Y.; Guan, S.; Liu, X., A new filled function method for unconstrained global optimization, Int. J. Comput. Math., 94, 2283-2296, (2017) · Zbl 1398.65125
[96] Locatelli, M., Schoen, F.: Global Optimization: Theory, Algorithms, and Applications. Society for Industrial and Applied Mathematics, Philadelphia, PA (2013). https://doi.org/10.1137/1.9781611972672 · Zbl 1286.90003
[97] Magoulas, G., Plagianakos, V., Vrahatis, M.: Hybrid methods using evolutionary algorithms for on-line training. In: International Joint Conference on Neural Networks, 2001 (IJCNN’01) Proceedings, vol. 3, pp. 2218-2223. IEEE (2001)
[98] Martín-Guerrero, J., Gómez-Chova, L., Calpe-Maravilla, J., Camps-Valls, G., Soria-Olivas, E., Moreno, J.: A soft approach to ERA algorithm for hyperspectral image classification. In: Proceedings of the 3rd International Symposium on Image and Signal Processing and Analysis, 2003 (ISPA 2003), vol. 2, pp. 761-765. IEEE (2003)
[99] Neelakantan, A., Vilnis, L., Le, Q.V., Sutskever, I., Kaiser, L., Kurach, K., Martens, J.: Adding gradient noise improves learning for very deep networks (2015). arXiv preprint arXiv:1511.06807
[100] Nesterov, Y., A method of solving a convex programming problem with convergence rate \(O(1/k^2)\), Sov. Math. Doklady, 27, 372-376, (1983) · Zbl 0535.90071
[101] Nguyen, Q., Hein, M.: The loss surface and expressivity of deep convolutional neural networks (2017). arXiv preprint arXiv:1710.10928
[102] Nguyen, Q., Hein, M.: The loss surface of deep and wide neural networks (2017). arXiv preprint arXiv:1704.08045
[103] Ojha, VK; Abraham, A.; Snášel, V., Metaheuristic design of feedforward neural networks: a review of two decades of research, Eng. Appl. Artif. Intell., 60, 97-116, (2017)
[104] Palmes, PP; Hayasaka, T.; Usui, S., Mutation-based genetic neural network, IEEE Trans. Neural Netw., 16, 587-600, (2005)
[105] Peng, C.C., Magoulas, G.D.: Adaptive nonmonotone conjugate gradient training algorithm for recurrent neural networks. In: 19th IEEE International Conference on Tools with Artificial Intelligence, 2007 (ICTAI 2007), vol. 2, pp. 374-381. IEEE (2007)
[106] Peng, CC; Magoulas, GD, Nonmonotone Levenberg-Marquardt training of recurrent neural architectures for processing symbolic sequences, Neural Comput. Appl., 20, 897-908, (2011)
[107] Piccialli, V.; Sciandrone, M., Nonlinear optimization and support vector machines, 4OR, 16, 111-149, (2018) · Zbl 1398.65126
[108] Pintér, JD, Calibrating artificial neural networks by global optimization, Expert Syst. Appl., 39, 25-32, (2012)
[109] Plagianakos, V.; Magoulas, G.; Vrahatis, M., Learning in multilayer perceptrons using global optimization strategies, Nonlinear Anal. Theory Methods Appl., 47, 3431-3436, (2001) · Zbl 1042.90653
[110] Plagianakos, V., Magoulas, G., Vrahatis, M.: Improved learning of neural nets through global search. In: Global Optimization, pp. 361-388. Springer (2006) · Zbl 1123.92002
[111] Plagianakos, VP; Magoulas, GD; Vrahatis, MN, Deterministic nonmonotone strategies for effective training of multilayer perceptrons, IEEE Trans. Neural Netw., 13, 1268-1284, (2002)
[112] Poggio, T.; Girosi, F., Networks for approximation and learning, Proc. IEEE, 78, 1481-1497, (1990) · Zbl 1226.92005
[113] Polyak, BT, Some methods of speeding up the convergence of iteration methods, USSR Comput. Math. Math. Phys., 4, 1-17, (1964)
[114] Prieto, A.; Prieto, B.; Ortigosa, EM; Ros, E.; Pelayo, F.; Ortega, J.; Rojas, I., Neural networks: an overview of early research, current frameworks and new challenges, Neurocomputing, 214, 242-268, (2016)
[115] Rere, LR; Fanany, MI; Arymurthy, AM, Simulated annealing algorithm for deep learning, Procedia Comput. Sci., 72, 137-144, (2015)
[116] Robbins, H.; Monro, S., A stochastic approximation method, Ann. Math. Stat., 22, 400-407, (1951) · Zbl 0054.05901
[117] RoyChowdhury, P.; Singh, YP; Chansarkar, R., Dynamic tunneling technique for efficient training of multilayer perceptrons, IEEE Trans. Neural Netw., 10, 48-55, (1999)
[118] Ruppert, D., Wand, M.P., Carroll, R.J.: Semiparametric Regression. Cambridge Series in Statistical and Probabilistic Mathematics, vol. 12. Cambridge University Press, Cambridge (2003) · Zbl 1038.62042
[119] Ruppert, D.; Wand, MP; Carroll, RJ, Semiparametric regression during 2003-2007, Electron. J. Stat., 3, 1193, (2009) · Zbl 1326.62094
[120] Saad, D.: On-Line Learning in Neural Networks, vol. 17. Cambridge University Press, Cambridge (2009) · Zbl 1185.68566
[121] Scardapane, S.; Wang, D., Randomness in neural networks: an overview, Wiley Interdiscip. Rev. Data Min. Knowl. Discov., 7, 1200, (2017)
[122] Schaffer, J.D., Whitley, D., Eshelman, L.J.: Combinations of genetic algorithms and neural networks: a survey of the state of the art. In: International Workshop on Combinations of Genetic Algorithms and Neural Networks, 1992 (COGANN-92), pp. 1-37. IEEE (1992)
[123] Schmidhuber, J., Deep learning in neural networks: an overview, Neural Netw., 61, 85-117, (2015)
[124] Schwenker, F.; Kestler, HA; Palm, G., Three learning phases for radial-basis-function networks, Neural Netw., 14, 439-458, (2001) · Zbl 0991.68061
[125] Sexton, RS; Dorsey, RE; Johnson, JD, Toward global optimization of neural networks: a comparison of the genetic algorithm and backpropagation, Decis. Support Syst., 22, 171-185, (1998)
[126] Sexton, RS; Dorsey, RE; Johnson, JD, Optimization of neural networks: a comparative analysis of the genetic algorithm and simulated annealing, Eur. J. Oper. Res., 114, 589-601, (1999) · Zbl 0938.90069
[127] Shang, Y.; Wah, BW, Global optimization for neural network training, Computer, 29, 45-54, (1996)
[128] Šíma, J., Training a single sigmoidal neuron is hard, Neural Comput., 14, 2709-2728, (2002) · Zbl 1060.68099
[129] Soudry, D., Carmon, Y.: No bad local minima: data independent training error guarantees for multilayer neural networks (2016). arXiv preprint arXiv:1605.08361
[130] Sprinkhuizen-Kuyper, IG; Boers, EJ, The error surface of the 2-2-1 XOR network: The finite stationary points, Neural Netw., 11, 683-690, (1998)
[131] Srivastava, N.; Hinton, GE; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R., Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res., 15, 1929-1958, (2014) · Zbl 1318.68153
[132] Steijvers, M., Grünwald, P.: A recurrent network that performs a context-sensitive prediction task. In: Proceedings of the 18th Annual Conference of the Cognitive Science Society, pp. 335-339 (1996)
[133] Sutskever, I.; Martens, J.; Dahl, GE; Hinton, GE, On the importance of initialization and momentum in deep learning, ICML, 3, 1139-1147, (2013)
[134] Swirszcz, G., Czarnecki, W.M., Pascanu, R.: Local minima in training of deep networks. CoRR (2016). arXiv:1611.06310v1
[135] Teboulle, M., A unified continuous optimization framework for center-based clustering methods, J. Mach. Learn. Res., 8, 65-102, (2007) · Zbl 1222.68318
[136] Teo, C.H., Smola, A., Vishwanathan, S., Le, Q.V.: A scalable modular convex solver for regularized risk minimization. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 727-736. ACM (2007)
[137] Tirumala, S.S., Ali, S., Ramesh, C.P.: Evolving deep neural networks: A new prospect. In: 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2016, pp. 69-74. IEEE (2016)
[138] Toh, KA, Deterministic global optimization for FNN training, IEEE Trans. Syst. Man Cybern. Part B (Cybern.), 33, 977-983, (2003)
[139] Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Berlin (2013) · Zbl 0934.62009
[140] Voglis, C.; Lagaris, I., A global optimization approach to neural network training, Neural Parallel Sci. Comput., 14, 231, (2006) · Zbl 1152.90673
[141] Voglis, C.; Lagaris, IE, Towards ideal multistart: a stochastic approach for locating the minima of a continuous function inside a bounded domain, Appl. Math. Comput., 213, 216-229, (2009) · Zbl 1167.65377
[142] Wang, D., Editorial: Randomized algorithms for training neural networks, Inf. Sci., 364-365, 126-128, (2016)
[143] Werbos, P.J.: Supervised learning: Can it escape its local minimum? In: Theoretical Advances in Neural Computation and Learning, pp. 449-461. Springer (1994) · Zbl 0825.68543
[144] Yeung, DS; Li, JC; Ng, WWY; Chan, PPK, MLPNN training via a multiobjective optimization of training error and stochastic sensitivity, IEEE Trans. Neural Netw. Learn. Syst., 27, 978-992, (2016)
[145] Yu, W.; Zhuang, F.; He, Q.; Shi, Z., Learning deep representations via extreme learning machines, Neurocomputing, 149, 308-315, (2015)
[146] Zhang, JR; Zhang, J.; Lok, TM; Lyu, MR, A hybrid particle swarm optimization-back-propagation algorithm for feedforward neural network training, Appl. Math. Comput., 185, 1026-1037, (2007) · Zbl 1112.65059
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.