
Convergence analyses on sparse feedforward neural networks via group lasso regularization. (English) Zbl 1429.68258

Summary: In this paper, a new variant of feedforward neural networks is proposed for a class of nonsmooth optimization problems. The penalty term of the proposed networks stems from the group lasso method, which selects hidden variables in a grouped manner. To deal with the non-differentiability of the original penalty term (the mixed \(l_1 - l_2\) norm) and to avoid oscillations, smoothing techniques are used to approximate the objective function. It is assumed that the training samples are supplied to the network incrementally during training, that is, in each cycle the samples are presented in a fixed order. Under suitable assumptions on the learning rate, the penalization coefficient and the smoothing parameter, we prove the weak and strong convergence of the training process for the smoothing neural networks: the gradient of the smoothing error function approaches zero and the weight sequence converges to a fixed point, respectively. We demonstrate how the smoothing approximation parameter can be updated during training so as to guarantee convergence of the procedure to a Clarke stationary point of the original optimization problem. In addition, we prove that the original nonsmooth algorithm with the \(l_1 - l_2\) norm penalty converges to the same optimal solution as the corresponding smoothed algorithm. Numerical simulations demonstrate the convergence and effectiveness of the proposed training algorithm.
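The following minimal Python sketch (not the authors' code) illustrates the scheme described in the summary under simplifying assumptions: a one-hidden-layer tanh network with scalar output, one group per hidden node consisting of its input weights, the common smoothing \(\sqrt{\|w\|_2^2 + \mu^2}\) of the Euclidean group norm, and illustrative decreasing schedules for the learning rate \(\eta_k\) and smoothing parameter \(\mu_k\). All names and constants are hypothetical.

import numpy as np

# Minimal sketch: cyclic incremental training of a one-hidden-layer network
# with a smoothed group lasso penalty, one group per hidden node.
# Not the paper's exact algorithm or parameter conditions.
rng = np.random.default_rng(0)

n, d, H = 50, 5, 10                          # samples, inputs, hidden nodes
X = rng.standard_normal((n, d))
t = np.sin(X[:, 0]) - 0.5 * X[:, 1]          # toy targets: only two inputs matter

V = 0.1 * rng.standard_normal((H, d))        # input-to-hidden weights (one group per row)
u = 0.1 * rng.standard_normal(H)             # hidden-to-output weights
lam = 1e-3                                   # penalization coefficient

for k in range(200):                         # training cycles
    eta = 0.1 / (1.0 + 0.01 * k)             # decreasing learning rate (illustrative schedule)
    mu = 1.0 / (1.0 + k)                     # decreasing smoothing parameter (illustrative schedule)
    for x, y in zip(X, t):                   # samples presented in a fixed (cyclic) order
        h = np.tanh(V @ x)                   # hidden-layer outputs
        e = u @ h - y                        # prediction error
        grad_u = e * h                       # gradient of squared error w.r.t. u
        grad_V = np.outer(e * u * (1.0 - h**2), x)
        # gradient of the smoothed penalty  lam * sum_i sqrt(||V_i||^2 + mu^2)
        grad_V += lam * V / np.sqrt((V**2).sum(axis=1, keepdims=True) + mu**2)
        u -= eta * grad_u
        V -= eta * grad_V

# As mu -> 0 the smoothed penalty approaches the group lasso (l1-l2) norm,
# driving whole rows of V (entire hidden nodes) toward zero.
print("group norms per hidden node:", np.round(np.linalg.norm(V, axis=1), 3))

Note that the paper's convergence results tie the schedules of \(\eta_k\), the penalization coefficient and \(\mu_k\) together; the sketch above does not enforce those joint conditions and only shows the structure of the update.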

MSC:

68T07 Artificial neural networks and deep learning

Software:

PRMLT

References:

[1] Augasta, M. G.; Kathirvalavakumar, T., A novel pruning algorithm for optimizing feedforward neural network of classification problems, Neural Process. Lett., 34, 3, 241-258 (2011)
[2] Augasta, M. G.; Kathirvalavakumar, T., Pruning algorithms of neural networks-a comparative study, Cent. Eur. J. Comput. Sci., 3, 3, 105-115 (2013)
[3] Belue, L. M.; Bauer, K. W., Determining input features for multilayer perceptrons, Neurocomputing, 7, 2, 111-121 (1995)
[4] Bertsekas, D. P., Nondifferentiable optimization via approximation, Math. Program. Study, 3, 1-25 (1975) · Zbl 0383.49025
[5] Bian, W.; Xue, X., Subgradient-based neural networks for nonsmooth nonconvex optimization problems, Neural Netw. IEEE Trans., 20, 6, 1024-1038 (2009)
[6] Bian, W.; Chen, X., Worst-case complexity of smoothing quadratic regularization methods for non-Lipschitzian optimization, SIAM J. Optim., 23, 3, 1718-1741 (2013) · Zbl 1282.90175
[7] Bian, W.; Chen, X., Smoothing neural network for constrained non-Lipschitz optimization with applications, Neural Netw. Learn. Syst. IEEE Trans., 23, 3, 399-411 (2012)
[8] Bian, W.; Chen, X., Neural network for nonsmooth, nonconvex constrained minimization via smooth approximation, Neural Netw. Learn. Syst. IEEE Trans., 25, 3, 545-556 (2014)
[9] Bishop, C. M., Pattern Recognition and Machine Learning (2006), Springer · Zbl 1107.68072
[10] Chakraborty, D.; Pal, N. R., A novel training scheme for multilayered perceptrons to realize proper generalization and incremental learning, Neural Netw. IEEE Trans., 14, 1, 1-14 (2003)
[11] Chen, S., Local regularization assisted orthogonal least squares regression, Neurocomputing, 69, 4-6, 559-585 (2006)
[12] Chen, X., Smoothing methods for nonsmooth, nonconvex minimization, Math. Program., 134, 1, 71-99 (2012) · Zbl 1266.90145
[13] Chen, X.; Xu, F.; Ye, Y., Lower bound theory of nonzero entries in solutions of \(\ell_2 - \ell_p\) minimization, SIAM J. Sci. Comput., 32, 5, 2832-2852 (2010) · Zbl 1242.90174
[14] Chen, X.; Zhou, W., Smoothing nonlinear conjugate gradient method for image restoration using nonsmooth nonconvex minimization, SIAM J. Imaging Sci., 3, 4, 765-790 (2010) · Zbl 1200.65031
[15] Chen, Z.; Haykin, S., On different facets of regularization theory, Neural Comput., 14, 12, 2791-2846 (2002) · Zbl 1079.68585
[16] Fletcher, L.; Katkovnik, V.; Steffens, F. E.; Engelbrecht, A. P., Optimizing the number of hidden nodes of a feedforward artificial neural network, Proc. IEEE World Congress on Computational Intelligence, International Joint Conference on Neural Networks, 1608-1612 (1998)
[18] Forti, M.; Nistri, P.; Quincampoix, M., Generalized neural network for nonsmooth nonlinear programming problems, Circuits Syst. I IEEE Trans., 51, 9, 1741-1754 (2004) · Zbl 1374.90356
[19] Hagiwara, M., A simple and effective method for removal of hidden units and weights, Neurocomputing, 6, 2, 207-218 (1994)
[20] Hanson, S. J.; Pratt, L. Y., Comparing biases for minimal network construction with back-propagation, Advances in Neural Information Processing Systems, 177-185 (1989)
[21] Haykin, S. S., Neural Networks: A Comprehensive Foundation, 2nd ed. (1999), Prentice Hall: Englewood Cliffs, NJ, USA · Zbl 0934.68076
[22] Haykin, S., Neural Networks and Learning Machines (2009), Prentice Hall
[23] Heskes, T.; Wiegerinck, W., A theoretical comparison of batch-mode, on-line, cyclic, and almost-cyclic learning, Neural Netw. IEEE Trans., 7, 4, 919-925 (1996)
[24] Hinton, G. E., Connectionist learning procedures, Artif. Intell., 40, 1, 185-234 (1989)
[25] Hsieh, W. W., Machine Learning Methods in the Environmental Sciences (2009), Cambridge Univ. Press: Cambridge Univ. Press Cambridge, U.K. · Zbl 1179.68120
[27] Huang, S. C.; Huang, Y. F., Bounds on the number of hidden neurons in multilayer perceptrons, Neural Netw. IEEE Trans., 2, 1, 47-55 (1991)
[28] Huynh, T. Q.; Setiono, R., Effective neural network pruning using cross-validation, Proc. IEEE International Joint Conference on Neural Networks (IJCNN), 972-977 (2005)
[29] Iskandarani, M. Z., A novel approach to system security using derived odor keys with weight elimination neural algorithm (DOK-WENA), Trans. Mach. Learn. Artif. Intell., 2, 2, 20-31 (2014)
[30] Li, Z.; Wu, W.; Tian, Y., Convergence of an online gradient method for feedforward neural networks with stochastic inputs, J. Comput. Appl. Math., 163, 1, 165-176 (2004) · Zbl 1062.68101
[31] Lu, W.; Wang, J., Convergence analysis of a class of nonsmooth gradient systems, Circuits Syst. I IEEE Trans., 55, 11, 3514-3527 (2008) · Zbl 1246.65013
[32] May, P.; Zhou, E.; Lee, C., A comprehensive evaluation of weight growth and weight elimination methods using the tangent plane algorithm, Int. J. Adv. Comput. Sci. Appl., 4, 6 (2013)
[33] Moody, J. O.; Antsaklis, P. J., The dependence identification neural network construction algorithm, Neural Netw. IEEE Trans., 7, 1, 3-15 (1996)
[34] Moody, J. E.; Rögnvaldsson, T. S., Smoothing Regularizers for Projective Basis Function Networks (1996), CSETech
[35] Ripley, B. D., Pattern Recognition and Neural Networks (2008), Cambridge Univ. Press: Cambridge Univ. Press Cambridge, U.K. · Zbl 1163.62047
[36] Sabo, D.; Yu, X.-H., Neural network dimension selection for dynamical system identification, Proc. IEEE International Conference on Control Applications, 972-977 (2008)
[37] Setiono, R.; Hui, L. C.K., Use of a quasi-Newton method in a feedforward neural network construction algorithm, Neural Netw. IEEE Trans., 6, 1, 273-277 (1995)
[38] Setiono, R., A penalty-function approach for pruning feedforward neural networks, Neural Comput., 9, 1, 185-204 (1997) · Zbl 0872.68154
[39] Sum, J.; Leung, C. S.; Ho, K., Convergence analyses on on-line weight noise injection-based training algorithms for MLPs, Neural Netw. Learn. Syst. IEEE Trans., 23, 11, 1827-1840 (2012)
[40] Sun, W.; Yuan, Y.-X., Optimization Theory and Methods: Nonlinear Programming (2006), Springer Science & Business Media · Zbl 1129.90002
[41] Tadic, V.; Stankovic, S., Learning in neural networks by normalized stochastic gradient algorithm: local convergence, Proc. Seminar on Neural Network Applications in Electronic Engineering, 11-17 (2000)
[42] Tibshirani, R., Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B (Methodological), 58, 1, 267-288 (1996) · Zbl 0850.62538
[43] Wang, J.; Wu, W.; Zurada, J. M., Boundedness and convergence of MPN for cyclic and almost cyclic learning with penalty, Proc. IEEE International Joint Conference on Neural Networks (IJCNN), 125-132 (2011)
[44] Weigend, A. S.; Rumelhart, D. E.; Huberman, B. A., Generalization by weight-elimination applied to currency exchange rate prediction, Proc. IEEE International Joint Conf. Neural Netw. (IJCNN), 837-841 (1991)
[45] Whitley, D.; Starkweather, T.; Bogart, C., Genetic algorithms and neural networks: optimizing connections and connectivity, Parallel Comput., 14, 3, 347-361 (1990)
[46] Wu, W.; Wang, J.; Cheng, M.; Li, Z., Convergence analysis of online gradient method for BP neural networks, Neural Netw., 24, 1, 91-98 (2011) · Zbl 1217.68191
[47] Wu, W.; Xu, Y., Deterministic convergence of an online gradient method for neural networks, J. Comput. Appl. Math., 144, 1, 335-347 (2002) · Zbl 1066.68111
[48] Wu, L.; Moody, J., A smoothing regularizer for feedforward and recurrent neural networks, Neural Comput., 8, 3, 461-489 (1996)
[49] Bottou, L., Large-scale machine learning with stochastic gradient descent, Proceedings of COMPSTAT’2010, 177-186 (2010) · Zbl 1436.68293
[50] Shalev-Shwartz, S., Online learning and online convex optimization, Found. Trends Mach. Learn., 4, 2, 107-194 (2012) · Zbl 1253.68190
[51] Xu, L.; Chen, J.; Huang, D.; Lu, J.; Fang, L., Analysis of boundedness and convergence of online gradient method for two-layer feedforward neural networks, Neural Netw. Learn. Syst. IEEE Trans., 24, 8, 1327-1337 (2013)
[52] Xu, Z. B.; Zhang, R.; Jing, W.-F., When does online BP training converge?, Neural Netw. IEEE Trans., 20, 10, 1529-1539 (2009)
[53] Zhang, H.; Wu, W.; Liu, F.; Yao, M., Boundedness and convergence of online gradient method with penalty for feedforward neural networks, Neural Netw. IEEE Trans., 20, 6, 1050-1054 (2009)
[54] Yuan, M.; Lin, Y., Model selection and estimation in regression with grouped variables, J. R. Stat. Soc. Ser. B, 68, 1, 49-67 (2006) · Zbl 1141.62030
[55] Zhang, J.; Morris, A. J., A sequential learning approach for single hidden layer neural networks, Neural Netw., 11, 1, 65-80 (1998)
[56] Zou, H., The adaptive lasso and its oracle properties, J. Am. Stat. Assoc., 101, 476, 1418-1429 (2006) · Zbl 1171.62326
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data-conversion errors. In some cases the data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.