zbMATH — the first resource for mathematics

Training a multilayer network with low-memory kernel-and-range projection. (English) Zbl 07160360
Summary: Recently, a learning method based on kernel and range space projections has been proposed. This method learns a multilayer network analytically, with interpretable relationships among the weight matrices. However, its bulk-matrix formulation demands a large amount of memory during network learning. In this study, a low-memory resolution is proposed to address this memory demand. Essentially, the bulk matrix operations are replaced by a low-memory formulation in which only one training sample is processed at a time. This formulation is proved to be mathematically equivalent to the original batch-learning version. We also point out that rounding errors in computing systems can degrade the performance of the proposed formulation; it is therefore robustified by a regularization technique at the cost of an additional but negligible amount of memory. Our experiments show that the proposed low-memory resolution substantially reduces memory consumption while maintaining good performance in both regression and classification tasks.
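The summary's central claim — that a one-sample-at-a-time update can reproduce the batch least-squares solution exactly, with regularization guarding against rounding error — can be illustrated with a generic ridge-regularized recursive least-squares sketch. This is not the paper's kernel-and-range projection algorithm; the function names and the ridge setting are illustrative assumptions. The batch solver materializes the full data matrix, while the recursive solver stores only a d × d matrix and one sample at a time:

```python
import numpy as np

def batch_ridge(X, Y, lam=1e-3):
    """Bulk solve: W = (X^T X + lam*I)^{-1} X^T Y; needs all n samples in memory."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def recursive_ridge(X, Y, lam=1e-3):
    """Process one sample at a time; only a d x d matrix P and W are stored.
    Mathematically equivalent to batch_ridge in exact arithmetic; the ridge
    term lam also keeps P well-conditioned against rounding errors."""
    n, d = X.shape
    m = Y.shape[1]
    P = np.eye(d) / lam               # P = (lam*I)^{-1}: regularized start
    W = np.zeros((d, m))
    for i in range(n):
        x = X[i:i + 1].T              # current sample as a (d, 1) column
        y = Y[i:i + 1]                # its (1, m) target row
        k = P @ x / (1.0 + x.T @ P @ x)   # gain vector
        P = P - k @ (x.T @ P)             # rank-1 downdate of P
        W = W + k @ (y - x.T @ W)         # correct W with the new sample
    return W
```

On random data the two solvers agree to floating-point precision, mirroring the equivalence result claimed in the summary; the per-step memory of the recursive version is independent of the number of training samples.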
MSC:
68 Computer science
References:
[1] Alimoglu, F.; Alpaydin, E.; Denizhan, Y., Combining multiple classifiers for pen-based handwritten digit recognition (1996)
[2] D. Balduzzi, H. Vanchinathan, J. Buhmann, Kickback cuts Backprop’s red-tape: Biologically plausible credit assignment in neural networks (2014). arXiv:1411.6191
[3] Barton, S. A., A matrix method for optimizing a neural network, Neural Comput., 3, 3, 450-459 (1991)
[4] Ben-Israel, A.; Greville, T. N.E., Generalized Inverses: Theory and Applications (2003), Springer: Springer New York · Zbl 1026.15004
[5] Cai, L.; Zhu, J.; Zeng, H.; Chen, J.; Cai, C.; Ma, K.-K., HOG-assisted deep feature learning for pedestrian gender recognition, J. Frankl. Inst., 355, 4, 1991-2008 (2018)
[6] Campbell, S. L.; Meyer, C. D., Generalized inverses of linear transformations, 56 (2009), SIAM · Zbl 1158.15301
[7] Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A. L., DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Trans. Pattern Anal. Mach. Intell., 40, 4, 834-848 (2018)
[8] Chen, S.; Grant, P.; Cowan, C., Orthogonal least-squares algorithm for training multioutput radial basis function networks, IEE Proceedings F (Radar and Signal Processing), 139, 378-384 (1992), IET
[9] Cheney, E. W.; Kincaid, D. R., Numerical Mathematics and Computing (2012), Cengage Learning
[10] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078 (2014).
[11] Chong, E. K.; Zak, S. H., An Introduction to Optimization, 76 (2013), John Wiley & Sons · Zbl 1266.90001
[12] M. Dawson, J. Olvera, A. Fung, M. Manry, Inversion of surface parameters using fast learning neural networks (1992).
[13] Dawson, M. S.; Fung, A. K.; Manry, M. T., Surface parameter retrieval using fast learning neural networks, Remote Sens. Rev., 7, 1, 1-18 (1993)
[14] Diestel, R., Graph Theory (2018), Springer Publishing Company, Incorporated
[15] Ding, C.; Tao, D., Trunk-branch ensemble convolutional neural networks for video-based face recognition, IEEE Trans. Pattern Anal. Mach. Intell., 40, 4, 1002-1014 (2018)
[16] S. Duan, S. Yu, Y. Chen, J. Principe, Learning backpropagation-free deep architectures with kernels (2018). arXiv:1802.03774
[17] Elman, J. L., Finding structure in time, Cogn. Sci., 14, 2, 179-211 (1990)
[18] Goodfellow, I.; Bengio, Y.; Courville, A., Deep Learning, 1 (2016), MIT Press: MIT Press Cambridge · Zbl 1373.68009
[19] Greville, T., Some applications of the pseudoinverse of a matrix, SIAM Rev., 2, 1, 15-22 (1960) · Zbl 0168.13303
[20] He, K.; Zhang, X.; Ren, S.; Sun, J., Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778 (2016)
[21] Hochreiter, S.; Schmidhuber, J., Long short-term memory, Neural Comput., 9, 8, 1735-1780 (1997)
[22] Shin, H.-C.; Roth, H. R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Yao, J.; Mollura, D.; Summers, R. M., Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Trans. Med. Imaging, 35, 5, 1285 (2016)
[23] Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K. Q., Densely connected convolutional networks, Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2261-2269 (2017), IEEE
[24] Huang, G.-B.; Liang, N.-Y.; Rong, H.-J.; Saratchandran, P.; Sundararajan, N., On-line sequential extreme learning machine., Proceedings of the IASTED International Conference on Computational Intelligence, 2005, 232-237 (2005)
[25] Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K., Extreme learning machine: theory and applications, Neurocomputing, 70, 1-3, 489-501 (2006)
[26] Jaderberg, M.; Czarnecki, W. M.; Osindero, S.; Vinyals, O.; Graves, A.; Silver, D.; Kavukcuoglu, K., Decoupled neural interfaces using synthetic gradients, Proceedings of the Thirty-Fourth International Conference on Machine Learning-Volume 70, 1627-1635 (2017), JMLR.org
[27] Kang, K.; Li, H.; Yan, J.; Zeng, X.; Yang, B.; Xiao, T.; Zhang, C.; Wang, Z.; Wang, R.; Wang, X., T-CNN: tubelets with convolutional neural networks for object detection from videos, IEEE Trans. Circuits Syst. Video Technol., 28, 10, 2896-2907 (2018)
[28] Kay, S. M., Fundamentals of Statistical Signal Processing: Practical Algorithm Development, 3 (2013), Pearson Education
[29] Kingma, D. P.; Mohamed, S.; Rezende, D. J.; Welling, M., Semi-supervised learning with deep generative models, Advances in Neural Information Processing Systems, 3581-3589 (2014)
[30] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, 2012, http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.
[31] LeCun, Y.; Bengio, Y., Convolutional networks for images, speech, and time series, Handb. Brain Theory Neural Netw., 3361, 10, 1995 (1995)
[32] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P., Gradient-based learning applied to document recognition, Proc. IEEE, 86, 11, 2278-2324 (1998)
[33] LeCun, Y.; Huang, F. J.; Bottou, L., Learning methods for generic object recognition with invariance to pose and lighting, CVPR (2), 97-104 (2004), Citeseer
[34] Malalur, S. S.; Manry, M. T., Multiple optimal learning factors for feed-forward networks, Proceedings of the Independent Component Analyses, Wavelets, Neural Networks, Biosystems, and Nanoengineering VIII, 7703, 77030F (2010), International Society for Optics and Photonics
[35] Martínez-Rego, D.; Fontenla-Romero, O.; Alonso-Betanzos, A., Nonlinear single layer neural network training algorithm for incremental, nonstationary and distributed learning scenarios, Pattern Recogn., 45, 12, 4536-4546 (2012) · Zbl 1248.68412
[36] Martínez-Rego, D.; Fontenla-Romero, O.; Alonso-Betanzos, A., Exact incremental learning for a single non-linear neuron based on Taylor expansion and Greville formula, Proceedings of the Conference of the Spanish Association for Artificial Intelligence, 149-158 (2013), Springer
[37] Special Issue on Recent advances in machine learning for signal analysis and processing. · Zbl 1395.94053
[38] Mirza, B.; Lin, Z., Meta-cognitive online sequential extreme learning machine for imbalanced and concept-drifting data classification, Neural Netw., 80, 79-94 (2016)
[39] Oppenheim, A. V., Discrete-Time Signal Processing (1999), Pearson Education India
[40] A. Radford, L. Metz, S. Chintala, Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (2015). arXiv:1511.06434
[41] http://www.sciencedirect.com/science/article/pii/0024379585901260. · Zbl 0561.15006
[42] Robbins, H.; Monro, S., A stochastic approximation method, Herbert Robbins Selected Papers, 102-109 (1985), Springer
[43] Rumelhart, D. E.; Hinton, G. E.; Williams, R. J., Learning representations by back-propagating errors, Nature, 323, 6088, 533 (1986) · Zbl 1369.68284
[44] Simon, D., Optimal State Estimation: Kalman, H Infinity, and Nonlinear Approaches (2006), John Wiley & Sons
[45] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556 (2014).
[46] Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A., Going deeper with convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-9 (2015)
[47] Tasić, M. B.; Stanimirović, P. S.; Pepić, S. H., About the generalized LM-inverse and the weighted Moore-Penrose inverse, Appl. Math. Comput., 216, 114-124 (2010) · Zbl 1191.65038
[48] Toh, K.-A., Analytic network learning, Technical Report (2018), arXiv:1811.08227
[49] Toh, K.-A., Kernel and range approach to analytic network learning, Int. J. Netw. Distrib. Comput., 7, 1, 20-28 (2018)
[50] Toh, K.-A., Learning from the kernel and the range space, Proceedings of the Seventeenth 2018 IEEE Conference on Computer and Information Science (ICIS), 417-422 (2018), IEEE
[51] Udwadia, F.; Kalaba, R., An alternative proof of the Greville formula, J. Optim. Theory Appl., 94, 1, 23-28 (1997) · Zbl 0893.65020
[52] Udwadia, F.; Kalaba, R., General forms for the recursive determination of generalized inverses: unified approach, J. Optim. Theory Appl., 101, 3, 509-521 (1999) · Zbl 0946.90117
[53] Udwadia, F.; Kalaba, R., Sequential determination of the {1, 4}-inverse of a matrix, J. Optim. Theory Appl., 117, 1, 1-7 (2003) · Zbl 1040.65033
[54] Udwadia, F. E.; Phohomsiri, P., Recursive determination of the generalized Moore-Penrose M-inverse of a matrix, J. Optim. Theory Appl. (2005) · Zbl 1100.65040
[55] (Ph. D. dissertation)
[56] Werbos, P. J., Generalization of backpropagation with application to a recurrent gas market model, Neural Netw., 1, 4, 339-356 (1988)
[57] H. Xiao, K. Rasul, R. Vollgraf, Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms (2017). arXiv:1708.07747.
[58] Xu, L.; Krzyzak, A.; Suen, C. Y., Methods of combining multiple classifiers and their applications to handwriting recognition, IEEE Trans. Syst. Man Cybern., 22, 3, 418-435 (1992)
[59] Zhou, J.; Zhu, Y.; Li, X. R.; You, Z., Variants of the Greville formula with applications to exact recursive least squares, SIAM J. Matrix Anal. Appl., 24, 1, 150-164 (2002) · Zbl 1029.65040