zbMATH — the first resource for mathematics

Scale-invariant learning and convolutional networks. (English) Zbl 1392.68340
Summary: Multinomial logistic regression and other classification schemes used in conjunction with convolutional networks (convnets) were designed largely before the rise of the now standard coupling with convnets, stochastic gradient descent, and backpropagation. In the specific application to supervised learning for convnets, a simple scale-invariant classification stage is more robust than multinomial logistic regression, appears to result in somewhat lower errors on several standard test sets, has similar computational costs, and features precise control over the actual rate of learning. “Scale-invariant” means that multiplying the input values by any nonzero real number leaves the output unchanged.
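As a minimal illustration of the scale-invariance property described above (not the paper's actual classification stage), consider a hypothetical linear scoring rule that divides the magnitudes of the projections onto class weight vectors by the norm of the input; replacing the input x by c·x for any nonzero real c multiplies both numerator and denominator by |c|, leaving every score, and hence the predicted class, unchanged:

```python
import numpy as np

def scale_invariant_scores(x, W):
    # Score for class k is |w_k . x| / ||x||.
    # Replacing x by c*x (c != 0) scales numerator and
    # denominator by |c|, so the scores are unchanged.
    return np.abs(W @ x) / np.linalg.norm(x)

rng = np.random.default_rng(0)
W = rng.standard_normal((10, 5))   # 10 classes, 5 input features
x = rng.standard_normal(5)

s1 = scale_invariant_scores(x, W)
s2 = scale_invariant_scores(-3.7 * x, W)  # arbitrary nonzero rescaling

assert np.allclose(s1, s2)
assert np.argmax(s1) == np.argmax(s2)
```

A softmax/multinomial-logistic stage lacks this property: multiplying its inputs by a large constant sharpens the output distribution, while the rule sketched here is unaffected by any nonzero rescaling.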

MSC:
68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)
65T60 Numerical methods for wavelets
94A08 Image processing (compression, reconstruction, etc.) in information and communication theory
94A12 Signal theory (characterization, reconstruction, filtering, etc.)
Software:
CIFAR; ElemStatLearn
References:
[1] Carandini, M.; Heeger, D. J., Normalization as a canonical neural computation, Nat. Rev. Neurosci., 13, 51-62, (2012)
[2] Hastie, T.; Tibshirani, R.; Friedman, J., Elements of statistical learning: data mining, inference, and prediction, (2009), Springer · Zbl 1273.62005
[3] Hill, S. I.; Doucet, A., A framework for kernel-based multi-category classification, J. Artificial Intelligence Res., 30, 525-564, (2007) · Zbl 1182.68197
[4] Ioffe, S.; Szegedy, C., Batch normalization: accelerating deep network training by reducing internal covariate shift, (2015), Tech. rep.
[5] Krizhevsky, A., Learning multiple layers of features from tiny images, (2009), Master’s thesis, University of Toronto, Department of Computer Science
[6] Lange, K.; Wu, T. T., An MM algorithm for multicategory vertex discriminant analysis, J. Comput. Graph. Statist., 17, 527-544, (2008)
[7] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P., Gradient-based learning applied to document recognition, Proc. IEEE, 86, 2278-2324, (1998)
[8] Magnus, J. R.; Neudecker, H., Matrix differential calculus with applications in statistics and econometrics, (2007), John Wiley and Sons
[9] Mallat, S., Recursive interferometric representations, (Proc. EUSIPCO Conf. 2010, EURASIP, (August 2010)), 716-720
[10] Mroueh, Y.; Poggio, T.; Rosasco, L.; Slotine, J.-J., Multiclass learning with simplex coding, (Adv. Neural Inf. Process. Syst., vol. 25, (2012), Curran Associates), 2789-2797
[11] Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; Berg, A. C.; Fei-Fei, L., ImageNet large scale visual recognition challenge, (2015), Tech. rep.
[12] Saberian, M. J.; Vasconcelos, N., Multiclass boosting: theory and algorithms, (Adv. Neural Inf. Process. Syst., vol. 24, (2011), Curran Associates), 2124-2132
[13] Tygert, M.; Bruna, J.; Chintala, S.; LeCun, Y.; Piantino, S.; Szlam, A., A mathematical motivation for complex-valued convolutional networks, Neural Comput., 28, 815-825, (2016)
[14] Wu, T. T.; Lange, K., Multicategory vertex discriminant analysis for high-dimensional data, Ann. Appl. Stat., 4, 1698-1721, (2010) · Zbl 1220.62086
[15] Wu, T. T.; Wu, Y., Nonlinear vertex discriminant analysis with reproducing kernels, Stat. Anal. Data Min., 5, 167-176, (2012)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.