
Rethinking arithmetic for deep neural networks. (English) Zbl 1462.68168

Summary: We consider efficiency in the implementation of deep neural networks. Hardware accelerators are gaining interest as machine learning becomes one of the drivers of high-performance computing. In these accelerators, the directed graph describing a neural network can be implemented as a directed graph describing a Boolean circuit. We make this observation precise, leading naturally to an understanding of practical neural networks as discrete functions, and show that the so-called binarized neural networks are functionally complete. In general, our results suggest that it is valuable to consider Boolean circuits as neural networks, leading to the question of which circuit topologies are promising. We argue that continuity is central to generalization in learning, explore the interaction between data coding, network topology, and node functionality for continuity and pose some open questions for future research. As a first step to bridging the gap between continuous and Boolean views of neural network accelerators, we present some recent results from our work on LUTNet, a novel Field-Programmable Gate Array inference approach. Finally, we conclude with additional possible fruitful avenues for research bridging the continuous and discrete views of neural networks.
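To make the observation that a binarized neuron is a discrete (Boolean) function concrete, the following minimal Python sketch (illustrative only, not taken from the paper; the function names and the tie-breaking convention at zero are assumptions) shows the standard equivalence between the arithmetic view of a binarized neuron with weights and activations in {−1, +1} and its Boolean reading as an XNOR/popcount circuit with a majority threshold.

def binarized_neuron_arith(x, w):
    # Arithmetic view: sign of the dot product over {-1, +1} inputs and weights.
    s = sum(xi * wi for xi, wi in zip(x, w))
    return 1 if s >= 0 else -1

def binarized_neuron_boolean(x_bits, w_bits):
    # Boolean view: XNOR each input/weight bit pair, popcount the matches,
    # then take a majority vote. Equivalent because the dot product equals
    # 2 * matches - n, so it is non-negative exactly when 2 * matches >= n.
    matches = sum(1 for xb, wb in zip(x_bits, w_bits) if xb == wb)
    return 1 if 2 * matches >= len(x_bits) else 0

if __name__ == "__main__":
    to_bit = lambda v: 1 if v == 1 else 0
    x = [+1, -1, -1, +1]
    w = [+1, +1, -1, -1]
    arith = binarized_neuron_arith(x, w)
    boolean = binarized_neuron_boolean([to_bit(v) for v in x], [to_bit(v) for v in w])
    assert (arith == 1) == (boolean == 1)  # the two views agree on this example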

MSC:

68T07 Artificial neural networks and deep learning
68M07 Mathematical problems of computer architecture
68Q06 Networks and circuits as models of computation; circuit complexity
68R10 Graph theory (including graph drawing) in computer science
68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Goodfellow I, Bengio Y, Courville A. 2016 Deep learning. Cambridge, MA: MIT Press. · Zbl 1373.68009
[2] Wang E, Davis J, Zhao R, Ng H-C, Niu X, Luk W, Cheung P, Constantinides G. 2019 Deep neural network approximation for custom hardware: where we’ve been, where we’re going. ACM Comput. Surv. 52.
[3] Searcóid MO. 2007 Metric spaces. Berlin, Germany: Springer. · Zbl 1109.54001
[4] Scheinberg K. 2016 Evolution of randomness in optimization methods for supervised machine learning. SIAG/OPT views and news (ed. S Wild), vol. 24, pp. 1-7. http://wiki.siam.org/siag-op/index.php/View_and_News.
[5] LeCun Y. 1989 Generalization and network design strategies. University of Toronto, Technical Report. CRG-TR-89-4.
[6] Hochreiter S, Schmidhuber J. 1997 Long short-term memory. Neural Comput. 9, 1735-1780. (doi:10.1162/neco.1997.9.8.1735) · doi:10.1162/neco.1997.9.8.1735
[7] Kahn G. 1974 The semantics of a simple language for parallel programming. In Proc. IFIP Congress on Information Processing, Stockholm, Sweden, 5-10 August 1974. Amsterdam, Netherlands: North-Holland. · Zbl 0299.68007
[8] Strang G. 2018 The functions of deep learning. SIAM News, December.
[9] Blum L, Shub M, Smale S. 1989 On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions and universal machines. Bull. Am. Math. Soc. 21, 1-46. (doi:10.1090/S0273-0979-1989-15750-9) · Zbl 0681.03020 · doi:10.1090/S0273-0979-1989-15750-9
[10] Hornik K. 1991 Approximation capabilities of multilayer feedforward networks. Neural Netw. 4, 251-257. (doi:10.1016/0893-6080(91)90009-T) · doi:10.1016/0893-6080(91)90009-T
[11] Neal R. 1994 Priors for infinite networks. University of Toronto, Technical Report. CRG-TR-94-1.
[12] Du SS, Zhai X, Póczos B, Singh A. 2018 Gradient descent provably optimizes over-parameterized neural networks. CoRR (http://arxiv.org/abs/1810.02054).
[13] Brayton R, Hachtel G, Sangiovanni-Vincentelli A. 1990 Multilevel logic synthesis. Proc. IEEE 78, 264-300. (doi:10.1109/5.52213) · doi:10.1109/5.52213
[14] IEEE standard for floating-point arithmetic. IEEE Std. 754-2008. 2008.
[15] Har-Even B. 2018 PowerVR Series2NX: Raising the bar for embedded AI. www.imgtec.com/blog/powervr-series2nx-raising-the-bar-for-embedded-ai/.
[16] Higham N. 2002 Accuracy and stability of numerical algorithms. Philadelphia, PA: Society for Industrial and Applied Mathematics. · Zbl 1011.65010
[17] Wolf C. 2013 Yosys open synthesis suite. www.clifford.at/yosys/.
[18] Gephi 2017 The open graph viz platform. http://gephi.org/.
[19] Courbariaux M, Bengio Y. 2016 Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or −1. CoRR (http://arxiv.org/abs/1602.02830).
[20] Umuroglu Y, Fraser NJ, Gambardella G, Blott M, Leong P, Jahre M, Vissers K. 2017 FINN: A framework for fast, scalable binarized neural network inference. In Proc. ACM/SIGDA Int. Symp. on Field-Programmable Gate Arrays, Monterey, CA, 22-24 February 2017, pp. 65-74. New York, NY: ACM.
[21] Warren H. 2012 Hacker’s delight, 2nd edn. Reading, MA: Addison Wesley.
[22] Triggs R. 2018 A closer look at Arm’s machine learning hardware. www.androidauthority.com/arm-project-trillium-842770/.
[23] Clote P, Kranakis E. 2002 Boolean functions and computation models. Berlin, Germany: Springer. · Zbl 1016.94046
[24] Venkatesh G, Nurvitadhi E, Marr D. 2017 Accelerating deep convolutional neural networks using low precision and sparsity. In Proc. IEEE ICASSP, New Orleans, LA, 5-9 March 2017. Piscataway, NJ: IEEE.
[25] Su J. 2018 Artificial neural networks acceleration on field-programmable gate arrays considering model redundancy. Imperial College London, Technical Report.
[26] Weste N, Harris D. 2002 CMOS VLSI design: a circuits and systems perspective. London, UK: Pearson.
[27] Koren I. 2001 Computer arithmetic algorithms. Natick, MA: A. K. Peters. · Zbl 0994.68186
[28] Trivedi KS, Ercegovac MD. 1977 On-line algorithms for division and multiplication. IEEE Trans. Comput. 26, 681-687. (doi:10.1109/TC.1977.1674901) · Zbl 0406.68040 · doi:10.1109/TC.1977.1674901
[29] Ercegovac M, Lang T. 2003 Digital arithmetic. Los Altos, CA: Morgan Kaufmann.
[30] Savage C. 1997 A survey of combinatorial Gray codes. SIAM Rev. 39, 605-629. (doi:10.1137/S0036144595295272) · Zbl 1049.94513 · doi:10.1137/S0036144595295272
[31] Bondy J. 1976 Graph theory with applications. Amsterdam, the Netherlands: Elsevier. · Zbl 1226.05083
[32] Gillis N, Glineur F. 2014 A continuous characterization of the maximum-edge biclique problem. J. Global Optim. 58, 439-464. (doi:10.1007/s10898-013-0053-2) · Zbl 1305.90349 · doi:10.1007/s10898-013-0053-2
[33] Hauck S, DeHon A. 2007 Reconfigurable computing: the theory and practice of FPGA-based computation. Los Altos, CA: Morgan Kaufmann. · Zbl 1147.68435
[34] Wang E, Davis J, Cheung P, Constantinides G. 2019 LUTNet: Rethinking inference in FPGA soft logic. In Proc. IEEE Int. Symp. on Field-Programmable Custom Computing Machines, Seaside, CA, 24-26 February 2019. Piscataway, NJ: IEEE.
[35] Ghasemzadeh M, Samragh M, Koushanfar F. 2018 ReBNet: residual binarized neural network. In Proc. IEEE Int. Symp. on Field-Programmable Custom Computing Machines, Boulder, CO, 29 April-1 May 2018. Piscataway, NJ: IEEE.
[36] Han S, Pool J, Tran J, Dally WJ. 2015 Learning both weights and connections for efficient neural network. In Proc. Conf. on Neural Information Processing Systems.
[37] Krizhevsky A. 2009 Learning multiple layers of features from tiny images. University of Toronto, Technical Report. www.cs.toronto.edu/kriz/learning-features-2009-TR.pdf.
[38] Zoph B, Le QV. 2016 Neural architecture search with reinforcement learning. CoRR (http://arxiv.org/abs/1611.01578).
[39] Quine W. 1952 The problem of simplifying truth functions. Am. Math. Mon. 59, 521-531. (doi:10.1080/00029890.1952.11988183) · Zbl 0048.24503 · doi:10.1080/00029890.1952.11988183
[40] Rudell RL, Sangiovanni-Vincentelli A. 1987 Multiple-valued minimization for PLA optimization. IEEE Trans. Computer-Aided Des. 6, 727-750. (doi:10.1109/TCAD.1987.1270318) · doi:10.1109/TCAD.1987.1270318
[41] Haaswijk W, Mishchenko A, Soeken M, Micheli GD. 2018 SAT based exact synthesis using DAG topology families. In Proc. Design Automation Conf., San Francisco, CA, 24-29 June 2018. New York, NY: ACM.
[42] Rissanen J. 1978 Modeling by shortest data description. Automatica 14, 465-471. (doi:10.1016/0005-1098(78)90005-5) · Zbl 0418.93079 · doi:10.1016/0005-1098(78)90005-5
[43] Gouk H, Frank E, Pfahringer B, Cree M. 2018 Regularisation of neural networks by enforcing Lipschitz continuity. CoRR (http://arxiv.org/abs/1804.04368).
[44] Scaman K, Virmaux A. 2018 Lipschitz regularity of deep neural networks: Analysis and efficient estimation. In Proc. Neural Information Processing Systems, Montréal, Canada, 3-8 December 2018. pp. 3839-3848. https://papers.nips.cc/paper/7640-lipschitz-regularity-of-deep-neural-networks-analysis-and-efficient-estimation.pdf.
[45] Dietterich T, Bakiri G. 1995 Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 2, 263-286. (doi:10.1613/jair.105) · Zbl 0900.68358 · doi:10.1613/jair.105
[46] Gupta S, Agrawal A, Gopalakrishnan K, Narayanan P. 2015 Deep learning with limited numerical precision. In Proc. 32nd Int. Conf. on Machine Learning, Lille, France, 6-11 July 2015. pp. 1737-1746. http://proceedings.mlr.press/v37/gupta15.pdf.
[47] Hopkins M, Mikaitis M, Lester D, Furber S. 2019 Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ODEs. CoRR (http://arxiv.org/abs/1904.11263). · Zbl 1462.65081
[48] Murphy K. 2012 Machine learning: a probabilistic perspective. Cambridge, MA: MIT Press. · Zbl 1295.68003
[49] Nielson F, Nielson H, Hankin C. 2010 Principles of program analysis. Berlin, Germany: Springer. · Zbl 0932.68013