GluonCV and GluonNLP: deep learning in computer vision and natural language processing. (English) Zbl 07255054

Summary: We present GluonCV and GluonNLP, deep learning toolkits for computer vision and natural language processing based on Apache MXNet (incubating). These toolkits provide state-of-the-art pre-trained models, training scripts, and training logs to facilitate rapid prototyping and promote reproducible research. We also provide modular APIs with flexible building blocks that enable efficient customization. Leveraging the MXNet ecosystem, the deep learning models in GluonCV and GluonNLP can be deployed on a variety of platforms and accessed from different programming languages. GluonCV and GluonNLP are released under the Apache 2.0 license, which permits free distribution, modification, and use of the software.


68T05 Learning and adaptive systems in artificial intelligence
Full Text: arXiv Link

