How to construct low-altitude aerial image datasets for deep learning. (English) Zbl 07393490

Summary: The combination of Unmanned Aerial Vehicle (UAV) technology and computer vision is making UAV applications increasingly popular. Computer vision tasks based on deep learning usually require a large amount of task-related data to train algorithms for specific tasks. Since commonly used datasets are not designed for specific scenarios, sufficiently large aerial image datasets need to be collected to give UAVs stronger computer vision capabilities and to meet training requirements. In this paper, we take low-altitude aerial image object detection as an example and propose a framework that demonstrates how to construct datasets for specific tasks. First, we introduce existing low-altitude aerial image datasets and analyze the characteristics of low-altitude aerial images. On this basis, we offer suggestions for collecting low-altitude aerial image data. We then recommend several commonly used image annotation tools and crowdsourcing platforms for producing the labeled data needed for model training. In addition, to make up for the shortage of data, we introduce data augmentation techniques, including traditional data augmentation as well as augmentation based on oversampling and on generative adversarial networks.
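As a rough illustration of the "traditional" data augmentation the summary refers to, the sketch below applies random flips, 90-degree rotations, and mild Gaussian noise to an image array using plain NumPy. It is a minimal example under the assumption of float images in [0, 1]; the function name `augment` and the specific transform parameters are illustrative choices, not taken from the paper.

```python
import numpy as np

def augment(image, rng):
    """Apply simple geometric and photometric transforms to an
    H x W x C image with values in [0, 1]."""
    out = image
    if rng.random() < 0.5:                    # random horizontal flip
        out = out[:, ::-1, :]
    k = int(rng.integers(0, 4))               # rotate by 0/90/180/270 degrees
    out = np.rot90(out, k, axes=(0, 1))
    noise = rng.normal(0.0, 0.02, out.shape)  # mild additive Gaussian noise
    return np.clip(out + noise, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
aug = augment(img, rng)
```

For object detection data, the same geometric transforms would also have to be applied to the bounding-box annotations, which is why detection-aware augmentation pipelines track image and label transforms together.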


68-XX Computer science
94-XX Information and communication theory, circuits
Full Text: DOI

