A comprehensive and systematic look up into deep learning based object detection techniques: a review. (English) Zbl 1484.68261

Summary: Object detection is one of the most fundamental and challenging visual recognition tasks in computer vision, and it has received great attention over the past few decades. Object detection techniques find application in almost every sphere of life, the most prominent being surveillance, autonomous driving, and pedestrian detection. The primary goal of visual object detection is to detect objects belonging to certain target classes, localize them precisely in a realistic scene or an input image, and assign each detected instance a predefined class label. Owing to the rapid development of deep neural networks, the performance of object detectors has improved rapidly, and as a result deep learning based detection techniques have been actively studied over the past several years. In this paper we provide a comprehensive survey of the latest advances in deep learning based visual object detection. First, we review a large body of recent work in the literature and use it to analyze traditional and current object detectors. We then provide, as the primary contribution, a rigorous overview of backbone architectures for object detection, followed by systematic coverage of current learning strategies. Some popular datasets and metrics used for object detection are analyzed as well. Finally, we discuss applications of object detection and suggest several directions for future research on visual object detection with deep learning.

MSC:

68T45 Machine vision and scene understanding
68T07 Artificial neural networks and deep learning
68-02 Research exposition (monographs, survey articles) pertaining to computer science

References:

[1] Jiao, Licheng; Zhang, Fan; Liu, Fang; Yang, Shuyuan; Li, Lingling; Feng, Zhixi; Qu, Rong, A survey of deep learning-based object detection, IEEE Access, 7, 128837-128868 (2019)
[2] Nvidia, Jensen Huang, Accelerating AI with GPUs: A new computing model (2020), Retrieved on June 20, 2020 at 11:45 am, from URL https://blogs.nvidia.com/blog/2016/01/12/accelerating-ai-artificial-intelligence-gpus/
[3] Wu, Xiongwei; Sahoo, Doyen; Hoi, Steven C. H., Recent advances in deep learning for object detection, Neurocomputing (2020)
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[5] Chen, Liang-Chieh; Papandreou, George; Kokkinos, Iasonas; Murphy, Kevin; Yuille, Alan L., Semantic image segmentation with deep convolutional nets and fully connected crfs (2014), arXiv preprint arXiv:1412.7062
[6] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580-587.
[7] Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick, Mask r-cnn, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961-2969.
[8] Sun, Yi; Liang, Ding; Wang, Xiaogang; Tang, Xiaoou, Deepid3: Face recognition with very deep neural networks (2015), arXiv preprint arXiv:1502.00873
[9] Sun, Yi; Chen, Yuheng; Wang, Xiaogang; Tang, Xiaoou, Deep learning face representation by joint identification-verification, (Advances in Neural Information Processing Systems (2014)), 1988-1996
[10] Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song, Sphereface: Deep hypersphere embedding for face recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 212-220.
[11] Li, Jianan; Liang, Xiaodan; Shen, ShengMei; Xu, Tingfa; Feng, Jiashi; Yan, Shuicheng, Scale-aware fast R-CNN for pedestrian detection, IEEE Trans. Multimedia, 20, 4, 985-996 (2017)
[12] Jan Hosang, Mohamed Omran, Rodrigo Benenson, Bernt Schiele, Taking a deeper look at pedestrians, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4073-4082.
[13] Angelova, Anelia; Krizhevsky, Alex; Vanhoucke, Vincent; Ogale, Abhijit; Ferguson, Dave, Real-time pedestrian detection with deep network cascades (2015)
[14] Hoi, Steven CH; Wu, Xiongwei; Liu, Hantang; Wu, Yue; Wang, Huiqiong; Xue, Hui; Wu, Qiang, Logo-net: Large-scale deep logo detection and brand recognition with deep region-based convolutional networks (2015), arXiv preprint arXiv:1511.02462
[15] Su, Hang; Zhu, Xiatian; Gong, Shaogang, Deep learning logo detection with data expansion by synthesising context, (2017 IEEE Winter Conference on Applications of Computer Vision (WACV) (2017), IEEE), 530-539
[16] Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei, Large-scale video classification with convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1725-1732.
[17] Hossein Mobahi, Ronan Collobert, Jason Weston, Deep learning from temporal coherence in video, in: Proceedings of the 26th Annual International Conference on Machine Learning, 2009, pp. 737-744.
[18] Girshick, Ross B., Fast R-CNN (2015), CoRR arXiv:1504.08083
[19] Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, You only look once: Unified, real-time object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788.
[20] Ren, Shaoqing; He, Kaiming; Girshick, Ross; Sun, Jian, Faster r-cnn: Towards real-time object detection with region proposal networks, (Advances in Neural Information Processing Systems (2015)), 91-99
[21] Lowe, David G., Object recognition from local scale-invariant features, (Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2 (1999), IEEE), 1150-1157
[22] Bay, Herbert; Tuytelaars, Tinne; Van Gool, Luc, Surf: Speeded up robust features, (European Conference on Computer Vision (2006), Springer), 404-417
[23] Lienhart, Rainer; Maydt, Jochen, An extended set of haar-like features for rapid object detection, (Proceedings. International Conference on Image Processing, vol. 1 (2002), IEEE), pp. I-I
[24] Vig, Eleonora; Dorr, Michael; Cox, David, Large-scale optimization of hierarchical features for saliency prediction in natural images, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)), 2798-2805
[25] Dalal, Navneet; Triggs, Bill, Histograms of oriented gradients for human detection, (2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 1 (2005), IEEE), 886-893
[26] Xiang, Yu; Choi, Wongun; Lin, Yuanqing; Savarese, Silvio, Subcategory-aware convolutional neural networks for object proposals and detection, (2017 IEEE Winter Conference on Applications of Computer Vision (WACV) (2017), IEEE), 924-933
[27] Freund, Yoav; Schapire, Robert E., A decision-theoretic generalization of on-line learning and an application to boosting, (European Conference on Computational Learning Theory (1995), Springer), 23-37
[28] Freund, Yoav; Schapire, Robert E., Experiments with a new boosting algorithm, (Icml, vol. 96 (1996), Citeseer), 148-156
[29] Opitz, David; Maclin, Richard, Popular ensemble methods: An empirical study, J. Artif. Intell. Res., 11, 169-198 (1999) · Zbl 0924.68159
[30] Felzenszwalb, P. F.; Girshick, R. B.; McAllester, D.; Ramanan, D., Object detection with discriminatively trained part-based models, IEEE Trans. Pattern Anal. Mach. Intell., 32, 1627-1645 (2010)
[31] Everingham, Mark; Van Gool, Luc; Williams, Christopher K. I.; Winn, John; Zisserman, Andrew, The PASCAL visual object classes challenge 2007 (VOC2007) results (2007)
[32] Everingham, Mark; Van Gool, Luc; Williams, Christopher KI; Winn, John; Zisserman, Andrew, The pascal visual object classes (voc) challenge, Int. J. Comput. Vis., 88, 2, 303-338 (2010)
[33] Lowe, David G., Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis., 60, 2, 91-110 (2004)
[34] Ojala, Timo; Pietikainen, Matti; Maenpaa, Topi, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell., 24, 7, 971-987 (2002) · Zbl 0977.68853
[35] Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E., Imagenet classification with deep convolutional neural networks, (Advances in Neural Information Processing Systems (2012)), 1097-1105
[36] Cao, Guimei; Xie, Xuemei; Yang, Wenzhe; Liao, Quan; Shi, Guangming; Wu, Jinjian, Feature-fused SSD: Fast detection for small objects, (Ninth International Conference on Graphic and Image Processing (ICGIP 2017), vol. 10615 (2018), International Society for Optics and Photonics), 106151E
[37] Subarna Tripathi, Gokce Dane, Byeongkeun Kang, Vasudev Bhaskaran, Truong Nguyen, LCDet: Low-complexity fully-convolutional neural networks for object detection in embedded systems, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 94-103.
[38] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell, Caffe: Convolutional architecture for fast feature embedding, in: Proceedings of the 22nd ACM International Conference on Multimedia, 2014, pp. 675-678.
[39] Yang, Zhenheng; Nevatia, Ramakant, A multi-scale cascade fully convolutional network face detector, (2016 23rd International Conference on Pattern Recognition (ICPR) (2016), IEEE), 633-638
[40] Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, Andrew Y. Ng, Multimodal deep learning, in: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 689-696.
[41] Jiang, Yu-Gang; Wu, Zuxuan; Tang, Jinhui; Li, Zechao; Xue, Xiangyang; Chang, Shih-Fu, Modeling multimodal clues in a hybrid deep learning framework for video classification, IEEE Trans. Multimed., 20, 11, 3137-3147 (2018)
[42] Tomè, Denis; Monti, Federico; Baroffio, Luca; Bondi, Luca; Tagliasacchi, Marco; Tubaro, Stefano, Deep convolutional neural networks for pedestrian detection, Signal Process., Image Commun., 47, 482-489 (2016)
[43] Zhao, Zhong-Qiu; Bian, Haiman; Hu, Donghui; Cheng, Wenjuan; Glotin, Hervé, Pedestrian detection based on fast R-CNN and batch normalization, (ICIC 2017: Intelligent Computing Theories and Application (2017), Springer), 735-746
[44] Chen Zhang, Joohee Kim, Object detection with location-aware deformable convolution and backward attention filtering, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 9452-9461.
[45] Joseph Redmon, Ali Farhadi, YOLO9000: better, faster, stronger, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7263-7271.
[46] Liu, Wei; Anguelov, Dragomir; Erhan, Dumitru; Szegedy, Christian; Reed, Scott; Fu, Cheng-Yang; Berg, Alexander C., Ssd: Single shot multibox detector, (European Conference on Computer Vision (2016), Springer), 21-37
[47] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie, Feature pyramid networks for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
[48] Yang, Ming-Hsuan; Kriegman, David J.; Ahuja, Narendra, Detecting faces in images: A survey, IEEE Trans. Pattern Anal. Mach. Intell., 24, 1, 34-58 (2002)
[49] Zafeiriou, Stefanos; Zhang, Cha; Zhang, Zhengyou, A survey on face detection in the wild: past, present and future, Comput. Vis. Image Underst., 138, 1-24 (2015)
[50] Ye, Qixiang; Doermann, David, Text detection and recognition in imagery: A survey, IEEE Trans. Pattern Anal. Mach. Intell., 37, 7, 1480-1500 (2014)
[51] Dollar, Piotr; Wojek, Christian; Schiele, Bernt; Perona, Pietro, Pedestrian detection: An evaluation of the state of the art, IEEE Trans. Pattern Anal. Mach. Intell., 34, 4, 743-761 (2011)
[52] Enzweiler, Markus; Gavrila, Dariu M., Monocular pedestrian detection: Survey and experiments, IEEE Trans. Pattern Anal. Mach. Intell., 31, 12, 2179-2195 (2008)
[53] Geronimo, David; Lopez, Antonio M.; Sappa, Angel D.; Graf, Thorsten, Survey of pedestrian detection for advanced driver assistance systems, IEEE Trans. Pattern Anal. Mach. Intell., 32, 7, 1239-1258 (2009)
[54] Sun, Zehang; Bebis, George; Miller, Ronald, On-road vehicle detection: A review, IEEE Trans. Pattern Anal. Mach. Intell., 28, 5, 694-711 (2006)
[55] Zhang, Xin; Yang, Yee-Hong; Han, Zhiguang; Wang, Hui; Gao, Chao, Object class detection: A survey, ACM Comput. Surv., 46, 1, 1-53 (2013)
[56] Jiao, Licheng; Zhang, Fan; Liu, Fang; Yang, Shuyuan; Li, Lingling; Feng, Zhixi; Qu, Rong, A survey of deep learning-based object detection, IEEE Access, 7, 128837-128868 (2019)
[57] Zhao, Zhong-Qiu; Zheng, Peng; Xu, Shou-tao; Wu, Xindong, Object detection with deep learning: A review, IEEE Trans. Neural Netw. Learn. Syst., 30, 11, 3212-3232 (2019)
[58] Sun, Zehang; Bebis, George; Miller, Ronald, On-road vehicle detection: A review, IEEE Trans. Pattern Anal. Mach. Intell., 28, 5, 694-711 (2006)
[59] Ponce, Jean; Hebert, Martial; Schmid, Cordelia; Zisserman, Andrew, Toward Category-Level Object Recognition, vol. 4170 (2007), Springer
[60] Dickinson, Sven J.; Leonardis, Aleš; Schiele, Bernt; Tarr, Michael J., Object Categorization: Computer and Human Vision Perspectives (2009), Cambridge University Press
[61] Galleguillos, Carolina; Belongie, Serge, Context based object categorization: A critical survey, Comput. Vis. Image Underst., 114, 6, 712-722 (2010)
[62] Grauman, Kristen; Leibe, Bastian, Visual object recognition, Synth. Lect. Artif. Intell. Mach. Learn., 5, 2, 1-181 (2011)
[63] Andreopoulos, Alexander; Tsotsos, John K., 50 years of object recognition: Directions forward, Comput. Vis. Image Underst., 117, 8, 827-891 (2013)
[64] Bengio, Yoshua; Courville, Aaron; Vincent, Pascal, Representation learning: A review and new perspectives, IEEE Trans. Pattern Anal. Mach. Intell., 35, 8, 1798-1828 (2013)
[65] Borji, Ali; Cheng, Ming-Ming; Hou, Qibin; Jiang, Huaizu; Li, Jia, Salient object detection: A survey, Comput. Vis. Media, 1-34 (2019)
[66] Li, Yali; Wang, Shengjin; Tian, Qi; Ding, Xiaoqing, Feature representation for statistical-learning-based object detection: A review, Pattern Recognit., 48, 11, 3542-3559 (2015)
[67] LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey, Deep learning, Nature, 521, 7553, 436-444 (2015)
[68] Litjens, Geert; Kooi, Thijs; Bejnordi, Babak Ehteshami; Setio, Arnaud Arindra Adiyoso; Ciompi, Francesco; Ghafoorian, Mohsen; Van Der Laak, Jeroen Awm; Van Ginneken, Bram; Sánchez, Clara I., A survey on deep learning in medical image analysis, Med. Image Anal., 42, 60-88 (2017)
[69] Gu, Jiuxiang; Wang, Zhenhua; Kuen, Jason; Ma, Lianyang; Shahroudy, Amir; Shuai, Bing; Liu, Ting; Wang, Xingxing; Wang, Gang; Cai, Jianfei, Recent advances in convolutional neural networks, Pattern Recognit., 77, 354-377 (2018)
[70] Zou, Zhengxia; Shi, Zhenwei; Guo, Yuhong; Ye, Jieping, Object detection in 20 years: A survey (2019), arXiv preprint arXiv:1905.05055
[71] Viola, Paul; Jones, Michael, Rapid object detection using a boosted cascade of simple features, (Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, vol. 1 (2001), IEEE), pp. I-I
[72] Viola, Paul; Jones, Michael J., Robust real-time face detection, Int. J. Comput. Vis., 57, 2, 137-154 (2004)
[73] Freund, Yoav; Schapire, Robert; Abe, Naoki, A short introduction to boosting, J. Japanese Soc. Artif. Intell., 14, 771-780 (1999)
[74] Felzenszwalb, Pedro; McAllester, David; Ramanan, Deva, A discriminatively trained, multiscale, deformable part model, (2008 IEEE Conference on Computer Vision and Pattern Recognition (2008), IEEE), 1-8
[75] Felzenszwalb, Pedro F.; Girshick, Ross B.; McAllester, David, Cascade object detection with deformable part models, (2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010), IEEE), 2241-2248
[76] Malisiewicz, Tomasz; Gupta, Abhinav; Efros, Alexei A., Ensemble of exemplar-svms for object detection and beyond, (2011 International Conference on Computer Vision (2011), IEEE), 89-96
[77] Girshick, Ross B.; Felzenszwalb, Pedro F.; Mcallester, David A., Object detection with grammar models, (Advances in Neural Information Processing Systems (2011)), 442-450
[78] Girshick, Ross Brook, From Rigid Templates to Grammars: Object Detection with Structured Models (2012), Citeseer
[79] Andrews, Stuart; Tsochantaridis, Ioannis; Hofmann, Thomas, Support vector machines for multiple-instance learning, (Becker, S.; Thrun, S.; Obermayer, K., Advances in Neural Information Processing Systems 15 (2003), MIT Press), 577-584, http://papers.nips.cc/paper/2232-support-vector-machines-for-multiple-instance-learning.pdf
[80] Sermanet, Pierre; Eigen, David; Zhang, Xiang; Mathieu, Michaël; Fergus, Rob; LeCun, Yann, Overfeat: Integrated recognition, localization and detection using convolutional networks (2013), arXiv preprint arXiv:1312.6229
[81] Redmon, Joseph; Farhadi, Ali, Yolov3: An incremental improvement (2018), arXiv preprint arXiv:1804.02767
[82] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár, Focal loss for dense object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980-2988.
[83] Hei Law, Jia Deng, Cornernet: Detecting objects as paired keypoints, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 734-750.
[84] Zhou, Xingyi; Wang, Dequan; Krähenbühl, Philipp, Objects as points (2019), arXiv preprint arXiv:1904.07850
[85] Duan, Kaiwen; Bai, Song; Xie, Lingxi; Qi, Honggang; Huang, Qingming; Tian, Qi, Centernet: Object detection with keypoint triplets (2019), arXiv preprint arXiv:1904.08189
[86] Uijlings, Jasper R. R.; Van De Sande, Koen E. A.; Gevers, Theo; Smeulders, Arnold W. M., Selective search for object recognition, Int. J. Comput. Vis., 104, 2, 154-171 (2013)
[87] Kleban, Jim; Xie, Xing; Ma, Wei-Ying, Spatial pyramid mining for logo detection in natural scenes, (2008 IEEE International Conference on Multimedia and Expo (2008), IEEE), 1077-1080
[88] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian, Spatial pyramid pooling in deep convolutional networks for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell., 37, 9, 1904-1916 (2015)
[89] Zitnick, C. Lawrence; Dollar, Piotr, Edge boxes: Locating object proposals from edges, (European Conference on Computer Vision (2014), Springer), 391-405
[90] Dai, Jifeng; Li, Yi; He, Kaiming; Sun, Jian, R-fcn: Object detection via region-based fully convolutional networks, (Advances in Neural Information Processing Systems (2016)), 379-387
[91] Lin, Tsung-Yi; Maire, Michael; Belongie, Serge; Hays, James; Perona, Pietro; Ramanan, Deva; Dollár, Piotr; Zitnick, C. Lawrence, Microsoft coco: Common objects in context, (European Conference on Computer Vision (2014), Springer), 740-755
[92] Han Hu, Jiayuan Gu, Zheng Zhang, Jifeng Dai, Yichen Wei, Relation networks for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3588-3597.
[93] Jifeng Dai, Haozhi Qi, Yuwen Xiong, Yi Li, Guodong Zhang, Han Hu, Yichen Wei, Deformable convolutional networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 764-773.
[94] Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le, Nas-fpn: Learning scalable feature pyramid architecture for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7036-7045.
[95] Alexe, Bogdan; Deselaers, Thomas; Ferrari, Vittorio, Measuring the objectness of image windows, IEEE Trans. Pattern Anal. Mach. Intell., 34, 11, 2189-2202 (2012)
[96] Rahtu, Esa; Kannala, Juho; Blaschko, Matthew, Learning a category independent object detection cascade, (2011 International Conference on Computer Vision (2011), IEEE), 1052-1059
[97] Santiago Manen, Matthieu Guillaumin, Luc Van Gool, Prime object proposals with randomized prim’s algorithm, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 2536-2543.
[98] Carreira, Joao; Sminchisescu, Cristian, CPMC: Automatic object segmentation using constrained parametric min-cuts, IEEE Trans. Pattern Anal. Mach. Intell., 34, 7, 1312-1328 (2011)
[99] Endres, Ian; Hoiem, Derek, Category-independent object proposals with diverse ranking, IEEE Trans. Pattern Anal. Mach. Intell., 36, 2, 222-234 (2013)
[100] Zhu, Chenchen; Tao, Ran; Luu, Khoa; Savvides, Marios, Seeing small faces from robust anchor’s perspective, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)), 5127-5136
[101] Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 9046-9053.
[102] Lachlan Tychsen-Smith, Lars Petersson, Denet: Scalable real-time object detection with directed sparse sampling, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 428-436.
[103] Chenchen Zhu, Yihui He, Marios Savvides, Feature selective anchor-free module for single-shot object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 840-849.
[104] Kaiwen Duan, Song Bai, Lingxi Xie, Honggang Qi, Qingming Huang, Qi Tian, Centernet: Keypoint triplets for object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 6569-6578.
[105] Ross Girshick, Fast r-cnn, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440-1448.
[106] Bharat Singh, Larry S. Davis, An analysis of scale invariance in object detection snip, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3578-3587.
[107] Cai, Zhaowei; Fan, Quanfu; Feris, Rogerio S.; Vasconcelos, Nuno, A unified multi-scale deep convolutional neural network for fast object detection, (European Conference on Computer Vision (2016), Springer), 354-370
[108] Zhiqiang Shen, Zhuang Liu, Jianguo Li, Yu-Gang Jiang, Yurong Chen, Xiangyang Xue, Dsod: Learning deeply supervised object detectors from scratch, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1919-1927.
[109] Songtao Liu, Di Huang, et al. Receptive field block net for accurate and fast object detection, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 385-400.
[110] Jimmy Ren, Xiaohao Chen, Jianbo Liu, Wenxiu Sun, Jiahao Pang, Qiong Yan, Yu-Wing Tai, Li Xu, Accurate single stage detector using recurrent rolling convolution, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5420-5428.
[111] Jeong, Jisoo; Park, Hyojin; Kwak, Nojun, Enhancement of SSD by concatenating feature maps for object detection (2017), arXiv preprint arXiv:1705.09587
[112] Fu, Cheng-Yang; Liu, Wei; Ranga, Ananth; Tyagi, Ambrish; Berg, Alexander C., Dssd: Deconvolutional single shot detector (2017), arXiv preprint arXiv:1701.06659
[113] Woo, Sanghyun; Hwang, Soonmin; Kweon, In So, Stairnet: Top-down semantic aggregation for accurate one shot detection, (2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (2018), IEEE), 1093-1102
[114] Li, Hongyang; Liu, Yu; Ouyang, Wanli; Wang, Xiaogang, Zoom out-and-in network with recursive training for object proposal (2017), arXiv preprint arXiv:1702.05711
[115] Tao Kong, Fuchun Sun, Chuanqi Tan, Huaping Liu, Wenbing Huang, Deep feature pyramid reconfiguration for object detection, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 169-185. · Zbl 07123033
[116] Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai, Haibin Ling, M2det: A single-shot object detector based on multi-level feature pyramid network, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 9259-9266.
[117] Galleguillos, Carolina; Belongie, Serge, Context based object categorization: A critical survey, Comput. Vis. Image Underst., 114, 6, 712-722 (2010)
[118] Felzenszwalb, Pedro; Girshick, Ross; McAllester, David; Ramanan, Deva, Discriminatively trained mixtures of deformable part models, PASCAL VOC Challenge (2008)
[119] Wanli Ouyang, Xiaogang Wang, Xingyu Zeng, Shi Qiu, Ping Luo, Yonglong Tian, Hongsheng Li, Shuo Yang, Zhe Wang, Chen-Change Loy, et al. Deepid-net: Deformable deep convolutional neural networks for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2403-2412.
[120] Xizhou Zhu, Han Hu, Stephen Lin, Jifeng Dai, Deformable convnets v2: More deformable, better results, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 9308-9316.
[121] Ross Girshick, Forrest Iandola, Trevor Darrell, Jitendra Malik, Deformable part models are convolutional neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 437-446.
[122] Li, Zeming; Peng, Chao; Yu, Gang; Zhang, Xiangyu; Deng, Yangdong; Sun, Jian, Detnet: A backbone network for object detection (2018), arXiv preprint arXiv:1804.06215
[123] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He, Aggregated residual transformations for deep neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1492-1500.
[124] Howard, Andrew G.; Zhu, Menglong; Chen, Bo; Kalenichenko, Dmitry; Wang, Weijun; Weyand, Tobias; Andreetto, Marco; Adam, Hartwig, Mobilenets: Efficient convolutional neural networks for mobile vision applications (2017), arXiv preprint arXiv:1704.04861
[125] Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun, Shufflenet: An extremely efficient convolutional neural network for mobile devices, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6848-6856.
[126] François Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1251-1258.
[127] R. J. Wang, C. X. Ling, Pelee: A real-time object detection system on mobile devices, Adv. Neural Inf. Process. Syst., 1963-1972 (2018)
[128] Candès, Emmanuel J.; Li, Xiaodong; Ma, Yi; Wright, John, Robust principal component analysis?, J. ACM, 58, 3, 1-37 (2011) · Zbl 1327.62369
[129] Simonyan, Karen; Zisserman, Andrew, Very deep convolutional networks for large-scale image recognition (2014), arXiv preprint arXiv:1409.1556
[130] LeCun, Yann; Bottou, Léon; Bengio, Yoshua; Haffner, Patrick, Gradient-based learning applied to document recognition, Proc. IEEE, 86, 11, 2278-2324 (1998)
[131] Robbins, Herbert; Monro, Sutton, A stochastic approximation method, Ann. Math. Stat., 400-407 (1951) · Zbl 0054.05901
[132] Kingma, Diederik P.; Ba, Jimmy, Adam: A method for stochastic optimization (2014), arXiv preprint arXiv:1412.6980
[133] Vinod Nair, Geoffrey E. Hinton, Rectified linear units improve restricted boltzmann machines, in: Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, pp. 807-814.
[134] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian, Identity mappings in deep residual networks, (European Conference on Computer Vision (2016), Springer), 630-645
[135] Ioffe, Sergey; Szegedy, Christian, Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015), arXiv preprint arXiv:1502.03167
[136] Gao Huang, Zhuang Liu, Laurens Van Der Maaten, Kilian Q Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700-4708.
[137] Jiang, Hanliang; Gao, Fei; Xu, Xingxin; Huang, Fei; Zhu, Suguo, Attentive and ensemble 3D dual path networks for pulmonary nodules classification, Neurocomputing (2019)
[138] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going deeper with convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.
[139] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alexander A Alemi, Inception-v4, inception-resnet and the impact of residual connections on learning, in: Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[140] Newell, Alejandro; Yang, Kaiyu; Deng, Jia, Stacked hourglass networks for human pose estimation, (European Conference on Computer Vision (2016), Springer), 483-499
[141] Singh, Bharat; Najibi, Mahyar; Davis, Larry S., SNIPER: Efficient multi-scale training, (Advances in Neural Information Processing Systems (2018)), 9310-9320
[142] Jiayuan Gu, Han Hu, Liwei Wang, Yichen Wei, Jifeng Dai, Learning region features for object detection, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 381-395.
[143] Spyros Gidaris, Nikos Komodakis, Locnet: Improving localization accuracy for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 789-798.
[144] Zagoruyko, Sergey; Lerer, Adam; Lin, Tsung-Yi; Pinheiro, Pedro O.; Gross, Sam; Chintala, Soumith; Dollár, Piotr, A multipath network for object detection (2016), arXiv preprint arXiv:1604.02135
[145] Lachlan Tychsen-Smith, Lars Petersson, Improving object localization with fitness nms and bounded iou loss, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6877-6885.
[146] Xin Lu, Buyu Li, Yuxin Yue, Quanquan Li, Junjie Yan, Grid r-cnn, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 7363-7372.
[147] Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li, Craft objects from images, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 6043-6051.
[148] Bowen Cheng, Yunchao Wei, Honghui Shi, Rogerio Feris, Jinjun Xiong, Thomas Huang, Revisiting rcnn: On awakening the classification power of faster rcnn, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 453-468.
[149] Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, Generative adversarial networks, in: Annual Conference on Neural Information Processing Systems (NeurIPS), 2014, pp. 2672-2680.
[150] Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223-2232.
[151] Radford, Alec; Metz, Luke; Chintala, Soumith, Unsupervised representation learning with deep convolutional generative adversarial networks (2015), arXiv preprint arXiv:1511.06434
[152] Brock, Andrew; Donahue, Jeff; Simonyan, Karen, Large scale gan training for high fidelity natural image synthesis (2018), arXiv preprint arXiv:1809.11096
[153] Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, Jiashi Feng, Shuicheng Yan, Perceptual generative adversarial networks for small object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1222-1230.
[154] Xiaolong Wang, Abhinav Shrivastava, Abhinav Gupta, A-fast-rcnn: Hard positive generation via adversary for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2606-2615.
[155] Shen, Zhiqiang; Shi, Honghui; Yu, Jiahui; Phan, Hai; Feris, Rogerio; Cao, Liangliang; Liu, Ding; Wang, Xinchao; Huang, Thomas; Savvides, Marios, Improving object detection from scratch via gated feature reuse (2017), arXiv preprint arXiv:1712.00886
[156] Kaiming He, Ross Girshick, Piotr Dollár, Rethinking imagenet pre-training, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 4918-4927.
[157] Hinton, Geoffrey; Vinyals, Oriol; Dean, Jeff, Distilling the knowledge in a neural network (2015), arXiv preprint arXiv:1503.02531
[158] Quanquan Li, Shengying Jin, Junjie Yan, Mimicking very efficient network for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6356-6364.
[159] Navaneeth Bodla, Bharat Singh, Rama Chellappa, Larry S. Davis, Soft-NMS-improving object detection with one line of code, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5561-5569.
[160] Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, et al., Speed/accuracy trade-offs for modern convolutional object detectors, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7310-7311.
[161] Li, Zeming; Peng, Chao; Yu, Gang; Zhang, Xiangyu; Deng, Yangdong; Sun, Jian, Light-head r-cnn: In defense of two-stage object detector (2017), arXiv preprint arXiv:1711.07264
[162] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen, Mobilenetv2: Inverted residuals and linear bottlenecks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510-4520.
[163] Wong, Alexander; Shafiee, Mohammad Javad; Li, Francis; Chwyl, Brendan, Tiny SSD: A tiny single-shot detection deep convolutional neural network for real-time embedded object detection, (2018 15th Conference on Computer and Robot Vision, CRV (2018), IEEE), 95-101
[164] Li, Yuxi; Li, Jiuwei; Lin, Weiyao; Li, Jianguo, Tiny-dsod: Lightweight object detection for resource-restricted usages (2018), arXiv preprint arXiv:1807.11013
[165] Wenling Shang, Kihyuk Sohn, Diogo Almeida, Honglak Lee, Understanding and improving convolutional neural networks via concatenated rectified linear units, in: International Conference on Machine Learning, 2016, pp. 2217-2225.
[166] Kim, Yong-Deok; Park, Eunhyeok; Yoo, Sungjoo; Choi, Taelim; Yang, Lu; Shin, Dongjun, Compression of deep convolutional neural networks for fast and low power mobile applications (2015), arXiv preprint arXiv:1511.06530
[167] Yihui He, Xiangyu Zhang, Jian Sun, Channel pruning for accelerating very deep neural networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1389-1397.
[168] Gong, Yunchao; Liu, Liu; Yang, Ming; Bourdev, Lubomir, Compressing deep convolutional networks using vector quantization (2014), arXiv preprint arXiv:1412.6115
[169] Lin, Yujun; Han, Song; Mao, Huizi; Wang, Yu; Dally, William J., Deep gradient compression: Reducing the communication bandwidth for distributed training (2017), arXiv preprint arXiv:1712.01887
[170] Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng, Quantized convolutional neural networks for mobile devices, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4820-4828.
[171] Han, Song; Mao, Huizi; Dally, William J., Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding (2015), arXiv preprint arXiv:1510.00149
[172] Han, Song; Pool, Jeff; Tran, John; Dally, William, Learning both weights and connections for efficient neural network, (Advances in Neural Information Processing Systems (2015)), 1135-1143
[173] Shifeng Zhang, Longyin Wen, Xiao Bian, Zhen Lei, Stan Z. Li, Single-shot refinement neural network for object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4203-4212.
[174] Russakovsky, Olga; Deng, Jia; Su, Hao; Krause, Jonathan; Satheesh, Sanjeev; Ma, Sean; Huang, Zhiheng; Karpathy, Andrej; Khosla, Aditya; Bernstein, Michael, Imagenet large scale visual recognition challenge, Int. J. Comput. Vis., 115, 3, 211-252 (2015)
[175] Saining Xie, Zhuowen Tu, Holistically-nested edge detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1395-1403.
[176] Mahyar Najibi, Mohammad Rastegari, Larry S. Davis, G-cnn: an iterative grid based object detector, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2369-2377.
[177] Yuting Zhang, Kihyuk Sohn, Ruben Villegas, Gang Pan, Honglak Lee, Improving object detection with deep convolutional networks via bayesian optimization and structured prediction, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 249-258.
[178] Fan Yang, Wongun Choi, Yuanqing Lin, Exploit all the layers: Fast and accurate cnn object detector with scale dependent pooling and cascaded rejection classifiers, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2129-2137.
[179] Brahmbhatt, Samarth; Christensen, Henrik I.; Hays, James, Stuffnet: Using ‘stuff’ to improve object detection, (2017 IEEE Winter Conference on Applications of Computer Vision, WACV (2017), IEEE), 934-943
[180] Ren, Shaoqing; He, Kaiming; Girshick, Ross; Zhang, Xiangyu; Sun, Jian, Object detection networks on convolutional feature maps, IEEE Trans. Pattern Anal. Mach. Intell., 39, 7, 1476-1481 (2016)
[181] Spyros Gidaris, Nikos Komodakis, Object detection via a multi-region and semantic segmentation-aware cnn model, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1134-1142.
[182] Tao Kong, Anbang Yao, Yurong Chen, Fuchun Sun, Hypernet: Towards accurate region proposal generation and joint object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 845-853.
[183] Abhinav Shrivastava, Abhinav Gupta, Ross Girshick, Training region-based object detectors with online hard example mining, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 761-769.
[184] Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick, Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2874-2883.
[185] Zou, Zhengxia; Shi, Zhenwei; Guo, Yuhong; Ye, Jieping, Object detection in 20 years: A survey (2019), arXiv preprint arXiv:1905.05055
[186] Peng Zhou, Bingbing Ni, Cong Geng, Jianguo Hu, Yi Xu, Scale-transferrable object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 528-537.
[187] Buyu Li, Yu Liu, Xiaogang Wang, Gradient harmonized single-stage detector, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 8577-8584.
[188] Zhi Tian, Chunhua Shen, Hao Chen, Tong He, Fcos: Fully convolutional one-stage object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 9627-9636.
[189] Xingyi Zhou, Jiacheng Zhuo, Philipp Krahenbuhl, Bottom-up object detection by grouping extreme and center points, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 850-859.
[190] Yousong Zhu, Chaoyang Zhao, Jinqiao Wang, Xu Zhao, Yi Wu, Hanqing Lu, Couplenet: Coupling global structure with local parts for object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4126-4134.
[191] Hongyu Xu, Xutao Lv, Xiaoyu Wang, Zhou Ren, Navaneeth Bodla, Rama Chellappa, Deep regionlets for object detection, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018, pp. 798-814.
[192] Zhe Chen, Shaoli Huang, Dacheng Tao, Context refinement for object detection, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018, pp. 71-86.
[193] Zhaowei Cai, Nuno Vasconcelos, Cascade r-cnn: Delving into high quality object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6154-6162.
[194] Yanghao Li, Yuntao Chen, Naiyan Wang, Zhaoxiang Zhang, Scale-aware trident networks for object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 6054-6063.
[195] Deng, Jia; Dong, Wei; Socher, Richard; Li, Li-Jia; Li, Kai; Fei-Fei, Li, Imagenet: A large-scale hierarchical image database, (2009 IEEE Conference on Computer Vision and Pattern Recognition (2009), IEEE), 248-255
[196] Kuznetsova, Alina; Rom, Hassan; Alldrin, Neil; Uijlings, Jasper; Krasin, Ivan; Pont-Tuset, Jordi; Kamali, Shahab; Popov, Stefan; Malloci, Matteo; Duerig, Tom, The open images dataset v4: Unified image classification, object detection, and visual relationship detection at scale (2018), arXiv preprint arXiv:1811.00982
[197] Zhu, Pengfei; Wen, Longyin; Bian, Xiao; Ling, Haibin; Hu, Qinghua, Vision meets drones: A challenge (2018), arXiv preprint arXiv:1804.07437
[198] Agrim Gupta, Piotr Dollár, Ross Girshick, LVIS: A dataset for large vocabulary instance segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5356-5364.
[199] Ess, Andreas; Leibe, Bastian; Van Gool, Luc, Depth and appearance for mobile scene analysis, (2007 IEEE 11th International Conference on Computer Vision (2007), IEEE), 1-8
[200] Geiger, Andreas; Lenz, Philip; Stiller, Christoph; Urtasun, Raquel, Vision meets robotics: The kitti dataset, Int. J. Robot. Res., 32, 11, 1231-1237 (2013)
[201] Wojek, Christian; Walk, Stefan; Schiele, Bernt, Multi-cue onboard pedestrian detection, (2009 IEEE Conference on Computer Vision and Pattern Recognition (2009), IEEE), 794-801
[202] Geiger, Andreas; Lenz, Philip; Urtasun, Raquel, Are we ready for autonomous driving? The kitti vision benchmark suite, (2012 IEEE Conference on Computer Vision and Pattern Recognition (2012), IEEE), 3354-3361
[203] Shanshan Zhang, Rodrigo Benenson, Bernt Schiele, Citypersons: A diverse dataset for pedestrian detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 3213-3221.
[204] Li, Xiaofei; Flohr, Fabian; Yang, Yue; Xiong, Hui; Braun, Markus; Pan, Shuyue; Li, Keqiang; Gavrila, Dariu M., A new benchmark for vision-based cyclist detection, (2016 IEEE Intelligent Vehicles Symposium, IV (2016), IEEE), 1028-1033
[205] Jain, Vidit; Learned-Miller, Erik, Fddb: A Benchmark for Face Detection in Unconstrained Settings, Technical Report (2010), UMass Amherst
[206] Shuo Yang, Ping Luo, Chen-Change Loy, Xiaoou Tang, Wider face: A face detection benchmark, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5525-5533.
[207] He, Ran; Wu, Xiang; Sun, Zhenan; Tan, Tieniu, Wasserstein cnn: Learning invariant features for nir-vis face recognition, IEEE Trans. Pattern Anal. Mach. Intell., 41, 7, 1761-1773 (2018)
[208] Xiao Zhang, Rui Zhao, Yu Qiao, Xiaogang Wang, Hongsheng Li, Adacos: Adaptively scaling cosine logits for effectively learning deep face representations, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 10823-10832.
[209] Liu, Yu; Li, Hongyang; Wang, Xiaogang, Rethinking feature discrimination and polymerization for large-scale recognition (2017), arXiv preprint arXiv:1710.00870
[210] Ranjan, Rajeev; Castillo, Carlos D.; Chellappa, Rama, L2-constrained softmax loss for discriminative face verification (2017), arXiv preprint arXiv:1703.09507
[211] Feng Wang, Xiang Xiang, Jian Cheng, Alan Loddon Yuille, Normface: L2 hypersphere embedding for face verification, in: Proceedings of the 25th ACM International Conference on Multimedia, 2017, pp. 1041-1049.
[212] Guo, Yuwei; Jiao, Licheng; Wang, Shuang; Wang, Shuo; Liu, Fang, Fuzzy sparse autoencoder framework for single image per person face recognition, IEEE Trans. Cybern., 48, 8, 2402-2415 (2017)
[213] Braun, Markus; Krebs, Sebastian; Flohr, Fabian; Gavrila, Dariu M., Eurocity persons: A novel benchmark for person detection in traffic scenes, IEEE Trans. Pattern Anal. Mach. Intell., 41, 8, 1844-1861 (2019)
[214] Cai, Zhaowei; Saberian, Mohammad Javad; Vasconcelos, Nuno, Learning complexity-aware cascades for pedestrian detection, IEEE Trans. Pattern Anal. Mach. Intell. (2019)
[215] Saberian, Mohammad Javad; Vasconcelos, Nuno, Learning optimal embedded cascades, IEEE Trans. Pattern Anal. Mach. Intell., 34, 10, 2005-2018 (2012)
[216] Dollár, Piotr; Appel, Ron; Belongie, Serge; Perona, Pietro, Fast feature pyramids for object detection, IEEE Trans. Pattern Anal. Mach. Intell., 36, 8, 1532-1545 (2014)
[217] Liu, Song; Yamada, Makoto; Collier, Nigel; Sugiyama, Masashi, Change-point detection in time-series data by relative density-ratio estimation, Neural Netw., 43, 72-83 (2013) · Zbl 1367.62259
[218] Senin, Pavel; Lin, Jessica; Wang, Xing; Oates, Tim; Gandhi, Sunil; Boedihardjo, Arnold P.; Chen, Crystal; Frankenstein, Susan, Grammarviz 3.0: Interactive discovery of variable-length time series patterns, ACM Trans. Knowl. Dis. Data (TKDD), 12, 1, 1-28 (2018)
[219] Jiang, Meng; Beutel, Alex; Cui, Peng; Hooi, Bryan; Yang, Shiqiang; Faloutsos, Christos, A general suspiciousness metric for dense blocks in multimodal data, (2015 IEEE International Conference on Data Mining (2015), IEEE), 781-786
[220] Wu, Elizabeth; Liu, Wei; Chawla, Sanjay, Spatio-temporal outlier detection in precipitation data, (International Workshop on Knowledge Discovery from Sensor Data (2008), Springer), 115-133
[221] Barz, Björn; Rodner, Erik; Garcia, Yanira Guanche; Denzler, Joachim, Detecting regions of maximal divergence for spatio-temporal anomaly detection, IEEE Trans. Pattern Anal. Mach. Intell., 41, 5, 1088-1101 (2018)
[222] Cheng, Gong; Han, Junwei, A survey on object detection in optical remote sensing images, ISPRS J. Photogramm. Remote Sens., 117, 11-28 (2016)
[223] Shivakumara, Palaiahnakote; Tang, Dongqi; Asadzadehkaljahi, Maryam; Lu, Tong; Pal, Umapada; Anisi, Mohammad Hossein, CNN-RNN based method for license plate recognition, CAAI Trans. Intell. Technol., 3, 3, 169-175 (2018)
[224] Sarfraz, Muhammad; Ahmed, Mohammed Jameel, An approach to license plate recognition system using neural network, (Exploring Critical Approaches of Evolutionary Computation (2019), IGI Global), 20-36
[225] Li, Hui; Wang, Peng; Shen, Chunhua, Toward end-to-end car license plate detection and recognition with deep neural networks, IEEE Trans. Intell. Transp. Syst., 20, 3, 1126-1136 (2018)
[226] Qian, Jinxing; Qu, Bo, Fast license plate recognition method based on competitive neural network, (2018 3rd International Conference on Communications, Information Management and Network Security, CIMNS 2018 (2018), Atlantis Press)
[227] Laroca, Rayson; Severo, Evair; Zanlorensi, Luiz A.; Oliveira, Luiz S.; Gonçalves, Gabriel Resende; Schwartz, William Robson; Menotti, David, A robust real-time automatic license plate recognition based on the YOLO detector, (2018 International Joint Conference on Neural Networks, IJCNN (2018), IEEE), 1-10
[228] Xibin Song, Peng Wang, Dingfu Zhou, Rui Zhu, Chenye Guan, Yuchao Dai, Hao Su, Hongdong Li, Ruigang Yang, Apollocar3d: A large 3d car instance understanding benchmark for autonomous driving, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5452-5462.
[229] Banerjee, Koyel; Notz, Dominik; Windelen, Johannes; Gavarraju, Sumanth; He, Mingkang, Online camera lidar fusion and object detection on hybrid data for autonomous driving, (2018 IEEE Intelligent Vehicles Symposium, IV (2018), IEEE), 1632-1638
[230] Li, Jia; Wang, Zengfu, Real-time traffic sign recognition based on efficient CNNs in the wild, IEEE Trans. Intell. Transp. Syst., 20, 3, 975-984 (2018)
[231] T. Arinaga, T. Moritani, Traffic sign recognition system, US Patent 9,865,165, 2018.
[232] Khalid, Sara; Muhammad, Nazeer; Sharif, Muhammad, Automatic measurement of the traffic sign with digital segmentation and recognition, IET Intell. Transp. Syst., 13, 2, 269-279 (2018)
[233] Arcos-García, Álvaro; Álvarez-García, Juan A.; Soria-Morillo, Luis M., Deep neural network for traffic sign recognition systems: An analysis of spatial transformers and stochastic optimisation methods, Neural Netw., 99, 158-165 (2018)
[234] Li, Dong; Zhao, Dongbin; Chen, Yaran; Zhang, Qichao, Deepsign: Deep learning based traffic sign recognition, (2018 International Joint Conference on Neural Networks, IJCNN (2018), IEEE), 1-6
[235] Wu, Bo-Xun; Wang, Pin-Yu; Yang, Yi-Ta; Guo, Jiun-In, Traffic sign recognition with light convolutional networks, (2018 IEEE International Conference on Consumer Electronics-Taiwan, ICCE-TW (2018), IEEE), 1-2
[236] Zhou, Shuren; Liang, Wenlong; Li, Junguo; Kim, Jeong-Uk, Improved VGG model for road traffic sign recognition, Comput. Mater. Continua, 57, 1, 11-24 (2018)
[237] Chen, Xianjie; Yuille, Alan L., Articulated pose estimation by a graphical model with image dependent pairwise relations, (Advances in Neural Information Processing Systems (2014)), 1736-1744
[238] Xiaochuan Fan, Kang Zheng, Yuewei Lin, Song Wang, Combining local appearance and holistic view: Dual-source deep neural networks for human pose estimation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1347-1355.
[239] Rogez, Gregory; Weinzaepfel, Philippe; Schmid, Cordelia, Lcr-net++: Multi-person 2d and 3d pose detection in natural images, IEEE Trans. Pattern Anal. Mach. Intell., 42, 5, 1146-1161 (2019)
[240] Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun, Cascaded pyramid network for multi-person pose estimation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7103-7112.
[241] George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, Jonathan Tompson, Chris Bregler, Kevin Murphy, Towards accurate multi-person pose estimation in the wild, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4903-4911.
[242] Bin Xiao, Haiping Wu, Yichen Wei, Simple baselines for human pose estimation and tracking, in: Proceedings of the European Conference on Computer Vision, ECCV, 2018, pp. 466-481.
[243] Shih-En Wei, Varun Ramakrishna, Takeo Kanade, Yaser Sheikh, Convolutional pose machines, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4724-4732.
[244] Li, Zhuoling; Dong, Minghui; Wen, Shiping; Hu, Xiang; Zhou, Pan; Zeng, Zhigang, CLU-CNNs: Object detection for medical images, Neurocomputing, 350, 53-59 (2019)
[245] Yang, Zhenguo; Li, Qing; Wenyin, Liu; Lv, Jianming, Shared multi-view data representation for multi-domain event detection, IEEE Trans. Pattern Anal. Mach. Intell. (2019)
[246] Yanxiang Wang, Hari Sundaram, Lexing Xie, Social event detection with interaction graph modeling, in: Proceedings of the 20th ACM International Conference on Multimedia, 2012, pp. 865-868.
[247] Manos Schinas, Symeon Papadopoulos, Georgios Petkos, Yiannis Kompatsiaris, Pericles A. Mitkas, Multimodal graph-based event detection and summarization in social media streams, in: Proceedings of the 23rd ACM International Conference on Multimedia, 2015, pp. 189-192.
[248] Teboul, Olivier; Kokkinos, Iasonas; Simon, Loic; Koutsourakis, Panagiotis; Paragios, Nikos, Shape grammar parsing via reinforcement learning, (CVPR 2011 (2011), IEEE), 2273-2280
[249] Zhao, Peng; Fang, Tian; Xiao, Jianxiong; Zhang, Honghui; Zhao, Qinping; Quan, Long, Rectilinear parsing of architecture in urban environment, (2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010), IEEE), 342-349
[250] Friedman, Sam; Stamos, Ioannis, Online detection of repeated structures in point clouds of urban scenes for compression and registration, Int. J. Comput. Vis., 102, 1-3, 112-128 (2013)
[251] Shen, Chao-Hui; Huang, Shi-Sheng; Fu, Hongbo; Hu, Shi-Min, Adaptive partitioning of urban facades, ACM Trans. Graph., 30, 6, 1-10 (2011)
[252] Schindler, Grant; Krishnamurthy, Panchapagesan; Lublinerman, Roberto; Liu, Yanxi; Dellaert, Frank, Detecting and matching repeated patterns for automatic geo-tagging in urban environments, (2008 IEEE Conference on Computer Vision and Pattern Recognition (2008), IEEE), 1-7
[253] Wu, Changchang; Frahm, Jan-Michael; Pollefeys, Marc, Detecting large repetitive structures with salient boundaries, (European Conference on Computer Vision (2010), Springer), 142-155
[254] Müller, Pascal; Zeng, Gang; Wonka, Peter; Van Gool, Luc, Image-based procedural modeling of facades, ACM Trans. Graph., 26, 85-95 (2007)
[255] Barinova, Olga; Lempitsky, Victor; Tretiak, Elena; Kohli, Pushmeet, Geometric image parsing in man-made environments, (European Conference on Computer Vision (2010), Springer), 57-70
[256] Mateusz Kozinski, Raghudeep Gadde, Sergey Zagoruyko, Guillaume Obozinski, Renaud Marlet, A MRF shape prior for facade parsing with occlusions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2820-2828.
[257] Andrea Cohen, Alexander G. Schwing, Marc Pollefeys, Efficient structured parsing of facades using dynamic programming, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 3206-3213.
[258] Gandy, Silvia; Recht, Benjamin; Yamada, Isao, Tensor completion and low-n-rank tensor recovery via convex optimization, Inverse Problems, 27, 2, Article 025010 pp. (2011) · Zbl 1211.15036
[259] Liu, Ji; Musialski, Przemyslaw; Wonka, Peter; Ye, Jieping, Tensor completion for estimating missing values in visual data, IEEE Trans. Pattern Anal. Mach. Intell., 35, 1, 208-220 (2012)
[260] Liu, Juan; Psarakis, Emmanouil Z.; Feng, Yang; Stamos, Ioannis, A kronecker product model for repeated pattern detection on 2d urban images, IEEE Trans. Pattern Anal. Mach. Intell., 41, 9, 2266-2272 (2018)
[261] Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show and tell: A neural image caption generator, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3156-3164.
[262] Jiuxiang Gu, Jianfei Cai, Gang Wang, Tsuhan Chen, Stack-captioning: Coarse-to-fine learning for image captioning, in: Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[263] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, Yoshua Bengio, Show, attend and tell: Neural image caption generation with visual attention, in: International Conference on Machine Learning, 2015, pp. 2048-2057.
[264] Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang, Bottom-up and top-down attention for image captioning and visual question answering, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6077-6086.
[265] A. Deshpande, J. Aneja, A.G. Schwing, Convolutional image captioning, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 5561-5570.
[266] Zhe Wu, Li Su, Qingming Huang, Cascaded partial decoder for fast and accurate salient object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3907-3916.
[267] Wang, Wenguan; Shen, Jianbing; Dong, Xingping; Borji, Ali; Yang, Ruigang, Inferring salient objects from human fixations, IEEE Trans. Pattern Anal. Mach. Intell. (2019)
[268] Wang, Linzhao; Wang, Lijun; Lu, Huchuan; Zhang, Pingping; Ruan, Xiang, Salient object detection with recurrent fully convolutional networks, IEEE Trans. Pattern Anal. Mach. Intell., 41, 7, 1734-1746 (2018)
[269] Mengyang Feng, Huchuan Lu, Errui Ding, Attentive feedback network for boundary-aware salient object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 1623-1632.
[270] Zheng Zhang, Chengquan Zhang, Wei Shen, Cong Yao, Wenyu Liu, Xiang Bai, Multi-oriented text detection with fully convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 4159-4167.
[271] Yao, Cong; Bai, Xiang; Sang, Nong; Zhou, Xinyu; Zhou, Shuchang; Cao, Zhimin, Scene text detection via holistic, multi-channel prediction (2016), arXiv preprint arXiv:1606.09002
[272] He, Tong; Huang, Weilin; Qiao, Yu; Yao, Jian, Accurate text localization in natural image with cascaded convolutional text network (2016), arXiv preprint arXiv:1603.09423 · Zbl 1408.94242
[273] Pengyuan Lyu, Cong Yao, Wenhao Wu, Shuicheng Yan, Xiang Bai, Multi-oriented scene text detection via corner localization and region segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7553-7563.
[274] Ma, Jianqi; Shao, Weiyuan; Ye, Hao; Wang, Li; Wang, Hong; Zheng, Yingbin; Xue, Xiangyang, Arbitrary-oriented scene text detection via rotation proposals, IEEE Trans. Multimed., 20, 11, 3111-3122 (2018)
[275] Engelcke, Martin; Rao, Dushyant; Wang, Dominic Zeng; Tong, Chi Hay; Posner, Ingmar, Vote3deep: Fast object detection in 3d point clouds using efficient convolutional neural networks, (2017 IEEE International Conference on Robotics and Automation, ICRA (2017), IEEE), 1355-1361
[276] Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas, Pointnet: Deep learning on point sets for 3d classification and segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 652-660.
[277] Yin Zhou, Oncel Tuzel, Voxelnet: End-to-end learning for point cloud based 3d object detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4490-4499.
[278] Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, Realtime multi-person 2d pose estimation using part affinity fields, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 7291-7299.
[279] Bulat, Adrian; Tzimiropoulos, Georgios, Human pose estimation via convolutional part heatmap regression, (European Conference on Computer Vision (2016), Springer), 717-732
[280] Newell, Alejandro; Yang, Kaiyu; Deng, Jia, Stacked hourglass networks for human pose estimation, (European Conference on Computer Vision (2016), Springer), 483-499
[281] Li, Wenbo; Wang, Zhicheng; Yin, Binyi; Peng, Qixiang; Du, Yuming; Xiao, Tianzi; Yu, Gang; Lu, Hongtao; Wei, Yichen; Sun, Jian, Rethinking on multi-stage networks for human pose estimation (2019), arXiv preprint arXiv:1901.00148
[282] Jonathan Krause, Michael Stark, Jia Deng, Li Fei-Fei, 3d object representations for fine-grained categorization, in: Proceedings of the IEEE International Conference on Computer Vision Workshops, 2013, pp. 554-561.
[283] Tsung-Yu Lin, Aruni RoyChowdhury, Subhransu Maji, Bilinear cnn models for fine-grained visual recognition, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1449-1457.
[284] Xiangteng He, Yuxin Peng, Junjie Zhao, Fine-grained discriminative localization via saliency-guided faster R-CNN, in: Proceedings of the 25th ACM International Conference on Multimedia, 2017, pp. 627-635.
[285] He, Xiangteng; Peng, Yuxin; Zhao, Junjie, Fast fine-grained image classification via weakly supervised discriminative localization, IEEE Trans. Circuits Syst. Video Technol., 29, 5, 1394-1407 (2018)
[286] Jianzhong He, Shiliang Zhang, Ming Yang, Yanhu Shan, Tiejun Huang, Bi-directional cascade network for perceptual edge detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 3828-3837.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.