
SP-Flow: self-supervised optical flow correspondence point prediction for real-time SLAM. (English) Zbl 07257411
Summary: This paper presents a self-supervised learning network, SP-Flow, that generates keypoints in real time for SLAM systems. During training, optical flow is used to match keypoints between two successive frames, so the network can be trained on datasets without manual annotations. To show the efficacy of SP-Flow, we built an SP-Flow SLAM system by replacing ORB with SP-Flow in the ORB-SLAM2 system. Experimental results demonstrate that our SLAM system achieves real-time performance and high accuracy with stereo or RGB-D images as input.
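The training signal described in the summary can be made concrete with a small sketch. The following is a hypothetical illustration, not the authors' code: it uses OpenCV's pyramidal Lucas-Kanade tracker to turn two successive frames into matched point pairs that could stand in for manual correspondence annotations. All function names and parameter values below are assumptions chosen for illustration.

import cv2
import numpy as np

# Hypothetical sketch (not the authors' implementation): derive pseudo
# ground-truth correspondences between two successive frames with
# pyramidal Lucas-Kanade optical flow, the kind of self-supervised
# label the summary describes.
def pseudo_correspondences(frame_a, frame_b, max_pts=500):
    gray_a = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)

    # Seed with Shi-Tomasi corners on the first frame.
    pts_a = cv2.goodFeaturesToTrack(gray_a, maxCorners=max_pts,
                                    qualityLevel=0.01, minDistance=8)
    if pts_a is None:
        return np.empty((0, 2)), np.empty((0, 2))

    # Track the seed points into the second frame.
    pts_b, status, _err = cv2.calcOpticalFlowPyrLK(
        gray_a, gray_b, pts_a, None, winSize=(21, 21), maxLevel=3)

    # Keep only successfully tracked points; the surviving
    # (pts_a[i], pts_b[i]) pairs serve as training labels.
    ok = status.ravel() == 1
    return pts_a[ok].reshape(-1, 2), pts_b[ok].reshape(-1, 2)

In such a pipeline the tracked pairs replace manual annotations, which is what allows training on raw video; a real system would additionally filter outliers, for example with a forward-backward consistency check.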
MSC:
68T07 Artificial neural networks and deep learning