zbMATH — the first resource for mathematics

A survey of structure from motion. (English) Zbl 1377.65027
Summary: The structure from motion (SfM) problem in computer vision is to recover the three-dimensional (3D) structure of a stationary scene from a set of projective measurements, represented as a collection of two-dimensional (2D) images, via estimation of motion of the cameras corresponding to these images. In essence, SfM involves the three main stages of (i) extracting features in images (e.g. points of interest, lines, etc.) and matching these features between images, (ii) camera motion estimation (e.g. using relative pairwise camera positions estimated from the extracted features), and (iii) recovery of the 3D structure using the estimated motion and features (e.g. by minimizing the so-called reprojection error). This survey mainly focuses on relatively recent developments in the literature pertaining to stages (ii) and (iii). More specifically, after touching upon the early factorization-based techniques for motion and structure estimation, we provide a detailed account of some of the recent camera location estimation methods in the literature, followed by discussion of notable techniques for 3D structure recovery. We also cover the basics of the simultaneous localization and mapping (SLAM) problem, which can be viewed as a specific case of the SfM problem. Further, our survey includes a review of the fundamentals of feature extraction and matching (i.e. stage (i) above), various recent methods for handling ambiguities in 3D scenes, SfM techniques involving relatively uncommon camera models and image features, and popular sources of data and SfM software.
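
To make the reprojection error mentioned in stage (iii) concrete, the following is a minimal sketch, not taken from the survey. It assumes an idealized pinhole camera described by a rotation matrix R, a translation vector t and a single focal length f, with no lens distortion; the function names `project` and `reprojection_error` and the observation format are illustrative choices only.

```python
def project(point, R, t, f):
    """Project a 3D point into an idealized pinhole camera (assumed model)."""
    # Camera-frame coordinates: X_c = R * X + t
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division; assumes the point is in front of the camera (xc[2] > 0)
    return (f * xc[0] / xc[2], f * xc[1] / xc[2])


def reprojection_error(points3d, cameras, observations):
    """Sum of squared distances between observed and predicted image points.

    cameras:      list of (R, t, f) tuples (hypothetical parameterization)
    observations: list of (camera_index, point_index, (u, v)) measurements
    """
    total = 0.0
    for cam_idx, pt_idx, (u, v) in observations:
        R, t, f = cameras[cam_idx]
        pu, pv = project(points3d[pt_idx], R, t, f)
        total += (u - pu) ** 2 + (v - pv) ** 2
    return total


# Usage: one camera at the origin looking down the z-axis, one 3D point.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cameras = [(identity, [0.0, 0.0, 0.0], 1.0)]
points3d = [[1.0, 0.0, 2.0]]
observations = [(0, 0, (0.6, 0.0))]  # measured image point, slightly off
error = reprojection_error(points3d, cameras, observations)
```

In a full pipeline this quantity would be minimized jointly over the camera parameters and the 3D points (bundle adjustment), typically with a nonlinear least-squares solver rather than evaluated once as here.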

MSC:
65D18 Numerical aspects of computer graphics, image analysis, and computational geometry
94A08 Image processing (compression, reconstruction, etc.) in information and communication theory
Software:
PCA-SIFT; SURF; LSD-SLAM; SBA; SIFT