Coarse-to-fine segmentation for indoor scenes with progressive supervision. (English) Zbl 07137432
Summary: Three-dimensional indoor scene segmentation is highly difficult because of the natural hierarchical structures and complicated contextual relationships in such scenes. In this paper, a 3D scene segmentation method based on a stacked network is proposed to exploit the context and hierarchy of 3D scenes. The method consists of two parts: a stacked network and progressive supervision. The stacked network is composed of multiple base segmentation networks; each network's output is concatenated with the raw input and fed to the next network to provide a contextual prior. Progressive supervision comprises a group of coarse-to-fine segmentation labels generated from the spatial relationships among objects in the scene, and it forces the network to learn the hierarchy. Experimental results on a regular dataset and a complex dataset demonstrate that our progressive supervision is effective and that our method outperforms existing methods on complex scenes.
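The summary describes the architecture only in words; the following is a minimal sketch of how such a stacked network with progressive supervision could look, assuming a PyTorch setting and a generic per-point base segmenter (e.g., a PointNet-like model). All names here (StackedSegmenter, make_base, progressive_loss, the per-stage weights) are hypothetical illustrations, not the authors' implementation.

# Minimal sketch, assuming PyTorch; make_base is a hypothetical factory
# returning a per-point segmenter mapping (B, N, C_in) -> (B, N, K) logits.
import torch
import torch.nn as nn

class StackedSegmenter(nn.Module):
    """Chain of base networks; each stage sees the raw points plus the
    previous stage's (coarser) predictions as extra input channels."""
    def __init__(self, make_base, in_dim, num_classes_per_stage):
        super().__init__()
        stages = []
        extra = 0  # first stage receives no prior predictions
        for k in num_classes_per_stage:          # coarse -> fine label sets
            stages.append(make_base(in_dim + extra, k))
            extra = k                            # next stage also gets these logits
        self.stages = nn.ModuleList(stages)

    def forward(self, points):                   # points: (B, N, in_dim)
        outputs, prior = [], None
        for stage in self.stages:
            # concatenate the previous stage's output to the raw input
            x = points if prior is None else torch.cat([points, prior], dim=-1)
            prior = stage(x)                     # (B, N, k) per-point logits
            outputs.append(prior)
        return outputs                           # one prediction per granularity

def progressive_loss(outputs, labels_per_stage, weights=None):
    """Supervise every stage with its own coarse-to-fine label set."""
    ce = nn.CrossEntropyLoss()
    weights = weights or [1.0] * len(outputs)
    return sum(w * ce(out.flatten(0, 1), lbl.flatten())
               for w, out, lbl in zip(weights, outputs, labels_per_stage))

In this reading, each stage is trained against its own label set, so gradients from the fine stages also flow back through the coarser ones via the concatenated logits, which is one plausible way the coarse predictions can act as a learned contextual prior.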
MSC:
65D Numerical approximation and computational geometry (primarily algorithms)