
Multi-resolution 3D CNN for learning multi-scale spatial features in CAD models. (English) Zbl 1480.65055

Summary: Learning multi-scale spatial features from 3D spatial geometric representations of objects such as point clouds, 3D CAD models, surfaces, and RGB-D data can potentially improve object recognition accuracy. Current deep learning approaches learn such features using structured data representations such as volume occupancy grids (voxels) and octrees, or unstructured representations such as graphs and point clouds. Structured representations are generally restricted by inherent resolution limits, such as the voxel grid dimensions or the maximum octree depth. At the same time, it is challenging to learn directly from unstructured representations of 3D data due to non-uniformity among the samples. A hierarchical approach that maintains the structure at a larger scale while still accounting for the details at a smaller scale in specific spatial locations can provide an optimal solution for learning from 3D data. In this paper, we propose a multi-level learning approach that captures large-scale features at a coarse level (for example, using a coarse voxelization) while simultaneously capturing sparse information about the small-scale features at a fine level (for example, a local fine-level voxel grid) at different spatial locations. To demonstrate the utility of the proposed multi-resolution learning, we use a multi-level voxel representation of CAD models to perform object recognition. The multi-level voxel representation consists of a coarse voxel grid containing volumetric information of the 3D objects and multiple fine-level voxel grids, one for each voxel in the coarse grid that contains a portion of the object boundary. In addition, we develop an interpretability-based feedback approach to transfer saliency information from one level of features to another in our hierarchical end-to-end learning framework. Finally, we demonstrate the performance of our multi-resolution learning algorithm for object recognition, outperforming several previously published benchmarks while using significantly less memory during training.
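
To make the multi-level voxel representation concrete, the sketch below builds a coarse occupancy grid and, for each coarse voxel intersected by the object boundary, a local fine-resolution grid. This is a minimal NumPy illustration under assumed conventions (pre-sampled surface points in the unit cube, a 32^3 coarse grid with 8^3 fine grids, hypothetical names), not the authors' implementation; in particular, it omits the interior volumetric fill of the coarse grid.

```python
import numpy as np

# Minimal sketch of a two-level voxel representation (illustrative only):
# a coarse occupancy grid plus a local fine-resolution grid for every coarse
# voxel touched by the object boundary. Names, grid sizes, and the use of
# pre-sampled surface points are assumptions; the interior ("volumetric")
# fill of the coarse grid is omitted for brevity.
def multilevel_voxelize(points, coarse_res=32, fine_res=8):
    """points: (N, 3) surface samples assumed to lie in the unit cube [0, 1)^3."""
    coarse = np.zeros((coarse_res,) * 3, dtype=np.uint8)
    fine = {}  # (i, j, k) index of a boundary coarse voxel -> fine occupancy grid

    # Coarse level: mark every coarse voxel that contains a surface sample.
    idx = np.clip((points * coarse_res).astype(int), 0, coarse_res - 1)
    coarse[idx[:, 0], idx[:, 1], idx[:, 2]] = 1

    # Fine level: re-voxelize the samples inside each occupied coarse voxel
    # on a local fine_res^3 grid, giving sparse detail only at the boundary.
    for cell in np.unique(idx, axis=0):
        mask = np.all(idx == cell, axis=1)
        local = points[mask] * coarse_res - cell   # local coordinates in [0, 1)
        li = np.clip((local * fine_res).astype(int), 0, fine_res - 1)
        grid = np.zeros((fine_res,) * 3, dtype=np.uint8)
        grid[li[:, 0], li[:, 1], li[:, 2]] = 1
        fine[tuple(cell)] = grid
    return coarse, fine
```

For example, `coarse, fine = multilevel_voxelize(np.random.rand(5000, 3))` yields a dense 32^3 grid together with a dictionary of 8^3 boundary grids; when the boundary occupies only a fraction of the coarse voxels, this pair stores far fewer cells than a single dense 256^3 grid of equivalent effective resolution, which is the memory argument made in the summary.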

MSC:

65D18 Numerical aspects of computer graphics, image analysis, and computational geometry
68T07 Artificial neural networks and deep learning
