
PSPDNet: part-aware shape and pose disentanglement neural network for 3D human animating meshes. (English) Zbl 07711476

Summary: Disentangled representations of shape and pose are essential for animating human body meshes in computer animation, computer games, and virtual reality applications. While recent deep neural networks have achieved impressive results, their interpretability, reconstruction precision, and fine-grained control remain unsatisfactory. To address these issues, we propose the Part-aware Shape and Pose Disentanglement neural network (PSPDNet), a framework for disentangling the shape and pose of 3D human meshes that share the same connectivity. Although a single whole-body mesh autoencoder can decouple shape and pose information from animated meshes, it cannot control local motions; PSPDNet therefore uses part mesh autoencoders to learn representations of individual body parts, making the latent codes interpretable by associating each with a local motion. PSPDNet also employs a rotation-translation module to remove global rigid motion, i.e., rotation and translation, from the sequence. Finally, we propose a novel loss function that combines a disentanglement loss and an alignment loss to train PSPDNet in an unsupervised manner. Our experiments show that PSPDNet greatly improves disentangled representation, with strong interpretability, insensitivity to global rigid transformation, and locality of editing and control.
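The global-rigid-motion removal step described above can be illustrated by its classical closed-form counterpart, rigid (Kabsch/Procrustes) alignment of two vertex sets with shared connectivity. PSPDNet predicts this factor with a learned rotation-translation module; the sketch below, with hypothetical function and variable names, only demonstrates the underlying operation of factoring out a best-fit rotation and translation:

```python
import numpy as np

def remove_rigid_motion(mesh, reference):
    """Align `mesh` (N x 3 vertex array) to `reference` by removing the
    best-fit global rotation and translation (Kabsch algorithm).
    Illustrative only: PSPDNet replaces this closed-form solve with a
    learned rotation-translation module."""
    mu_m, mu_r = mesh.mean(axis=0), reference.mean(axis=0)
    A, B = mesh - mu_m, reference - mu_r          # center both vertex sets
    U, _, Vt = np.linalg.svd(A.T @ B)             # SVD of the cross-covariance
    d = np.sign(np.linalg.det(U @ Vt))            # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt           # optimal proper rotation
    return A @ R + mu_r                           # pose-free, re-centered mesh
```

Applying this to every frame of an animated sequence leaves only the non-rigid (shape and articulated-pose) deformation, which is the signal the part autoencoders then encode.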

MSC:

65Dxx Numerical approximation and computational geometry (primarily algorithms)
Full Text: DOI

References:

[1] Anguelov, D.; Srinivasan, P.; Koller, D.; Thrun, S.; Rodgers, J.; Davis, J., SCAPE: shape completion and animation of people, ACM Trans. Graph., 24, 408-416 (2005)
[2] Aumentado-Armstrong, T.; Tsogkas, S.; Dickinson, S.; Jepson, A., Disentangling geometric deformation spaces in generative latent shape models (2021), arXiv preprint
[3] Aumentado-Armstrong, T.; Tsogkas, S.; Jepson, A.; Dickinson, S., Geometric disentanglement for generative latent shape models, (Proceedings of the IEEE International Conference on Computer Vision (2019)), 8181-8190
[4] Bagautdinov, T.; Wu, C.; Simon, T.; Prada, F.; Shiratori, T.; Wei, S. E.; Xu, W.; Sheikh, Y.; Saragih, J., Driving-signal aware full-body avatars, ACM Trans. Graph., 40 (2021)
[5] Blanz, V.; Vetter, T., A morphable model for the synthesis of 3D faces, (Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 99) (1999)), 187-194
[6] Bogo, F.; Romero, J.; Loper, M.; Black, M. J., FAUST: dataset and evaluation for 3D mesh registration, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2014)), 3794-3801
[7] Bogo, F.; Romero, J.; Pons-Moll, G.; Black, M. J., Dynamic FAUST: registering human bodies in motion, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017)), 6233-6242
[8] Bouaziz, S.; Wang, Y.; Pauly, M., Online modeling for realtime facial animation, ACM Trans. Graph., 32, 1-10 (2013) · Zbl 1305.68211
[9] Bouritsas, G.; Bokhnyak, S.; Ploumpis, S.; Bronstein, M.; Zafeiriou, S., Neural 3D morphable models: spiral convolutional networks for 3D shape representation learning and generation, (Proceedings of the IEEE International Conference on Computer Vision (2019)), 7213-7222
[10] Cosmo, L.; Norelli, A.; Halimi, O.; Kimmel, R.; Rodola, E., LIMP: learning latent shape representations with metric preservation priors, (Proceedings of the European Conference on Computer Vision (2020), Springer), 19-35
[11] Hirshberg, D. A.; Loper, M.; Rachlin, E.; Black, M. J., Coregistration: simultaneous alignment and modeling of articulated 3D shape, (Proceedings of the European Conference on Computer Vision (2012)), 242-255
[12] Hu, X.; Li, X.; Busam, B.; Zhou, Y.; Leonardis, A.; Yuan, S., Disentangling 3D attributes from a single 2d image: human pose, shape and garment (2022), arXiv preprint
[13] Jiang, B.; Zhang, J.; Cai, J.; Zheng, J., Disentangled human body embedding based on deep hierarchical neural network, IEEE Trans. Vis. Comput. Graph., 26, 2560-2575 (2020)
[14] Jiang, Z. H.; Wu, Q.; Chen, K.; Zhang, J., Disentangled representation learning for 3D face shape, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)), 11957-11966
[15] Levinson, J.; Sud, A.; Makadia, A., Latent feature disentanglement for 3D meshes (2019), arXiv preprint
[16] Li, P.; Aberman, K.; Hanocka, R.; Liu, L.; Sorkine-Hornung, O.; Chen, B., Learning skeletal articulations with neural blend shapes, ACM Trans. Graph., 40, 1-15 (2021)
[17] Li, T.; Bolkart, T.; Black, M. J.; Li, H.; Romero, J., Learning a model of facial shape and expression from 4D scans, ACM Trans. Graph., 36, Article 194 pp. (2017)
[18] Litany, O.; Bronstein, A.; Bronstein, M.; Makadia, A., Deformable shape completion with graph convolutional autoencoders, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)), 1886-1895
[19] Lombardi, S.; Yang, B.; Fan, T.; Bao, H.; Zhang, G.; Pollefeys, M.; Cui, Z., LatentHuman: shape-and-pose disentangled latent representation for human bodies, (2021 International Conference on 3D Vision (3DV), IEEE (2021)), 278-288
[20] Loper, M.; Mahmood, N.; Romero, J.; Pons-Moll, G.; Black, M. J., SMPL: a skinned multi-person linear model, ACM Trans. Graph., 34, 248 (2015)
[21] Loshchilov, I.; Hutter, F., Fixing weight decay regularization in Adam (2017), arXiv preprint
[22] Mahmood, N.; Ghorbani, N.; Troje, N. F.; Pons-Moll, G.; Black, M. J., AMASS: archive of motion capture as surface shapes, (Proceedings of the IEEE/CVF International Conference on Computer Vision (2019)), 5442-5451
[23] Palafox, P.; Božič, A.; Thies, J.; Nießner, M.; Dai, A., NPMs: neural parametric models for 3D deformable shapes, (Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (2021)), 12695-12705
[24] Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A., Automatic differentiation in PyTorch (2017)
[25] Pavlakos, G.; Choutas, V.; Ghorbani, N.; Bolkart, T.; Osman, A. A.; Tzionas, D.; Black, M. J., Expressive body capture: 3D hands, face, and body from a single image, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2019)), 10975-10985
[26] Pishchulin, L.; Wuhrer, S.; Helten, T.; Theobalt, C.; Schiele, B., Building statistical shape spaces for 3D human modeling, Pattern Recognit., 67, 276-286 (2017)
[27] Ranjan, A.; Bolkart, T.; Sanyal, S.; Black, M. J., Generating 3D faces using convolutional mesh autoencoders, (Proceedings of the European Conference on Computer Vision (2018)), 704-720
[28] Romero, J.; Tzionas, D.; Black, M. J., Embodied hands: modeling and capturing hands and bodies together, ACM Trans. Graph., 36, 245 (2017)
[29] Sigal, L.; Balan, A. O.; Black, M. J., HumanEva: synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion, Int. J. Comput. Vis., 87, 4 (2010)
[30] Sorkine, O.; Alexa, M., As-rigid-as-possible surface modeling, (Symposium on Geometry Processing (2007)), 109-116
[31] Tan, Q.; Gao, L.; Lai, Y. K.; Xia, S., Variational autoencoders for deforming 3D mesh models, (Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018)), 5841-5850
[32] Tan, Q.; Gao, L.; Lai, Y. K.; Yang, J.; Xia, S., Mesh-based autoencoders for localized deformation component analysis, (Proceedings of the AAAI Conference on Artificial Intelligence (2018))
[33] Tretschk, E.; Tewari, A.; Zollhöfer, M.; Golyanik, V.; Theobalt, C., DEMEA: deep mesh autoencoders for non-rigidly deforming objects, (Proceedings of the European Conference on Computer Vision (2020)), 601-617
[34] Trumble, M.; Gilbert, A.; Malleson, C.; Hilton, A.; Collomosse, J., Total capture: 3D human pose estimation fusing video and inertial sensors, (2017 British Machine Vision Conference (BMVC) (2017))
[35] Wang, Y.; Li, G.; Zhang, H.; Zou, X.; Liu, Y.; Nie, Y., PanoMan: sparse localized components-based model for full human motions, ACM Trans. Graph., 40, 1-17 (2021)
[36] Xu, H.; Bazavan, E. G.; Zanfir, A.; Freeman, W. T.; Sukthankar, R.; Sminchisescu, C., GHUM & GHUML: generative 3D human shape and articulated pose models, (Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020))
[37] Yang, J.; Gao, L.; Tan, Q.; Huang, Y.; Xia, S.; Lai, Y. K., Multiscale mesh deformation component analysis with attention-based autoencoders (2020), arXiv preprint
[38] Zhou, K.; Bhatnagar, B. L.; Pons-Moll, G., Unsupervised shape and pose disentanglement for 3D meshes, (Proceedings of the European Conference on Computer Vision (2020)), 341-357
[39] Zhou, Y.; Wu, C.; Li, Z.; Cao, C.; Ye, Y.; Saragih, J.; Li, H.; Sheikh, Y., Fully convolutional mesh autoencoder using efficient spatially varying kernels, Adv. Neural Inf. Process. Syst., 33, 9251-9262 (2020)