Human body shape reconstruction from binary silhouette images. (English) Zbl 07137402

Summary: 3D content creation is one of the most fundamental tasks in computer graphics, and many algorithms for 3D modeling from 2D images or curves have been developed over the past several decades. Designers can align conceptual images or sketch suggestive curves from front, side, and top views, and then use them as references when constructing a 3D model manually or semi-automatically. In this paper, we propose a deep-learning-based method for reconstructing 3D human body shape from 2D orthographic silhouette images. A CNN-based regression network with two branches, corresponding to the frontal and lateral views respectively, is designed to estimate 3D human body shape from binary silhouette images. We train our networks separately to decouple the feature descriptors that encode the body parameters from the different views, and then fuse them to estimate an accurate human body shape. In addition, to overcome the shortage of training data required for this purpose, we propose some significant data augmentation schemes for 3D human body shapes, which can be used to promote further research on this topic. Extensive experimental results demonstrate that visually realistic and accurate reconstructions can be achieved effectively using our algorithm. Requiring only one or two silhouette images, our method can help users create their own digital avatars quickly, and makes it easy to create digital human bodies for 3D games, virtual reality, and online fashion shopping.
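To make the input described in the summary concrete, the following minimal sketch builds frontal and lateral orthographic binary silhouettes of a toy solid and fuses one descriptor per view by concatenation. It is an illustration under stated assumptions, not the authors' network or data pipeline: the ellipsoid stand-in for a body, the per-row width descriptor, and all function names (`ellipsoid_voxels`, `silhouette`, `row_widths`, `fused_descriptor`) are hypothetical. In the paper, the per-view descriptors are learned by CNN branches rather than computed by hand.

```python
# Illustrative sketch only: a toy stand-in for the two-view silhouette input.
# Every name below is hypothetical and not taken from the paper.

def ellipsoid_voxels(rx, ry, rz, n=32):
    """Return the set of (x, y, z) grid cells inside an axis-aligned ellipsoid."""
    c = (n - 1) / 2.0  # grid centre
    cells = set()
    for x in range(n):
        for y in range(n):
            for z in range(n):
                d = ((x - c) / rx) ** 2 + ((y - c) / ry) ** 2 + ((z - c) / rz) ** 2
                if d <= 1.0:
                    cells.add((x, y, z))
    return cells

def silhouette(cells, n, view):
    """Orthographic binary silhouette.

    'front' projects along z (depth), 'side' projects along x (width).
    Rows are indexed by y (height) in both views.
    """
    img = [[0] * n for _ in range(n)]
    for x, y, z in cells:
        col = x if view == "front" else z
        img[y][col] = 1
    return img

def row_widths(img):
    """Hand-made per-view descriptor: foreground pixel count in each row."""
    return [sum(row) for row in img]

def fused_descriptor(front, side):
    """Late fusion by concatenating the two per-view descriptors."""
    return row_widths(front) + row_widths(side)

n = 32
body = ellipsoid_voxels(rx=8, ry=14, rz=5, n=n)  # wider than it is deep
front = silhouette(body, n, "front")
side = silhouette(body, n, "side")
features = fused_descriptor(front, side)
```

Because the toy solid is wider than it is deep, the frontal silhouette is broader than the lateral one. This shows why the two views carry complementary information that a single silhouette cannot provide.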


65Dxx Numerical approximation and computational geometry (primarily algorithms)



