Nonnegative tensor-based linear dynamical systems for recognizing human action from 3D skeletons. (English) Zbl 1435.92008

Summary: Skeleton-based action recognition has recently become an important topic in computer vision. Building an accurate model of a human action and distinguishing between similar actions both remain challenging. In this paper, an action (skeleton sequence) is represented as a third-order nonnegative tensor time series to capture the original spatiotemporal information of the action. Since a linear dynamical system (LDS) is an efficient tool for encoding spatiotemporal data in various disciplines, this paper proposes a nonnegative tensor-based LDS (nLDS) to model the third-order nonnegative tensor time series. Nonnegative Tucker decomposition (NTD) is used to estimate the parameters of the nLDS model. These parameters are used to build the extended observability sequence \(\mathbf{O}_{\infty}^T\) for the action, so that \(\mathbf{O}_{\infty}^T\) can serve as the feature descriptor of the action. To avoid the limitations introduced by approximating \(\mathbf{O}_{\infty}^T\) with a finite-order matrix, an action is represented as a point on an infinite Grassmann manifold comprising the orthonormalized extended observability sequences. Classification is then performed by dictionary learning and sparse coding on the infinite Grassmann manifold. Experimental results on the MSR-Action3D, UTKinect-Action, and G3D-Gaming datasets show that the proposed approach achieves better performance than state-of-the-art methods.
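For orientation, the following is a minimal sketch of the classical matrix LDS pipeline that the summary builds on, not the paper's nLDS. It assumes the standard model \(x_{t+1} = A x_t + v_t\), \(y_t = C x_t + w_t\), flattens each skeleton frame into a vector, estimates \((A, C)\) with a plain SVD instead of NTD, and truncates the observability sequence to a finite order (the very approximation the paper avoids). The function names `fit_lds`, `finite_observability`, `grassmann_point`, and `projection_distance` are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_lds(Y, n_states):
    """SVD-based estimate of (A, C) for a classical matrix LDS.

    Y has shape (d, T): one flattened frame per column.
    Assumption: this is the standard SVD estimate, not the paper's NTD-based one.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n_states]                              # observation matrix, (d, n)
    X = np.diag(s[:n_states]) @ Vt[:n_states, :]     # hidden states, (n, T)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])         # least-squares transition, (n, n)
    return A, C

def finite_observability(A, C, order):
    """Stack C, CA, CA^2, ... up to the given finite order."""
    blocks, M = [], C
    for _ in range(order):
        blocks.append(M)
        M = M @ A
    return np.vstack(blocks)                         # (order * d, n)

def grassmann_point(O):
    """Orthonormalize the columns so that O represents a subspace."""
    Q, _ = np.linalg.qr(O)
    return Q

def projection_distance(Q1, Q2):
    """Projection distance between two subspaces of equal dimension."""
    n = Q1.shape[1]
    gap = n - np.linalg.norm(Q1.T @ Q2, "fro") ** 2
    return np.sqrt(max(gap, 0.0))                    # clamp tiny negative round-off

# Toy data: 40 frames, 20 joints, 3 nonnegative coordinates per joint.
rng = np.random.default_rng(0)
sequence = rng.random((40, 20, 3))
Y = sequence.reshape(40, -1).T                       # (60, 40)
A, C = fit_lds(Y, n_states=5)
Q = grassmann_point(finite_observability(A, C, order=4))
```

Two actions could then be compared by calling `projection_distance` on their orthonormalized descriptors; the paper instead performs dictionary learning and sparse coding on the infinite Grassmann manifold.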

MSC:

92C10 Biomechanics
15A18 Eigenvalues, singular values, and eigenvectors
94A08 Image processing (compression, reconstruction, etc.) in information and communication theory

Software:

G3D
