Learning with tree tensor networks: complexity estimates and model selection. (English) Zbl 07526570

Summary: In this paper, we propose a model selection method for tree tensor networks in an empirical risk minimization framework and analyze its performance over a wide range of smoothness classes. Tree tensor networks, or tree-based tensor formats, are prominent model classes for the approximation of high-dimensional functions in numerical analysis and data science. They correspond to sum-product neural networks with a sparse connectivity associated with a dimension partition tree \(T\), widths given by a tuple \(r\) of tensor ranks, and multilinear activation functions (or units). The approximation power of these model classes has been proved to be (near-)optimal for classical smoothness classes. However, in an empirical risk minimization framework with a limited number of observations, the dimension tree \(T\) and ranks \(r\) should be selected carefully to balance estimation and approximation errors. In this paper, we propose a complexity-based model selection strategy à la Barron, Birgé, Massart. Given a family of model classes associated with different trees, ranks, tensor product feature spaces and sparsity patterns for sparse tensor networks, a model is selected by minimizing a penalized empirical risk, with a penalty depending on the complexity of the model class. After deriving bounds on the metric entropy of tree tensor networks with bounded parameters, we deduce a form of the penalty from bounds on suprema of empirical processes. This choice of penalty yields a risk bound for the predictor associated with the selected model. In a least-squares setting, after deriving fast rates of convergence of the risk, we show that the proposed strategy is (nearly) minimax adaptive to a wide range of smoothness classes, including Sobolev and Besov spaces (with isotropic, anisotropic or dominating mixed smoothness) and analytic functions.
We discuss the role of sparsity of the tensor network for obtaining optimal performance in several regimes. In practice, the amplitude of the penalty is calibrated with a slope heuristics method. Numerical experiments in a least-squares regression setting illustrate the performance of the strategy for the approximation of multivariate functions and univariate functions identified with tensors by tensorization (quantization).
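The penalized selection step described above can be illustrated on a toy least-squares problem. The sketch below is not the authors' algorithm: it replaces tree tensor networks by nested polynomial spaces indexed by their degree, uses a generic penalty proportional to the parameter count, and calibrates the penalty constant with a crude dimension-jump version of the slope heuristics; the model family, the penalty shape, and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data for least-squares regression: y = f(x) + noise.
n = 200
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(n)

# Stand-in model classes: nested Chebyshev polynomial spaces indexed by the
# degree d, with complexity C(d) = d + 1 parameters (tree tensor networks
# would instead be indexed by a dimension tree T and a rank tuple r).
degrees = range(1, 11)

def emp_risk(d):
    p = np.polynomial.Chebyshev.fit(x, y, d)  # least-squares fit in class d
    return float(np.mean((y - p(x)) ** 2))    # empirical quadratic risk

risks = {d: emp_risk(d) for d in degrees}

def select(kappa):
    # Penalized empirical risk minimization: risk + kappa * C(d) / n.
    return min(degrees, key=lambda d: risks[d] + kappa * (d + 1) / n)

# Crude slope heuristics: find the penalty level at which the selected
# complexity jumps down the most, then double it.
kappas = np.linspace(0.0, 2.0, 201)
path = np.array([select(k) for k in kappas])
kappa_min = kappas[int(np.argmax(np.abs(np.diff(path)))) + 1]
d_hat = select(2.0 * kappa_min)
print("selected degree:", d_hat)
```

With the penalty set to zero, the largest model class always wins on empirical risk; the penalty trades that overfitting against model complexity, which is the balance between estimation and approximation errors that the summary describes.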


62Gxx Nonparametric inference
41Axx Approximations and expansions
42Cxx Nontrigonometric harmonic analysis




[1] Ali, M. and Nouy, A. (2020). Approximation theory of tree tensor networks: Tensorized univariate functions. Part I. Preprint. Available at arXiv:2007.00118.
[2] Ali, M. and Nouy, A. (2020). Approximation theory of tree tensor networks: Tensorized univariate functions. Part II. Preprint. Available at arXiv:2007.00128.
[3] Ali, M. and Nouy, A. (2021). Approximation theory of tree tensor networks: Tensorized multivariate functions. Preprint. Available at arXiv:2101.11932.
[4] Arlot, S. (2019). Minimal penalties and the slope heuristics: A survey. J. SFdS 160 1-106. · Zbl 1437.62121
[5] Bachmayr, M., Nouy, A. and Schneider, R. (2021). Approximation power of tree tensor networks for compositional functions. Preprint.
[6] Bachmayr, M., Schneider, R. and Uschmajew, A. (2016). Tensor networks and hierarchical tensors for the solution of high-dimensional partial differential equations. Found. Comput. Math. 16 1423-1472. · Zbl 1357.65153 · doi:10.1007/s10208-016-9317-9
[7] Baudry, J.-P., Maugis, C. and Michel, B. (2012). Slope heuristics: Overview and implementation. Stat. Comput. 22 455-470. · Zbl 1322.62007 · doi:10.1007/s11222-011-9236-1
[8] Belitser, E. (1998). Efficient estimation of analytic density under random censorship. Bernoulli 4 519-543. · Zbl 1037.62025 · doi:10.2307/3318664
[9] Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probab. Theory Related Fields 138 33-73. · Zbl 1112.62082 · doi:10.1007/s00440-006-0011-8
[10] Bousquet, O., Boucheron, S. and Lugosi, G. (2004). Introduction to statistical learning theory. In Advanced Lectures on Machine Learning 169-207. Berlin: Springer. · Zbl 1120.68428
[11] Cichocki, A., Lee, N., Oseledets, I., Phan, A.-H., Zhao, Q. and Mandic, D. (2016). Tensor networks for dimensionality reduction and large-scale optimization: Part 1. Low-rank tensor decompositions. Found. Trends Mach. Learn. 9 249-429. · Zbl 1364.68320
[12] Cichocki, A., Phan, A.-H., Zhao, Q., Lee, N., Oseledets, I., Sugiyama, M. and Mandic, D. (2017). Tensor networks for dimensionality reduction and large-scale optimization: Part 2. Applications and future perspectives. Found. Trends Mach. Learn. 9 431-673. · Zbl 1376.68123
[13] Cohen, N., Sharir, O. and Shashua, A. (2016). On the expressive power of deep learning: A tensor analysis. In Conference on Learning Theory 698-728.
[14] DeVore, R.A. and Popov, V.A. (1988). Interpolation of Besov spaces. Trans. Amer. Math. Soc. 305 397-414. · Zbl 0646.46030 · doi:10.2307/2001060
[15] Donoho, D.L. and Johnstone, I.M. (1998). Minimax estimation via wavelet shrinkage. Ann. Statist. 26 879-921. · Zbl 0935.62041 · doi:10.1214/aos/1024691081
[16] Donoho, D.L., Johnstone, I.M., Kerkyacharian, G. and Picard, D. (1996). Density estimation by wavelet thresholding. Ann. Statist. 24 508-539. · Zbl 0860.62032 · doi:10.1214/aos/1032894451
[17] Dũng, D., Temlyakov, V. and Ullrich, T. (2018). Hyperbolic Cross Approximation. Advanced Courses in Mathematics. CRM Barcelona. Cham: Birkhäuser/Springer. · Zbl 1414.41001 · doi:10.1007/978-3-319-92240-9
[18] Falcó, A., Hackbusch, W. and Nouy, A. (2021). Tree-based tensor formats. SeMA J. 78 159-173. · Zbl 1479.15026 · doi:10.1007/s40324-018-0177-x
[19] Giné, E. and Nickl, R. (2016). Mathematical Foundations of Infinite-Dimensional Statistical Models. Cambridge Series in Statistical and Probabilistic Mathematics 40. New York: Cambridge Univ. Press. · Zbl 1358.62014 · doi:10.1017/CBO9781107337862
[20] Grelier, E., Nouy, A. and Chevreuil, M. (2018). Learning with tree-based tensor formats. Preprint. Available at arXiv:1811.04455.
[21] Grelier, E., Nouy, A. and Lebrun, R. (2019). Learning high-dimensional probability distributions using tree tensor networks. Preprint. Available at arXiv:1912.07913.
[22] Gribonval, R., Kutyniok, G., Nielsen, M. and Voigtlaender, F. (2019). Approximation spaces of deep neural networks. Preprint. Available at arXiv:1905.01208. · Zbl 1491.82017
[23] Griebel, M. and Harbrecht, H. (2019). Analysis of tensor approximation schemes for continuous functions. Preprint. Available at arXiv:1903.04234.
[24] Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2006). A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics. New York: Springer. · Zbl 1021.62024 · doi:10.1007/b97848
[25] Hackbusch, W. (2012). Tensor Spaces and Numerical Tensor Calculus. Springer Series in Computational Mathematics 42. Heidelberg: Springer. · Zbl 1244.65061 · doi:10.1007/978-3-642-28027-6
[26] Hansen, M. and Sickel, W. (2010). Best \(m\)-term approximation and tensor product of Sobolev and Besov spaces—The case of non-compact embeddings. East J. Approx. 16 345-388. · Zbl 1234.41010
[27] Hansen, M. and Sickel, W. (2012). Best \(m\)-term approximation and Sobolev-Besov spaces of dominating mixed smoothness—The case of compact embeddings. Constr. Approx. 36 1-51. · Zbl 1251.41005 · doi:10.1007/s00365-012-9161-3
[28] Kazeev, V., Oseledets, I., Rakhuba, M. and Schwab, C. (2017). QTT-finite-element approximation for multiscale problems I: Model problems in one dimension. Adv. Comput. Math. 43 411-442. · Zbl 1375.65152 · doi:10.1007/s10444-016-9491-y
[29] Kazeev, V. and Schwab, C. (2015). Approximation of singularities by quantized-tensor FEM. PAMM 15 743-746.
[30] Khoromskij, B. (2012). Tensors-structured numerical methods in scientific computing: Survey on recent advances. Chemom. Intell. Lab. Syst. 110 1-19.
[31] Khrulkov, V., Novikov, A. and Oseledets, I. (2018). Expressive power of recurrent neural networks. In International Conference on Learning Representations. · Zbl 1453.65095
[32] Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: Ecole d’Eté de Probabilités de Saint-Flour XXXVIII-2008. Lecture Notes in Math. 2033. Heidelberg: Springer. · Zbl 1223.91002 · doi:10.1007/978-3-642-22147-7
[33] Leisner, C. (2003). Nonlinear wavelet approximation in anisotropic Besov spaces. Indiana Univ. Math. J. 52 437-455. · Zbl 1072.42028 · doi:10.1512/iumj.2003.52.2224
[34] Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Berlin: Springer. · Zbl 1170.60006
[35] Michel, B. and Nouy, A. (2022). Supplement to “Learning with tree tensor networks: Complexity estimates and model selection.” · Zbl 07526570 · doi:10.3150/21-BEJ1371SUPP
[36] Neumann, M.H. (2000). Multivariate wavelet thresholding in anisotropic function spaces. Statist. Sinica 10 399-431. · Zbl 0982.62039
[37] Nouy, A. (2017). Low-rank methods for high-dimensional approximation and model order reduction. In Model Reduction and Approximation. Comput. Sci. Eng. 15 171-226. Philadelphia, PA: SIAM. · Zbl 1378.65010 · doi:10.1137/1.9781611974829.ch4
[38] Recht, B., Fazel, M. and Parrilo, P.A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52 471-501. · Zbl 1198.90321 · doi:10.1137/070697835
[39] Schneider, R. and Uschmajew, A. (2014). Approximation rates for the hierarchical tensor format in periodic Sobolev spaces. J. Complexity 30 56-71. · Zbl 1329.41033 · doi:10.1016/j.jco.2013.10.001
[40] Signoretto, M., De Lathauwer, L. and Suykens, J.A. (2010). Nuclear norms for tensors and their use for convex multilinear estimation. Linear Algebra Appl. 43. To appear.
[41] Suzuki, T. (2018). Adaptivity of deep ReLU network for learning in Besov and mixed smooth Besov spaces: Optimal rate and curse of dimensionality. Preprint. Available at arXiv:1810.08033.
[42] Vapnik, V.N. (2013). The Nature of Statistical Learning Theory. New York: Springer. · doi:10.1007/978-1-4757-2440-0
[43] Yuan, M. and Zhang, C.-H. (2016). On tensor completion via nuclear norm minimization. Found. Comput. Math. 16 1031-1068. · Zbl 1378.90066 · doi:10.1007/s10208-015-9269-5