
Prediction bounds for higher order total variation regularized least squares. (English) Zbl 1486.62205

Summary: We establish adaptive results for trend filtering: least squares estimation with a penalty on the total variation of the \((k-1)\)th order differences, i.e., an \(\ell_1\)-penalty on the \(k\)th order differences. Our approach combines a general oracle inequality for the \(\ell_1\)-penalized least squares estimator with "interpolating vectors" to upper bound the "effective sparsity". This shows that the \(\ell_1\)-penalty on the \(k\)th order differences yields an estimator that can adapt to the number of jumps in the \((k-1)\)th order differences of the underlying signal, or of an approximation thereof. We prove the result for \(k \in \{1,2,3,4\}\) and indicate how it could be derived for general \(k \in \mathbb{N}\).
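To make the estimator concrete, here is a minimal sketch in Python of the trend filtering problem the summary describes, \(\min_f \frac{1}{2}\|y - f\|_2^2 + \lambda \|D^{(k)} f\|_1\), where \(D^{(k)}\) is the \(k\)th order difference operator. The convex solver cvxpy, the helper name trend_filter, the tuning parameter lam, and the toy data are illustrative assumptions, not choices made in the paper.

```python
# Sketch of k-th order trend filtering:
#   min_f 0.5 * ||y - f||_2^2 + lam * ||D^(k) f||_1,
# where D^(k) is the k-th order difference operator. Solver and data
# are illustrative; the paper does not prescribe an algorithm.
import numpy as np
import cvxpy as cp

def trend_filter(y, k=2, lam=5.0):
    """k-th order trend filtering: l1-penalty on k-th order differences."""
    n = len(y)
    # Build D^(k) by applying first differences to the identity k times;
    # each pass maps an (m x n) matrix to an ((m-1) x n) one.
    D = np.eye(n)
    for _ in range(k):
        D = np.diff(D, axis=0)
    f = cp.Variable(n)
    objective = cp.Minimize(0.5 * cp.sum_squares(y - f) + lam * cp.norm1(D @ f))
    cp.Problem(objective).solve()
    return f.value

# Toy example: a signal with one jump in its first differences (a kink),
# so k = 2 trend filtering should fit it with few effective knots.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
signal = np.where(x < 0.5, 2.0 * x, 2.0 - 2.0 * x)
y = signal + 0.05 * rng.standard_normal(x.size)
f_hat = trend_filter(y, k=2, lam=5.0)
```

For \(k = 2\) the solution is piecewise linear, so it can adapt to the number of jumps in the first differences, matching the adaptivity result above. A generic convex solver keeps the sketch short; specialized solvers for trend filtering exist but are not needed for illustration.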

MSC:

62J05 Linear regression; mixed models
62J07 Ridge regression; shrinkage estimators (Lasso)
62J99 Linear inference, regression
62M20 Inference from stochastic processes and prediction
