Prediction of time series by statistical learning: general losses and fast rates. (English) Zbl 06297673

Summary: We establish rates of convergence in statistical learning for time series forecasting. Using the PAC-Bayesian approach, slow rates of convergence \(\sqrt{d/n}\) for the Gibbs estimator under the absolute loss were given in a previous work [P. Alquier and O. Wintenberger, Bernoulli 18, No. 3, 883–913 (2012; Zbl 1243.62117)], where \(n\) is the sample size and \(d\) the dimension of the set of predictors. Under the same weak dependence conditions, we extend this result to any convex Lipschitz loss function. We also identify a condition on the parameter space that ensures similar rates for the classical penalized ERM procedure. We apply this method to quantile forecasting of the French GDP. Under additional conditions on the loss functions (satisfied by the quadratic loss function) and for uniformly mixing processes, we prove that the Gibbs estimator actually achieves fast rates of convergence \(d/n\). We discuss the optimality of these different rates, pointing to lower bounds when they are available. In particular, these results generalize the results of [A. Dalalyan and A. Tsybakov, “Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity”, Mach. Learn. 72, 39–61 (2008)] on sparse regression estimation to the autoregression setting.
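The Gibbs estimator discussed in the summary is an exponentially weighted aggregate: predictors are reweighted in proportion to \(\exp(-\lambda \times \text{cumulative loss})\) under a prior. As a rough illustration only (the AR(1) toy data, the finite predictor grid, the uniform prior, and the temperature \(\lambda = 1\) are our own choices, not the paper's), the following Python sketch aggregates autoregressive predictors under the quantile ("pinball") loss, a convex Lipschitz loss of the kind covered by the slow-rate result:

```python
import numpy as np

def pinball_loss(y, pred, tau=0.5):
    # Quantile ("pinball") loss: convex and Lipschitz in the prediction.
    u = y - pred
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def gibbs_weights(cum_losses, lam):
    # Gibbs (exponentially weighted) posterior over a finite predictor set,
    # with a uniform prior: w_j proportional to exp(-lam * cumulative_loss_j).
    w = np.exp(-lam * (cum_losses - cum_losses.min()))  # shift for numerical stability
    return w / w.sum()

# Toy illustration: simulate an AR(1) series and aggregate predictors
# f_theta(x_t) = theta * x_{t-1} over a grid of coefficients theta.
rng = np.random.default_rng(0)
n = 200
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + rng.normal(scale=0.5)

thetas = np.linspace(-0.9, 0.9, 19)        # finite set of candidate predictors
preds = thetas[None, :] * x[:-1, None]     # shape (n-1, d): each column one predictor
losses = pinball_loss(x[1:, None], preds)  # per-step pinball loss of each predictor
w = gibbs_weights(losses.sum(axis=0), lam=1.0)
theta_hat = float(w @ thetas)              # Gibbs aggregate of the coefficients
```

Since the true coefficient is 0.6, the weights concentrate on predictors with positive `theta`, and the aggregate lands near the truth as the sample grows; the temperature `lam` trades off how sharply the posterior concentrates against robustness to noise in the cumulative losses.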

MSC:

62M20 Inference from stochastic processes and prediction
60G25 Prediction theory (aspects of stochastic processes)
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62P20 Applications of statistics to economics
68Q32 Computational learning theory
68T05 Learning and adaptive systems in artificial intelligence

Citations:

Zbl 1243.62117

Software:

R
Full Text: DOI arXiv

References:

[1] A. Agarwal and J. C. Duchi, The generalization ability of online algorithms for dependent data, IEEE Trans. Inform. Theory 59 (2013), no. 1, 573-587.; · Zbl 1364.68372
[2] H. Akaike, Information theory and an extension of the maximum likelihood principle, 2nd International Symposium on Information Theory (B. N. Petrov and F. Csaki, eds.), Budapest: Akadémiai Kiadó, 1973, pp. 267-281.; · Zbl 0283.62006
[3] P. Alquier and P. Lounici, PAC-Bayesian bounds for sparse regression estimation with exponential weights, Electron. J. Stat. 5 (2011), 127-145.; · Zbl 1274.62463
[4] P. Alquier, PAC-Bayesian bounds for randomized empirical risk minimizers, Math. Methods Statist. 17 (2008), no. 4, 279-304.; · Zbl 1260.62038
[5] K. B. Athreya and S. G. Pantula, Mixing properties of Harris chains and autoregressive processes, J. Appl. Probab. 23 (1986), no. 4, 880-892. MR 867185 (88c:60127); · Zbl 0623.60087
[6] J.-Y. Audibert, Fast learning rates in statistical inference through aggregation, Ann. Statist. 37 (2009), no. 4, 1591-1646.; · Zbl 1360.62167
[7] P. Alquier and O. Wintenberger, Model selection for weakly dependent time series forecasting, Bernoulli 18 (2012), no. 3, 883-913.; · Zbl 1243.62117
[8] G. Biau, O. Biau, and L. Rouvière, Nonparametric forecasting of the manufacturing output growth with firm-level survey data, Journal of Business Cycle Measurement and Analysis 3 (2008), 317-332.;
[9] A. Belloni and V. Chernozhukov, L1-penalized quantile regression in high-dimensional sparse models, Ann. Statist. 39 (2011), no. 1, 82-130.; · Zbl 1209.62064
[10] P. Brockwell and R. Davis, Time series: Theory and methods (2nd edition), Springer, 2009.; · Zbl 1169.62074
[11] E. Britton, P. Fisher, and J. Whitley, The inflation report projections: Understanding the fan chart, Bank of England Quarterly Bulletin 38 (1998), no. 1, 30-37.;
[12] L. Birgé and P. Massart, Gaussian model selection, J. Eur. Math. Soc. 3 (2001), no. 3, 203-268.; · Zbl 1037.62001
[13] G. Biau and B. Patra, Sequential quantile prediction of time series, IEEE Trans. Inform. Theory 57 (2011), 1664-1674.; · Zbl 1366.62171
[14] F. Bunea, A. B. Tsybakov, and M. H. Wegkamp, Aggregation for Gaussian regression, Ann. Statist. 35 (2007), no. 4, 1674-1697.; · Zbl 1209.62065
[15] O. Catoni, A PAC-Bayesian approach to adaptive classification, preprint (2003).;
[16] O. Catoni, Statistical learning theory and stochastic optimization, Springer Lecture Notes in Mathematics, 2004.; · Zbl 1076.93002
[17] O. Catoni, PAC-Bayesian supervised classification (the thermodynamics of statistical learning), Lecture Notes-Monograph Series, vol. 56, IMS, 2007.; · Zbl 1277.62015
[18] N. Cesa-Bianchi and G. Lugosi, Prediction, learning, and games, Cambridge University Press, New York, 2006.; · Zbl 1114.91001
[19] L. Clavel and C. Minodier, A monthly indicator of the French business climate, Documents de Travail de la DESE, 2009.;
[20] M. Cornec, Constructing a conditional GDP fan chart with an application to French business survey data, 30th CIRET Conference, New York, 2010.;
[21] N. V. Cuong, L. S. Tung Ho, and V. Dinh, Generalization and robustness of batched weighted average algorithm with V-geometrically ergodic Markov data, Proceedings of ALT'13 (S. Jain, R. Munos, F. Stephan, and T. Zeugmann, eds.), Springer, 2013, pp. 264-278.; · Zbl 1406.68095
[22] J. C. Duchi, A. Agarwal, M. Johansson, and M. I. Jordan, Ergodic mirror descent, SIAM J. Optim. 22 (2012), no. 4, 1549-1578.; · Zbl 1262.90114
[23] J. Dedecker, P. Doukhan, G. Lang, J. R. León, S. Louhichi, and C. Prieur, Weak dependence, examples and applications, Lecture Notes in Statistics, vol. 190, Springer-Verlag, Berlin, 2007.; · Zbl 1165.62001
[24] M. Devilliers, Les enquêtes de conjoncture, Archives et Documents, no. 101, INSEE, 1984.;
[25] E. Dubois and E. Michaux, Étalonnages à l'aide d'enquêtes de conjoncture: de nouveaux résultats, Économie et Prévision, no. 172, INSEE, 2006.;
[26] P. Doukhan, Mixing, Lecture Notes in Statistics, Springer, New York, 1994.;
[27] K. Dowd, The inflation fan charts: An evaluation, Greek Economic Review 23 (2004), 99-111.;
[28] A. Dalalyan and J. Salmon, Sharp oracle inequalities for aggregation of affine estimators, Ann. Statist. 40 (2012), no. 4, 2327-2355.; · Zbl 1257.62038
[29] A. Dalalyan and A. Tsybakov, Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity, Mach. Learn. 72 (2008), 39-61.; · Zbl 1470.62054
[30] F. X. Diebold, A. S. Tay, and K. F. Wallis, Evaluating density forecasts of inflation: the survey of professional forecasters, Discussion Paper No.48, ESRC Macroeconomic Modelling Bureau, University of Warwick and Working Paper No.6228, National Bureau of Economic Research, Cambridge, Mass., 1997.;
[31] M. D. Donsker and S. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time, III, Comm. Pure Appl. Math. 29 (1976), 389-461.; · Zbl 0348.60032
[32] P. Doukhan and O. Wintenberger, Weakly dependent chains with infinite memory, Stochastic Process. Appl. 118 (2008), no. 11, 1997-2013.; · Zbl 1166.60031
[33] R. F. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50 (1982), 987-1008.; · Zbl 0491.62099
[34] C. Francq and J.-M. Zakoian, GARCH models: Structure, statistical inference and financial applications, Wiley-Blackwell, 2010.; · Zbl 1431.62004
[35] S. Gerchinovitz, Sparsity regret bounds for individual sequences in online linear regression, Proceedings of COLT’11, 2011.; · Zbl 1320.62165
[36] J. Hamilton, Time series analysis, Princeton University Press, 1994.; · Zbl 0831.62061
[37] H. Hang and I. Steinwart, Fast learning from α-mixing observations, Technical report, Fakultät für Mathematik und Physik, Universität Stuttgart, 2012.; · Zbl 1359.62242
[38] I. A. Ibragimov, Some limit theorems for stationary processes, Theory Probab. Appl. 7 (1962), no. 4, 349-382.; · Zbl 0119.14204
[39] A. B. Juditsky, A. V. Nazin, A. B. Tsybakov, and N. Vayatis, Recursive aggregation of estimators by the mirror descent algorithm with averaging, Probl. Inf. Transm. 41 (2005), no. 4, 368-384.; · Zbl 1123.62044
[40] A. B. Juditsky, P. Rigollet, and A. B. Tsybakov, Learning by mirror averaging, Ann. Statist. 36 (2008), no. 5, 2183-2206.; · Zbl 1274.62288
[41] R. Koenker and G. Bassett Jr., Regression quantiles, Econometrica 46 (1978), 33-50.; · Zbl 0373.62038
[42] R. Koenker, Quantile regression, Cambridge University Press, Cambridge, 2005.; · Zbl 1111.62037
[43] S. Kullback, Information theory and statistics, Wiley, New York, 1959.; · Zbl 0088.10406
[44] N. Littlestone and M.K. Warmuth, The weighted majority algorithm, Information and Computation 108 (1994), 212-261.; · Zbl 0804.68121
[45] P. Massart, Concentration inequalities and model selection. École d'Été de Probabilités de Saint-Flour XXXIII - 2003, Lecture Notes in Mathematics (J. Picard, ed.), vol. 1896, Springer, 2007.; · Zbl 1170.60006
[46] D. A. McAllester, PAC-Bayesian model averaging, Proceedings of the 12th Annual Conf. on Computational Learning Theory, Santa Cruz, California (Electronic), ACM, New York, 1999, pp. 164-170.;
[47] R. Meir, Nonparametric time series prediction through adaptive model selection, Mach. Learn. 39 (2000), 5-34.; · Zbl 0954.68124
[48] C. Minodier, Avantages comparés des séries premières valeurs publiées et des séries des valeurs révisées, Documents de Travail de la DESE, 2010.;
[49] D. S. Modha and E. Masry, Memory-universal prediction of stationary random processes, IEEE Trans. Inform. Theory 44 (1998), no. 1, 117-133.; · Zbl 0938.62106
[50] S. P. Meyn and R. L. Tweedie, Markov chains and stochastic stability, Communications and Control Engineering Series, Springer-Verlag London Ltd., London, 1993. MR 1287609 (95j:60103); · Zbl 0925.60001
[51] A. Nemirovski, Topics in nonparametric statistics, Lectures on Probability Theory and Statistics - École d'été de probabilités de Saint-Flour XXVIII (P. Bernard, ed.), Springer, 2000, pp. 85-277.;
[52] R Development Core Team, R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, 2008.;
[53] E. Rio, Inégalités de Hoeffding pour les fonctions lipschitziennes de suites dépendantes, C. R. Math. Acad. Sci. Paris 330 (2000), 905-908.; · Zbl 0961.60032
[54] P.-M. Samson, Concentration of measure inequalities for Markov chains and Φ-mixing processes, Ann. Probab. 28 (2000), no. 1, 416-461.; · Zbl 1044.60061
[55] I. Steinwart and A. Christmann, Fast learning from non-i.i.d. observations, Advances in Neural Information Processing Systems 22 (Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, eds.), 2009, pp. 1768-1776.;
[56] I. Steinwart, D. Hush, and C. Scovel, Learning from dependent observations, J. Multivariate Anal. 100 (2009), 175-194.; · Zbl 1158.68040
[57] Y. Seldin, F. Laviolette, N. Cesa-Bianchi, J. Shawe-Taylor, J. Peters, and P. Auer, PAC-Bayesian inequalities for martingales, IEEE Trans. Inform. Theory 58 (2012), no. 12, 7086-7093.; · Zbl 1364.60030
[58] A. Sanchez-Perez, Time series prediction via aggregation: an oracle bound including numerical cost, Preprint arXiv:1311.4500, 2013.;
[59] G. Stoltz, Agrégation séquentielle de prédicteurs : méthodologie générale et applications à la prévision de la qualité de l’air et à celle de la consommation électrique, Journal de la SFDS 151 (2010), no. 2, 66-106.; · Zbl 1316.62169
[60] J. Shawe-Taylor and R. Williamson, A PAC analysis of a Bayes estimator, Proceedings of the Tenth Annual Conference on Computational Learning Theory, COLT'97, ACM, 1997, pp. 2-9.;
[61] N. N. Taleb, Black swans and the domains of statistics, Amer. Statist. 61 (2007), no. 3, 198-200.;
[62] A. S. Tay and K. F. Wallis, Density forecasting: a survey, J. Forecast 19 (2000), 235-254.;
[63] V. Vapnik, The nature of statistical learning theory, Springer, 1999.; · Zbl 0833.62008
[64] V. G. Vovk, Aggregating strategies, Proceedings of the 3rd Annual Workshop on Computational Learning Theory (COLT), 1990, pp. 371-386.;
[65] O. Wintenberger, Deviation inequalities for sums of weakly dependent time series, Electron. Commun. Probab. 15 (2010), 489-503.; · Zbl 1225.60034
[66] Y.-L. Xu and D.-R. Chen, Learning rate of regularized regression for exponentially strongly mixing sequence, J. Statist. Plann. Inference 138 (2008), 2180-2189.; · Zbl 1134.62050
[67] B. Zou, L. Li, and Z. Xu, The generalization performance of erm algorithm with strongly mixing observations, Mach. Learn. 75 (2009), 275-295.; · Zbl 1470.68214
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.