
Estimating variances in time series kriging using convex optimization and empirical BLUPs. (English) Zbl 1477.62243

Summary: We revisit and update the estimation of variances, fundamental quantities in a time series forecasting approach called kriging, in time series models known as FDSLRMs (finite discrete spectrum linear regression models), whose observations can be described by a linear mixed model (LMM). By applying convex optimization, we resolved two open problems in FDSLRM research: (1) the theoretical existence of, and equivalence between, two standard estimation methods, namely least squares estimators (non-negative (M)DOOLSE) and maximum likelihood estimators ((RE)MLE), and (2) the practical lack of a freely available computational implementation for FDSLRM. For computing (RE)MLE from \(n\) observed time series values, we also discovered a new algorithm of order \({\mathcal{O}}(n)\), which at the default precision is \(10^7\) times more accurate and \(n^2\) times faster than the best current Python- or R-based computational packages, namely CVXPY, CVXR, nlme, sommer and mixed. The LMM framework led us to propose a two-stage method for estimating variance components, based on the empirical (plug-in) best linear unbiased predictions of unobservable random components in FDSLRM. The method provides non-negative invariant estimators with a simple explicit analytic form and performance comparable with (RE)MLE in the Gaussian case, and it can be used for any absolutely continuous probability distribution of time series data. We illustrate our results via applications and simulations on three real data sets (electricity consumption, tourism and cyber security), which are easily available, reproducible, sharable and modifiable in the form of interactive Jupyter notebooks.
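
For orientation, the model class named in the summary can be written out as a sketch. The notation is an assumption following common usage in the FDSLRM literature; none of the symbols \(f_i\), \(v_j\), \(\beta\), \(Y\), \(w\), \(F\), \(V\), \(D\), \(\sigma_j^2\) appears in the summary itself. A time series \(X(\cdot)\) is described by
\[
X(t) = \sum_{i=1}^{k} \beta_i f_i(t) + \sum_{j=1}^{l} Y_j v_j(t) + w(t), \qquad t = 1, 2, \dots,
\]
where \(f_i\) and \(v_j\) are known real functions, \(\beta = (\beta_1, \dots, \beta_k)^{\top}\) is an unknown vector of regression parameters, \(Y = (Y_1, \dots, Y_l)^{\top}\) is an unobservable random vector with zero mean and uncorrelated components of variances \(\sigma_j^2 \ge 0\), and \(w(\cdot)\) is white noise with variance \(\sigma_0^2 > 0\), uncorrelated with \(Y\). For \(n\) observed values this takes the LMM form
\[
X = F\beta + VY + w, \qquad \operatorname{Cov}(X) = \sigma_0^2 I_n + V D V^{\top}, \qquad D = \operatorname{diag}(\sigma_1^2, \dots, \sigma_l^2),
\]
with \(F\) and \(V\) the \(n \times k\) and \(n \times l\) design matrices built from the values \(f_i(t)\) and \(v_j(t)\). Under these assumptions the covariance matrix is linear in the variances \((\sigma_0^2, \sigma_1^2, \dots, \sigma_l^2)\), so a least squares fit of the variances under non-negativity constraints is a convex quadratic program.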

MSC:

62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62-08 Computational methods for problems pertaining to statistics
62J05 Linear regression; mixed models
62J10 Analysis of variance and covariance (ANOVA)
62M20 Inference from stochastic processes and prediction
62M15 Inference from stochastic processes and spectral analysis
90C25 Convex programming

References:

[1] Agrawal, A.; Verschueren, R.; Diamond, S.; Boyd, S., A rewriting system for convex optimization problems, J Control Decis, 5, 1, 42-60 (2018)
[2] Amemiya, T., A note on a heteroscedastic model, J Econom, 6, 3, 365-370 (1977) · Zbl 0367.62086
[3] Anderson, T.; Bose, RC; Roy, SN, Estimation of covariance matrices which are linear combinations or whose inverses are linear combinations of given matrices, Essays in probability and statistics, 1-24 (1970), Chapel Hill: University of North Carolina Press, Chapel Hill · Zbl 0265.62023
[4] Bates, D.; Mächler, M.; Bolker, B.; Walker, S., Fitting linear mixed-effects models using lme4, J Stat Softw, 67, 1, 1-48 (2015)
[5] Beezer RA, Bradshaw R, Grout J, Stein WA (2013) Sage. In: Hogben L (ed) Handbook of linear algebra, 2nd edn. Chapman and Hall/CRC, Boca Raton, pp 91-1-91-26
[6] Bertsekas, DP, Convex optimization theory (2009), Belmont: Athena Scientific, Belmont · Zbl 1242.90001
[7] Box, GEP; Jenkins, GM; Reinsel, GC; Ljung, GM, Time series analysis: forecasting and control (2015), Hoboken: Wiley, Hoboken · Zbl 1317.62001
[8] Boyd, S.; Vandenberghe, L., Convex optimization (2009), Cambridge: Cambridge University Press, Cambridge · Zbl 1058.90049
[9] Boyd, S.; Vandenberghe, L., Introduction to applied linear algebra: vectors, matrices, and least squares (2018), Cambridge: Cambridge University Press, Cambridge · Zbl 1406.15001
[10] Brieulle, L.; De Feo, L.; Doliskani, J.; Flori, JP; Schost, É., Computing isomorphisms and embeddings of finite fields, Math Comp, 88, 317, 1391-1426 (2019) · Zbl 1408.13070
[11] Brockwell, PJ; Davis, RA, Time series: theory and methods (1991), New York: Springer, New York · Zbl 0709.62080
[12] Brockwell, PJ; Davis, RA, Time series: theory and methods (2009), New York: Springer, New York · Zbl 1169.62074
[13] Brockwell, PJ; Davis, RA, Introduction to time series and forecasting (2016), New York: Springer, New York · Zbl 1355.62001
[14] Christensen, R., Plane answers to complex questions: the theory of linear models (2011), New York: Springer, New York · Zbl 1266.62043
[15] Cornuéjols, G.; Peña, J.; Tütüncü, R., Optimization methods in finance (2018), Cambridge: Cambridge University Press, Cambridge · Zbl 1400.91001
[16] Covarrubias-Pazaran, G., Genome-assisted prediction of quantitative traits using the R package sommer, PLoS ONE, 11, 6, e0156 (2016)
[17] Demidenko, E., Mixed models: theory and applications with R (2013), Hoboken: Wiley, Hoboken · Zbl 1276.62049
[18] Diamond, S.; Boyd, S., CVXPY: a python-embedded modeling language for convex optimization, J Mach Learn Res, 17, 83, 1-5 (2016) · Zbl 1360.90008
[19] Frederickson B (2019) Ranking Programming Languages by GitHub Users. https://www.benfrederickson.com/ranking-programming-languages-by-github-users/. Accessed 19 Feb 2020
[20] Fu A, Narasimhan B, Diamond S, Miller J, Boyd S, Rosenfield PK (2019) CVXR: disciplined convex optimization. https://CRAN.R-project.org/package=CVXR. Accessed 19 Feb 2020
[21] Gajdoš A (2019) MMEinR: R-version of Witkovský’s MATLAB mixed function. https://github.com/fdslrm/MMEinR
[22] Gajdoš, A.; Hančová, M.; Hanč, J., Kriging methodology and its development in forecasting econometric time series, Statistika, 97, 1, 59-73 (2017)
[23] Gajdoš A, Hanč J, Hančová M (2019a) fdslrm. https://github.com/fdslrm. Accessed 19 Feb 2020
[24] Gajdoš A, Hanč J, Hančová M (2019b) fdslrm: applications. https://github.com/fdslrm/applications. Accessed 19 Feb 2020
[25] Gajdoš A, Hanč J, Hančová M (2019c) fdslrm: EBLUP-NE. https://github.com/fdslrm/EBLUP-NE. Accessed 19 Feb 2020
[26] Gajdoš A, Hanč J, Hančová M (2019d) fdslrm: R package. https://github.com/fdslrm/R-package. Accessed 19 Feb 2020
[27] Galántai, A., Projectors and projection methods (2004), New York: Springer, New York · Zbl 1055.65043
[28] Galecki, A.; Burzykowski, T., Linear mixed-effects models using R: a step-by-step approach (2013), New York: Springer, New York · Zbl 1275.62053
[29] Ghosh, M., On the nonexistence of nonnegative unbiased estimators of variance components, Indian J Stat Ser B (1960-2002), 58, 3, 360-362 (1996) · Zbl 0874.62075
[30] Grant M, Boyd S (2018) CVX: matlab software for disciplined convex programming. http://cvxr.com/cvx/. Accessed 19 Feb 2020
[31] Grant, M.; Boyd, S.; Ye, Y.; Liberti, L.; Maculan, N., Disciplined convex programming, Global optimization: from theory to implementation (2006), New York: Springer, New York · Zbl 1130.90382
[32] Hančová M (2007) Comparison of prediction quality of the best linear unbiased predictors in time series linear regression models. In: Proceedings of 15th European Young Statisticians Meeting, University of Extremadura, Castro Urdiales (Spain). https://github.com/fdslrm/EBLUP-NE. Accessed 19 Feb 2020
[33] Hančová, M., Natural estimation of variances in a general finite discrete spectrum linear regression model, Metrika, 67, 3, 265-276 (2008) · Zbl 1357.62275
[34] Harville, DA, Bayesian inference for variance components using only error contrasts, Biometrika, 61, 2, 383-385 (1974) · Zbl 0281.62072
[35] Harville, DA, Maximum likelihood approaches to variance component estimation and to related problems, J Am Stat Assoc, 72, 358, 320-338 (1977) · Zbl 0373.62040
[36] Henderson, CR; Kempthorne, O.; Searle, SR; von Krosigk, CM, The estimation of environmental and genetic trends from records subject to culling, Biometrics, 15, 2, 192-218 (1959) · Zbl 0128.40301
[37] Hyndman, RJ; Athanasopoulos, G., Forecasting: principles and practice (2018), OTexts: Monash University, OTexts
[38] Jiang, J., Linear and generalized linear mixed models and their applications (2007), New York: Springer, New York · Zbl 1152.62040
[39] Jones E, Oliphant T, Peterson P et al (2001) SciPy: Open source scientific tools for Python. http://www.scipy.org/. Accessed 19 Feb 2020
[40] Kedem, B.; Fokianos, K., Regression models for time series analysis (2005), New York: Wiley, New York · Zbl 1011.62089
[41] Kluyver T, Ragan-Kelley B, Perez F, Granger B, Bussonnier M, Frederic J, Kelley K, Hamrick J, Grout J, Corlay S, Ivanov P, Avila D, Abdalla S, Willing C (2016) Jupyter Notebooks-a publishing format for reproducible computational workflows. In: Loizides F, Schmidt B (eds) Positioning and power in academic publishing: players, agents and agendas. Proceedings of the 20th ELPUB. Ios Press, Amsterdam, pp 87-90
[42] Koenker, R.; Mizera, I., Convex optimization in R, J Stat Softw, 60, 5, 1-23 (2014) · Zbl 1367.62020
[43] Kreiss, JP; Lahiri, SN; Rao, TS; Rao, SS; Rao, CR, Bootstrap methods for time series, Handbook of statistics, time series analysis: methods and applications, 3-26 (2012), Amsterdam: Elsevier, Amsterdam · Zbl 1242.62005
[44] LaMotte, LR, A direct derivation of the REML likelihood function, Stat Pap, 48, 2, 321-327 (2007) · Zbl 1110.62078
[45] Lima A, Rossi L, Musolesi M (2014) Coding together at scale: GitHub as a collaborative social network. In: Adar E, Resnick P, Choudhury MD, Hogan B, Oh AH (eds) Proceedings of the 8th ICWSM, Ann Arbor, June 1-4, 2014. The AAAI Press, Palo Alto
[46] McLeod, AI; Yu, H.; Mahdi, E.; Rao, TS; Rao, SS; Rao, CR, Time series with R, Time series analysis: methods and applications, 661-712 (2012), Amsterdam: Elsevier, Amsterdam
[47] Oliphant, TE, Python for scientific computing, Comput Sci Eng, 9, 3, 10-20 (2007)
[48] Percival, DB; Walden, AT, Spectral analysis for physical applications: multitaper and conventional univariate techniques (2009), Cambridge: Cambridge University Press, Cambridge · Zbl 0796.62077
[49] Pinheiro, J.; Bates, D., Mixed-effects models in S and S-PLUS (2009), New York: Springer, New York · Zbl 0953.62065
[50] Pinheiro JC, Bates D, DebRoy S, Sarkar D, EISPACK authors, Heisterkamp S, Van Willigen B, R-core (2018) nlme: linear and nonlinear mixed effects models. https://CRAN.R-project.org/package=nlme. Accessed 19 Feb 2020
[51] Pourahmadi, M., Foundations of time series analysis and prediction theory (2001), New York: Wiley, New York · Zbl 0982.62074
[52] Priestley, M., Spectral analysis and time series (2004), Amsterdam: Elsevier Academic Press, Amsterdam
[53] Project Jupyter, Bussonnier M, Forde J, Freeman J, Granger B, Head T, Holdgraf C, Kelley K, Nalvarte G, Osheroff A, Pacer M, Panda Y, Perez F, Ragan-Kelley B, Willing C (2018) Binder 2.0—reproducible, interactive, sharable environments for science at scale. Proceedings of the 17th Python in Science Conference, pp 113-120
[54] Puntanen, S.; Styan, GPH; Isotalo, J., Formulas useful for linear regression analysis and related matrix theory. SpringerBriefs in statistics (2013), Berlin: Springer, Berlin · Zbl 1276.62048
[55] R Development Core Team (2019) R: a language and environment for statistical computing. http://www.r-project.org/. Accessed 19 Feb 2020
[56] Rao, CR; Kleffe, J., Estimation of variance components and applications (1988), Amsterdam: North-Holland, Amsterdam · Zbl 0645.62073
[57] Rao, JNK; Molina, I., Small area estimation (2015), Hoboken: Wiley, Hoboken · Zbl 1323.62002
[58] SAS Institute Inc, SAS/STAT 15.1 user’s guide: the MIXED procedure (2018), Cary: SAS Institute Inc., Cary
[59] Searle, SR; Khuri, AI, Matrix algebra useful for statistics (2017), Hoboken: Wiley, Hoboken · Zbl 1365.62004
[60] Searle, SR; Casella, G.; McCulloch, CE, Variance components (2009), Hoboken: Wiley, Hoboken · Zbl 1108.62064
[61] Shumway, RH; Stoffer, DS, Time series analysis and its applications: with R examples (2017), New York: Springer, New York · Zbl 1367.62004
[62] Singer, JM; Rocha, FMM; Nobre, JS, Graphical tools for detecting departures from linear mixed model assumptions and some remedial measures, Int Stat Rev, 85, 2, 290-324 (2017) · Zbl 07763549
[63] Sokol, P.; Gajdoš, A.; Silhavy, R.; Silhavy, P.; Prokopova, Z., Prediction of attacks against honeynet based on time series modeling, Applied computational intelligence and mathematical methods, 360-371 (2018), New York: Springer, New York
[64] Stein WA et al (2019) Sage Mathematics Software—SageMath. http://www.sagemath.org. Accessed 19 Feb 2020
[65] Stroup, WW; Milliken, GA; Claassen, EA; Wolfinger, RD, SAS for mixed models: introduction and basic applications (2018), Cary: SAS Institute, Cary
[66] Štulajter, F., Consistency of linear and quadratic least squares estimators in regression models with covariance stationary errors, Appl Math-Czech, 36, 2, 149-155 (1991) · Zbl 0727.62087
[67] Štulajter, F., Predictions in time series using regression models (2002), New York: Springer, New York · Zbl 1011.62102
[68] Štulajter, F., The MSE of the BLUP in a finite discrete spectrum LRM, Tatra Mt Math Publ, 26, 1, 125-131 (2003) · Zbl 1154.62372
[69] Štulajter, F.; Witkovský, V., Estimation of variances in orthogonal finite discrete spectrum linear regression models, Metrika, 60, 2, 105-118 (2004) · Zbl 1083.62079
[70] Weiss, CJ, Scientific computing for chemists: an undergraduate course in simulations, data processing, and visualization, J Chem Educ, 94, 5, 592-597 (2017)
[71] Witkovský, V., Estimation, testing, and prediction regions of the fixed and random effects by solving the Henderson’s mixed model equations, Meas Sci Rev, 12, 6, 234-248 (2012)
[72] Witkovský V (2018) mixed—File Exchange—MATLAB Central. https://www.mathworks.com/matlabcentral/fileexchange/200. Accessed 19 Feb 2020
[73] Wu, WB; Xiao, H.; Rao, TS; Rao, SS; Rao, CR, Covariance matrix estimation in time series, Handbook of statistics, time series analysis: methods and applications, 187-209 (2012), Amsterdam: North-Holland, Elsevier, Amsterdam · Zbl 1242.62005
[74] Żądło, T., On MSE of EBLUP, Stat Pap, 50, 1, 101-118 (2009) · Zbl 1312.62014
[75] Zhang, F., The Schur complement and its applications (2005), New York: Springer, New York · Zbl 1075.15002
[76] Zimmermann, P.; Casamayou, A.; Cohen, N.; Connan, G.; Dumont, T.; Fousse, L.; Maltey, F.; Meulien, M.; Mezzarobba, M.; Pernet, C.; Thiéry, NM; Bray, E.; Cremona, J.; Forets, M.; Ghitza, A.; Thomas, H., Computational mathematics with SageMath (2018), Philadelphia: SIAM, Philadelphia · Zbl 1434.65001
[77] Zwiernik, P.; Uhler, C.; Richards, D., Maximum likelihood estimation for linear Gaussian covariance models, J R Stat Soc Series B Stat Methodol, 79, 4, 1269-1292 (2017) · Zbl 1373.62267