
A wavelet-based approach for imputation in nonstationary multivariate time series. (English) Zbl 1475.62067

Summary: Many multivariate time series observed in practice are second-order nonstationary, i.e. their covariance properties vary over time. In many applications of interest, such data also contain missing observations, owing to recording failures or sensor dropout, which hinder successful analysis. This article introduces a novel method for data imputation in multivariate nonstationary time series, based on the so-called locally stationary wavelet modelling paradigm. Our methodology is shown to perform well across a range of simulation scenarios with a variety of missingness structures, and to be competitive in the stationary time series setting. We also demonstrate our technique on data arising in a health monitoring application.
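To make "second-order nonstationary" and the wavelet viewpoint concrete, here is a minimal univariate sketch. It is an illustration only and not the article's imputation method. It computes finest-scale non-decimated Haar wavelet coefficients d_t = (x_t - x_{t-1})/sqrt(2), skips any coefficient that touches a missing value, and averages d_t^2 over a moving window. For serially uncorrelated data with variance sigma_t^2, E[d_t^2] = sigma_t^2 away from change points, so the local average tracks the time-varying variance. The function names, the simulated series, the dropout block and the 10% gap rate are all hypothetical. The locally stationary wavelet literature additionally works across several scales and bias-corrects the raw wavelet periodogram, which this sketch omits.

```python
import math
import random


def haar_finest_coeffs(x):
    """Finest-scale non-decimated Haar coefficients d_t = (x_t - x_{t-1}) / sqrt(2).

    Missing observations are represented as None; any coefficient that needs a
    missing value is itself None. There is no coefficient at t = 0.
    """
    d = [None]
    for t in range(1, len(x)):
        if x[t] is None or x[t - 1] is None:
            d.append(None)
        else:
            d.append((x[t] - x[t - 1]) / math.sqrt(2))
    return d


def local_power(d, half_width):
    """Moving average of d_t**2 over [t - half_width, t + half_width].

    Missing coefficients are skipped; returns None where a window holds no data.
    """
    n = len(d)
    out = []
    for t in range(n):
        window = d[max(0, t - half_width): min(n, t + half_width + 1)]
        squares = [c * c for c in window if c is not None]
        out.append(sum(squares) / len(squares) if squares else None)
    return out


def simulate_example(seed=1, n=1000, half_width=100):
    """Hypothetical demo: variance jumps from 1 to 9 at t = n // 2, with gaps."""
    rng = random.Random(seed)
    half = n // 2
    x = [rng.gauss(0.0, 1.0) for _ in range(half)]
    x += [rng.gauss(0.0, 3.0) for _ in range(n - half)]
    # Contiguous "sensor dropout" block (assumes n >= 450).
    for t in range(400, 450):
        x[t] = None
    # Additionally mark roughly 10% of time points missing, completely at random.
    for t in range(n):
        if rng.random() < 0.1:
            x[t] = None
    return local_power(haar_finest_coeffs(x), half_width)
```

Calling `simulate_example()` returns one local-power value per time point. In this setup, values near t = 250 should sit close to 1 and values near t = 750 close to 9, despite the dropout block and the scattered gaps. The wide window trades time resolution for lower estimation variance.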

MSC:

62-08 Computational methods for problems pertaining to statistics
62D10 Missing data
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)

References:

[1] Lee, JA; Gill, J., Missing value imputation for physical activity data measured by accelerometer, Stat Meth Med Res, 27, 2, 490-506 (2018)
[2] Ahrabian, A., Elsaleh, T., Fathy, Y., Barnaghi, P.: Detecting changes in the variance of multi-sensory accelerometer data using MCMC. In: IEEE Sensors, IEEE, pp. 1-3 (2017)
[3] Alvarez, FM; Troncoso, A.; Riquelme, JC; Ruiz, J., Energy time series forecasting based on pattern sequence similarity, IEEE Trans Knowl Data Eng, 23, 8, 1230-1243 (2011)
[4] Audigier, V.; Husson, F.; Josse, J., Multiple imputation for continuous variables using a Bayesian principal component analysis, J Stat Comput Simul, 86, 11, 2140-2156 (2016) · Zbl 1510.62262
[5] Augustin, NH; Mattocks, C.; Faraway, JJ; Greven, S.; Ness, AR, Modelling a response as a function of high-frequency count data: The association between physical activity and fat mass, Stat Meth Med Res, 26, 5, 2210-2226 (2017)
[6] Bar-Joseph, Z.; Gerber, GK; Gifford, DK; Jaakkola, TS; Simon, I., Continuous representations of time-series gene expression data, J Comput Biol, 10, 3-4, 341-356 (2003) · Zbl 1130.62368
[7] Barigozzi, M.; Cho, H.; Fryzlewicz, P., Simultaneous multiple change-point and factor analysis for high-dimensional time series, J Econometr, 206, 1, 187-225 (2018) · Zbl 1398.62221
[8] Barnett, I.; Torous, J.; Staples, P.; Keshavan, M.; Onnela, JP, Beyond smartphones and sensors: choosing appropriate statistical methods for the analysis of longitudinal data, J Am Med Inform Assoc, 25, 12, 1669-1674 (2018)
[9] Bidargaddi, N., Sarela, A., Boyle, J., Cheung, V., Karunanithi, M., Klingbeil, L., Yelland, C., Gray, L.: Wavelet based approach for posture transition estimation using a waist worn accelerometer. In: 29th Annual International Conference of the IEEE Eng. Med. Biol. Soc., 2007, pp. 1884-1887 (2007)
[10] Bos, R.; De Waele, S.; Broersen, PM, Autoregressive spectral estimation by application of the Burg algorithm to irregularly sampled data, IEEE Trans Instrum Meas, 51, 6, 1289-1294 (2002)
[11] Brocklebank, LA; Falconer, CL; Page, AS; Perry, R.; Cooper, AR, Accelerometer-measured sedentary time and cardiometabolic biomarkers: a systematic review, Prevent Med, 76, 92-102 (2015)
[12] Broersen, PMT, Automatic spectral analysis with missing data, Digital Signal Process, 16, 6, 754-766 (2006)
[13] Cao, W., Wang, D., Li, J., Zhou, H., Li, L., Li, Y.: BRITS: bidirectional recurrent imputation for time series. In: Adv. Neural Info. Process. Syst., pp. 6775-6785 (2018)
[14] Caussinus, H., Models and uses of principal component analysis, Multidimension Data Anal, 86, 149-170 (1986)
[15] Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y., Recurrent neural networks for multivariate time series with missing values, Sci Rep, 8, 1, 6085 (2018)
[16] Cranstoun, S.; Ombao, H.; Von Sachs, R.; Guo, W.; Litt, B., Time-frequency spectral estimation of multichannel EEG using the auto-SLEX method, IEEE Trans Biomed Eng, 49, 9, 988-996 (2002)
[17] Dahlhaus, R., A likelihood approximation for locally stationary processes, Ann Stat, 28, 6, 1762-1794 (2000) · Zbl 1010.62078
[18] Dahlhaus, R.: Locally stationary processes. In: Handbook of Statistics, vol. 30, Elsevier, pp. 351-413 (2012)
[19] Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 39(1), 1-38 (1977) · Zbl 0364.62022
[20] Doucoure, B.; Agbossou, K.; Cardenas, A., Time series prediction using artificial wavelet neural network and multi-resolution analysis: Application to wind speed data, Renewable Energy, 92, 202-211 (2016)
[21] Dua, D., Graff, C.: UCI machine learning repository. URL http://archive.ics.uci.edu/ml (2017)
[22] Eckley, IA; Nason, GP, Efficient computation of the discrete autocorrelation wavelet inner product matrix, Stat Comput, 15, 2, 83-92 (2005)
[23] Ford, BL, An overview of hot-deck procedures, Incompl Data Sample Surv, 2, Part IV, 185-207 (1983)
[24] Fryzlewicz, P.; Van Bellegem, S.; Von Sachs, R., Forecasting non-stationary time series by wavelet process modelling, Ann Inst Stat Math, 55, 4, 737-764 (2003) · Zbl 1047.62085
[25] Fryzlewicz, P.; Sapatinas, T.; Subba Rao, S., A Haar-Fisz technique for locally stationary volatility estimation, Biometrika, 93, 3, 687-704 (2006) · Zbl 1109.62095
[26] Ghahramani, Z., Jordan, M.I.: Supervised learning from incomplete data via an EM approach. In: Adv. Neural Info. Process. Syst., pp. 120-127 (1994)
[27] Godfrey, A., Conway, R., Leonard, M., Meagher, D., ÓLaighin, G.M.: Motion analysis in delirium: A wavelet based approach for sub classification. In: 30th Ann. Intern. Conf. IEEE Eng. Med. Biol. Soc., 2008, pp. 3574-3577 (2008)
[28] Gott, AN; Eckley, IA, A note on the effect of wavelet choice on the estimation of the evolutionary wavelet spectrum, Commun Stat - Simul Computat, 42, 2, 393-406 (2013) · Zbl 1327.62465
[29] Graham, J., Missing data analysis: Making it work in the real world, Ann Rev Psychol, 60, 549-576 (2009)
[30] Hargreaves, JK; Knight, MI; Pitchford, JW; Oakenfull, RJ; Chawla, S.; Munns, J.; Davis, SJ, Wavelet spectral testing: application to nonstationary circadian rhythms, Ann Appl Stat, 13, 3, 1817-1846 (2019) · Zbl 1433.62304
[31] Honaker, J.; King, G., What to do about missing values in time-series cross-section data, Am J Polit Sci, 54, 2, 561-581 (2010)
[32] Honaker, J.; King, G.; Blackwell, M., Amelia II: A Program for Missing Data, J Stat Softw, 45, 7, 1-47 (2011)
[33] Honaker, J., King, G., Blackwell, M.: Amelia: A Program for Missing Data. URL https://CRAN.R-project.org/package=Amelia (2015)
[34] Husson, F., Josse, J.: missMDA: Handling Missing Values with Multivariate Data Analysis. URL https://CRAN.R-project.org/package=missMDA (2018) · Zbl 1316.62006
[35] Janssen, WGM; Külchü, DG; Horemans, HLD; Stam, HJ; Bussmann, JBJ, Sensitivity of accelerometry to assess balance control during sit-to-stand movement, IEEE Trans Neur Syst Rehab Eng, 16, 5, 479-484 (2008)
[36] Jones, RH, Maximum likelihood fitting of ARMA models to time series with missing observations, Technometrics, 22, 3, 389-395 (1980) · Zbl 0451.62069
[37] Josse, J., Husson, F.: missMDA: A package for handling missing values in multivariate data analysis. J Stat Softw 70(1), 1-31 (2016) doi:10.18637/jss.v070.i01, URL https://www.jstatsoft.org/v070/i01
[38] Junger, W., de Leon, A.P.: mtsdi: Multivariate Time Series Data Imputation. URL https://CRAN.R-project.org/package=mtsdi (2018)
[39] Junger, WL; de Leon, AP, Imputation of missing data in time series for air pollutants, Atmos Environ, 102, 96-104 (2015)
[40] Khan, AM; Siddiqi, MH; Lee, SW, Exploratory data analysis of acceleration signals to select light-weight and accurate features for real-time activity recognition on smartphones, Sensors, 13, 10, 13099-13122 (2013)
[41] Killick, R.; Eckley, IA; Jonathan, P., A wavelet-based approach for detecting changes in second order structure within nonstationary time series, Electron J Stat, 7, 1167-1183 (2013) · Zbl 1337.62269
[42] Knight, MI; Nunes, MA; Nason, G., Spectral estimation for locally stationary time series with missing observations, Stat Comput, 22, 4, 877-895 (2012) · Zbl 1252.60034
[43] Knight, MI; Leeming, KA; Nason, GP; Nunes, MA, Generalised network autoregressive processes and the GNAR package, J Stat Soft, 96, 5, 1-36 (2020)
[44] Laguna, P.; Moody, GB; Mark, RG, Power spectral density of unevenly sampled data by least-square analysis: performance and application to heart rate signals, IEEE Trans Biomed Eng, 45, 6, 698-715 (1998)
[45] Little, R., Rubin, D.B.: Statistical Analysis with Missing Data, 2nd edn. Wiley (2002) · Zbl 1011.62004
[46] Lobato, F.; Sales, C.; Araujo, I.; Tadaiesky, V.; Dias, L.; Ramos, L.; Santana, A., Multi-objective genetic algorithm for missing data imputation, Pattern Recognit Lett, 68, 126-131 (2015)
[47] Lomb, NR, Least-squares frequency analysis of unequally spaced data, Astrophys Space Sci, 39, 2, 447-462 (1976)
[48] Luo, Y., Cai, X., Zhang, Y., Xu, J., et al.: Multivariate time series imputation with generative adversarial networks. In: Adv. Neural Info. Process. Syst., pp. 1596-1607 (2018)
[49] Mayrhofer, R.; Gellersen, H., Shake well before use: Intuitive and secure pairing of mobile devices, IEEE Trans Mob Comput, 8, 6, 792-806 (2009)
[50] McDonald, L.; Oguz, M.; Carroll, R.; Thakkar, P.; Yang, F.; Dhalwani, N.; Cox, A.; Merinopoulou, E.; Malcolm, B.; Mehmud, F., Comparison of accelerometer-derived physical activity levels between individuals with and without cancer: a UK Biobank study, Fut Oncol, 15, 33, 3763-3774 (2019)
[51] Molenaar, PCM; De Gooijer, JG; Schmitz, B., Dynamic factor analysis of nonstationary multivariate time series, Psychometr, 57, 3, 333-349 (1992) · Zbl 0825.92165
[52] Moritz, S.; Bartz-Beielstein, T., imputeTS: Time Series Missing Value Imputation in R, R J, 9, 1, 207-218 (2017)
[53] Nason, G.P.: wavethresh: Wavelets Statistics and Transforms. URL https://CRAN.R-project.org/package=wavethresh (2016)
[54] Nason, GP; Von Sachs, R.; Kroisandt, G., Wavelet processes and adaptive estimation of the evolutionary wavelet spectrum, J R Stat Soc B, 62, 2, 271-292 (2000)
[55] Ombao, H.; Von Sachs, R.; Guo, W., SLEX analysis of multivariate nonstationary time series, J Am Stat Assoc, 100, 470, 519-531 (2005) · Zbl 1117.62407
[56] Park, T.; Eckley, IA; Ombao, HC, Estimating time-evolving partial coherence between signals via multivariate locally stationary wavelet processes, IEEE Trans Signal Process, 62, 20, 5240-5250 (2014) · Zbl 1394.94446
[57] Pratama, I., Permanasari, A.E., Ardiyanto, I., Indrayani, R.: A review of missing values handling methods on time-series data. In: 2016 Int. Conf. Info. Technol. Syst. Innovation (ICITSI), IEEE, pp. 1-6 (2016)
[58] Preece, S.; Goulermas, J.; Kenney, L.; Howard, D., A comparison of feature extraction methods for the classification of dynamic activities from accelerometer data, IEEE Trans Biomed Eng, 56, 3, 871-879 (2009)
[59] Preece, SJ; Goulermas, JY; Kenney, LPJ; Howard, D.; Meijer, K.; Crompton, R., Activity identification using body-mounted sensors: a review of classification techniques, Physiol Meas, 30, 4, R1 (2009)
[60] Reyes-Ortiz, JL; Oneto, L.; Samà, A.; Parra, X.; Anguita, D., Transition-aware human activity recognition using smartphones, Neurocomput, 171, 754-767 (2016)
[61] Rubin, D.B.: Multiple Imputation for Nonresponse in Surveys (Wiley Series in Probability and Statistics). Wiley (1987)
[62] Scargle, JD, Studies in astronomical time series analysis. II. Statistical aspects of spectral analysis of unevenly spaced data, Astrophys J, 263, 835-853 (1982)
[63] Schafer, JL; Olsen, MK, Multiple imputation for multivariate missing-data problems: A data analyst’s perspective, Multivar Behav Res, 33, 4, 545-571 (1998)
[64] Sekine, M.; Akay, M.; Tamura, T.; Higashi, Y.; Fujimoto, T., Investigating body motion patterns in patients with Parkinson’s disease using matching pursuit algorithm, Med Biol Eng Comput, 42, 1, 30-36 (2004)
[65] Sridevi, S., Rajaram, S., Parthiban, C., SibiArasan, S., Swadhikar, C.: Imputation for the analysis of missing values and prediction of time series data. In: 2011 Int. Conf. Recent Trends Info. Technol. (ICRTIT), IEEE, pp. 1158-1163 (2011)
[66] Stekhoven, D.J.: missForest: Nonparametric Missing Value Imputation using Random Forest. URL https://CRAN.R-project.org/package=missForest (2013)
[67] Stekhoven, DJ; Bühlmann, P., MissForest: non-parametric missing value imputation for mixed-type data, Bioinf, 28, 1, 112-118 (2011)
[68] Stărică, C.; Granger, C., Nonstationarities in stock returns, Rev Econ Stat, 87, 3, 503-522 (2005)
[69] Tang, J.; Zhang, G.; Wang, Y.; Wang, H.; Liu, F., A hybrid approach to integrate fuzzy c-means based imputation method with genetic algorithm for missing traffic volume data estimation, Transp Res Part C: Emerging Technol, 51, 29-40 (2015)
[70] Taylor, S., Park, T., Eckley, I.A., Killick, R.: mvLSW: Multivariate Locally Stationary Wavelet Process Estimation. URL https://CRAN.R-project.org/package=mvLSW (2017)
[71] Taylor, S., Park, T., Eckley, I.A.: Multivariate locally stationary wavelet analysis with the mvLSW R package. J Stat Softw 90(11), 1-19 (2019) doi:10.18637/jss.v090.i11, URL https://www.jstatsoft.org/v090/i11
[72] Taylor, S.J.: Modelling Financial Time Series, 2nd edn. World Scientific Publishing (2007) · Zbl 1130.91345
[73] Trindade, AA, Implementing modified Burg algorithms in multivariate subset autoregressive modeling, J Stat Softw, 8, 1, 1-68 (2003)
[74] Troiano, RP; McClain, JJ; Brychta, RJ; Chen, KY, Evolution of accelerometer methods for physical activity research, Brit J Sports Med, 48, 13, 1019-1023 (2014)
[75] Tsay, R.S.: Multivariate Time Series Analysis: with R and Financial Applications. John Wiley & Sons (2013)
[76] Tsay, R.S.: MTS: All-Purpose Toolkit for Analysing Multivariate Time Series (MTS) and Estimating Multivariate Volatility Models. URL https://CRAN.R-project.org/package=MTS (2015)
[77] Van Dongen, HPA; Olofsen, E.; Van Hartevelt, JH; Kruyt, EW, A procedure of multiple period searching in unequally spaced time-series with the Lomb-Scargle method, Biol Rhythm Res, 30, 2, 149-177 (1999)
[78] Van Hees, V.T., Sabia, S., Anderson, K.N., Denton, S.J., Oliver, J., Catt, M., Abell, J.G., Kivimäki, M., Trenell, M.I., Singh-Manoux, A.: A novel, open access method to assess sleep duration using a wrist-worn accelerometer. PLoS ONE 10(11) (2015)
[79] Wen, L.; Cui, W.; Levine, AM; Bradt, HV, Orbital modulation of X-rays from Cygnus X-1 in its hard and soft states, Astrophys J, 525, 2, 968-977 (1999)
[80] Wu, S.F., Chang, C.Y., Lee, S.J.: Time series forecasting with missing values. In: 2015 1st Int. Conf. Ind. Networks Intell. Syst. (INISCom), IEEE, pp. 151-156 (2015)
[81] Wu, WB; Zhou, Z., Gaussian approximations for non-stationary multiple time series, Stat Sinica, 21, 3, 1397-1413 (2011) · Zbl 1251.60029
[82] Yin, S.; Huang, Z., Performance monitoring for vehicle suspension system via fuzzy positivistic c-means clustering based on accelerometer measurements, IEEE/ASME Trans Mechatron, 20, 5, 2613-2620 (2014)
[83] Yoon, J., Jordon, J., Van Der Schaar, M.: GAIN: Missing data imputation using generative adversarial nets, arXiv preprint arXiv:1806.02920 (2018)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.