**Misalignment problem in matrix decomposition with missing values.**
*(English)*
Zbl 07465778

Summary: Data collection in real-world environments may be compromised by several factors, such as data-logger malfunctions and communication errors, during which no data are collected. Consequently, appropriate tools are required to handle missing values when analysing and processing such data. This problem is often tackled via matrix decomposition. Although matrix decomposition has been applied successfully in a wide range of applications, in this work we report an issue that has been neglected in the literature and that degrades the quality of the imputations obtained by matrix decomposition for multivariate time series with smooth evolution. Briefly, the problem is a misalignment of the matrix decomposition result: the imputed values fall within an incorrect range, and the transitions between observed and imputed values are not smooth. We address this problem by proposing a post-processing alignment strategy. According to our experiments, the post-processing adjustment substantially improves the accuracy of the imputations when misalignment occurs. Moreover, the results suggest that misalignment occurs mostly when dealing with a small number of time series, owing to a lack of generalization ability.
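The summary names the misalignment but not the mechanics. The sketch below, assuming NumPy and a layout with one time series per column (rows are time steps), shows two pieces: a generic matrix-decomposition imputer (iterative truncated SVD, often called "hard impute"), and one plausible post-processing alignment that removes the level offset at the boundaries of each gap. Both functions, `lowrank_impute` and `align_gaps`, are hypothetical illustrations; the alignment rule in particular (linearly interpolating the reconstruction error of the observed neighbours across the gap) is an assumption and is not claimed to be the authors' procedure.

```python
import numpy as np


def lowrank_impute(X, rank=1, n_iter=500, tol=1e-10):
    """Impute NaNs with an iterative truncated-SVD ("hard impute") reconstruction.

    Rows are time steps, columns are series. Returns (filled, approx), where
    `approx` is the rank-`rank` reconstruction of the whole matrix.
    """
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)  # True where a value was observed
    filled = np.where(mask, X, np.nanmean(X, axis=0))  # initialise gaps with column means
    approx = filled
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # best rank-r fit of current matrix
        new = np.where(mask, X, approx)  # observed entries stay fixed
        converged = np.max(np.abs(new - filled)) < tol
        filled = new
        if converged:
            break
    return filled, approx


def align_gaps(X, approx):
    """Shift each imputed gap so it joins the observed values on both sides.

    Hypothetical post-processing step (an assumption, not the paper's exact
    procedure): the reconstruction error at the observed neighbours of a gap
    is linearly interpolated across the gap and added to the imputations.
    """
    X = np.asarray(X, dtype=float)
    approx = np.asarray(approx, dtype=float)
    out = np.where(np.isnan(X), approx, X)
    n_rows, n_cols = X.shape
    for c in range(n_cols):
        missing = np.isnan(X[:, c])
        i = 0
        while i < n_rows:
            if not missing[i]:
                i += 1
                continue
            j = i  # extend to the last index of this run of missing values
            while j + 1 < n_rows and missing[j + 1]:
                j += 1
            # residual (observed - reconstructed) at each observed neighbour
            left = X[i - 1, c] - approx[i - 1, c] if i > 0 else None
            right = X[j + 1, c] - approx[j + 1, c] if j + 1 < n_rows else None
            if left is None and right is None:
                i = j + 1
                continue  # whole column missing: nothing to align against
            if left is None:
                left = right  # gap at the start: use a constant offset
            if right is None:
                right = left  # gap at the end: use a constant offset
            span = (j + 1) - (i - 1)  # distance between the two neighbours
            for k in range(i, j + 1):
                t = (k - (i - 1)) / span
                out[k, c] = approx[k, c] + (1 - t) * left + t * right
            i = j + 1
    return out
```

For example, if the reconstruction of a series is shifted one unit below the observed level, `align_gaps` adds that unit back inside the gap so the imputed values meet the observed ones at both ends. Interpolating the offset linearly is only one simple choice; when a gap touches the start or end of a series, the sketch falls back to a constant offset.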

### MSC:

68T05 | Learning and adaptive systems in artificial intelligence |

### Software:

imputeTS; GP-VAE; missForest; NAOMI; E2EGAN; GRU-ODE-Bayes; BRITS; matrix-completion; GitHub; CRAN

*S. Fernandes* et al., Mach. Learn. 110, No. 11-12, 3157-3175 (2021; Zbl 07465778)

### References:

[1] | Acar, E.; Dunlavy, DM; Kolda, TG; Mørup, M., Scalable tensor factorizations for incomplete data, Chemometrics and Intelligent Laboratory Systems, 106, 1, 41-56 (2011) |

[2] | Azur, MJ; Stuart, EA; Frangakis, C.; Leaf, PJ, Multiple imputation by chained equations: What is it and how does it work?, International Journal of Methods in Psychiatric Research, 20, 1, 40-49 (2011) |

[3] | Balzano, L.; Chi, Y.; Lu, YM, Streaming PCA and subspace tracking: The missing data case, Proceedings of the IEEE, 106, 8, 1293-1310 (2018) |

[4] | Banerjee, S.; Roy, A., Linear algebra and matrix analysis for statistics, CRC Press (2014) · Zbl 1309.15002 |

[5] | Bao, Y.; Fang, H.; Zhang, J., TopicMF: Simultaneously exploiting ratings and reviews for recommendation, AAAI, 14, 2-8 (2014) |

[6] | Cai, JF; Candès, EJ; Shen, Z., A singular value thresholding algorithm for matrix completion, SIAM Journal on Optimization, 20, 4, 1956-1982 (2010) · Zbl 1201.90155 |

[7] | Cao, W.; Wang, D.; Li, J.; Zhou, H.; Li, L.; Li, Y., BRITS: Bidirectional recurrent imputation for time series, Advances in Neural Information Processing Systems, 6775-6785 (2018) |

[8] | Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y., Recurrent neural networks for multivariate time series with missing values, Scientific Reports, 8, 1, 1-12 (2018) |

[9] | Chen, G.; Liu, X.Y.; Kong, L.; Lu, J.L.; Gu, Y.; Shu, W.; Wu, M.Y., Multiple attributes-based data recovery in wireless sensor networks, 2013 IEEE Global Communications Conference (GLOBECOM), IEEE, 103-108 (2013) |

[10] | Chen, X.; Sun, L., Low-rank autoregressive tensor completion for multivariate time series forecasting, arXiv:2006.10436 (2020) |

[11] | Chen, X.; Chen, Y.; He, Z., Urban traffic speed dataset of Guangzhou, China, doi:10.5281/zenodo.1205229 (2018) |

[12] | Cichocki, A.; Phan, AH, Fast local algorithms for large scale nonnegative matrix and tensor factorizations, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 92, 3, 708-721 (2009) |

[13] | Cui, Z.; Ke, R.; Wang, Y., Deep bidirectional and unidirectional LSTM recurrent neural network for network-wide traffic speed prediction, arXiv:1801.02143 (2018) |

[14] | Cui, Z.; Henrickson, K.; Ke, R.; Wang, Y., Traffic graph convolutional recurrent neural network: A deep learning framework for network-scale traffic learning and forecasting, IEEE Transactions on Intelligent Transportation Systems, 21, 11, 4883-4894 (2019) |

[15] | De Brouwer, E.; Simm, J.; Arany, A.; Moreau, Y., GRU-ODE-Bayes: Continuous modeling of sporadically-observed time series, Advances in Neural Information Processing Systems, 7379-7390 (2019) |

[16] | Duan, T., Lightweight Python library for in-memory matrix completion, https://github.com/tonyduan/matrix-completion (2020) |

[17] | Fonollosa, J.; Rodríguez-Luján, I.; Trincavelli, M.; Vergara, A.; Huerta, R., Chemical discrimination in turbulent gas mixtures with MOX sensors validated by gas chromatography-mass spectrometry, Sensors, 14, 10, 19336-19353 (2014) |

[18] | Fortuin, V.; Baranchuk, D.; Rätsch, G.; Mandt, S., GP-VAE: Deep probabilistic time series imputation, International Conference on Artificial Intelligence and Statistics, PMLR, 1651-1661 (2020) |

[19] | Karkouch, A.; Mousannif, H.; Moatassime, HA; Noel, T., Data quality in internet of things: A state-of-the-art survey, Journal of Network and Computer Applications, 73, 57-81 (2016) |

[20] | Khayati, M.; Lerner, A.; Tymchenko, Z.; Cudré-Mauroux, P., Mind the gap: An experimental evaluation of imputation of missing values techniques in time series, Proceedings of the VLDB Endowment, 13, 5, 768-782 (2020) |

[21] | Kim, Y.; Choi, S., Weighted nonnegative matrix factorization, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, 1541-1544 (2009) |

[22] | Lepot, M.; Aubin, JB; Clemens, FH, Interpolation in time series: An introductive overview of existing methods, their performance criteria and uncertainty assessment, Water, 9, 10, 796 (2017) |

[23] | Liu, Y.; Yu, R.; Zheng, S.; Zhan, E.; Yue, Y., NAOMI: Non-autoregressive multiresolution sequence imputation, Advances in Neural Information Processing Systems, 11238-11248 (2019) |

[24] | Luo, Y.; Zhang, Y.; Cai, X.; Yuan, X., E2GAN: End-to-end generative adversarial network for multivariate time series imputation, Proceedings of the 28th International Joint Conference on Artificial Intelligence, AAAI Press, 3094-3100 (2019) |

[25] | Menne, MJ; Durre, I.; Vose, RS; Gleason, BE; Houston, TG, An overview of the Global Historical Climatology Network-Daily database, Journal of Atmospheric and Oceanic Technology, 29, 7, 897-910 (2012) |

[26] | Menne, M.J.; Durre, I.; Korzeniewski, B.; McNeal, S.; Thomas, K.; Yin, X.; Anthony, S.; Ray, R.; Vose, R.; Gleason, B.E.; Houston, T.G., Global Historical Climatology Network-Daily (GHCN-Daily), Version 3.38, NOAA National Climatic Data Center, doi:10.7289/V5D21VHZ (2020) |

[27] | Moritz, S.; Moritz, M.S.; ByteCompile, T., Package "imputeTS", cran.r-project.org (2019) |

[28] | Santiago, A.R.; Antunes, M.; Barraca, J.P.; Gomes, D.; Aguiar, R.L., SCoTv2: Large scale data acquisition, processing, and visualization platform, 2019 7th International Conference on Future Internet of Things and Cloud (FiCloud), IEEE, doi:10.1109/ficloud.2019.00053 (2019) |

[29] | Shu, X.; Porikli, F.; Ahuja, N., Robust orthonormal subspace learning: Efficient recovery of corrupted low-rank matrices, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3874-3881 (2014) |

[30] | Stekhoven, DJ; Bühlmann, P., MissForest-non-parametric missing value imputation for mixed-type data, Bioinformatics, 28, 1, 112-118 (2011) |

[31] | Tarpey, T., A note on the prediction sum of squares statistic for restricted least squares, The American Statistician, 54, 2, 116-118 (2000) |

[32] | Williams, AH; Kim, TH; Wang, F.; Vyas, S.; Ryu, SI; Shenoy, KV; Schnitzer, M.; Kolda, TG; Ganguli, S., Unsupervised discovery of demixed, low-dimensional neural dynamics across multiple timescales through tensor component analysis, Neuron, 98, 6, 1099-1115 (2018) |

[33] | Xie, K.; Ning, X.; Wang, X.; Xie, D.; Cao, J.; Xie, G.; Wen, J., Recover corrupted data in sensor networks: A matrix completion solution, IEEE Transactions on Mobile Computing, 16, 5, 1434-1448 (2016) |

[34] | Yu, H.F.; Rao, N.; Dhillon, I.S., Temporal regularized matrix factorization for high-dimensional time series prediction, Advances in Neural Information Processing Systems, 847-855 (2016) |

[35] | Zhang, D.; Balzano, L., Global convergence of a Grassmannian gradient descent algorithm for subspace estimation, AISTATS, 1460-1468 (2016) |

[36] | Zhao, Q.; Zhang, L.; Cichocki, A., Bayesian CP factorization of incomplete tensors with automatic rank determination, IEEE Transactions on Pattern Analysis and Machine Intelligence, 37, 9, 1751-1763 (2015) |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.