×

Time-series event-based prediction: an unsupervised learning framework based on genetic programming. (English) Zbl 1409.68235

Summary: In this paper, we propose an unsupervised learning framework based on Genetic Programming (GP) to predict the position of any particular target event (defined by the user) in a time-series. GP is used to automatically build a library of candidate temporal features. The proposed framework receives a training set \(S = \{(\mathbf{V}_{\mathbf{a}}) \mid \mathbf{a} = \boldsymbol{0} \cdots \mathbf{n} \}\), where each \(V_a\) is a time-series vector such that \(\forall V_a \in S, V_a = \{(\mathbf{x}_{\mathbf{t}}) \mid \mathbf{t} = \boldsymbol{0} \cdots \mathbf{t}_{\max} \}\) where \(\mathbf{t}_{\max}\) is the size of the time-series. All \(V_a \in S\) are assumed to be generated from the same environment. The proposed framework uses a divide-and-conquer strategy for the training phase. The training process of the proposed framework works as follow. The user specifies the target event that needs to be predicted (e.g., Highest value, Second Highest value, …, etc.). Then, the framework classifies the training samples into different Bins, where \(\mathrm{Bins} = \{(\mathbf{b}_{\mathbf{i}}) \mid \mathbf{i} = \boldsymbol{0} \cdots \mathbf{t}_{\max} \}\), based on the time-slot \( t\) of the target event in each \(V_a\) training sample. Each \(\mathbf{b}_{\mathbf{i}} \in \mathrm{Bins}\) will contain a subset of \(S\). For each \(b_i\), the proposed framework further classifies its samples into statistically independent clusters. To achieve this, each \(b_i\) is treated as an independent problem where GP is used to evolve programs to extract statistical features from each \(b_i\)’s members and classify them into different clusters using the K-means algorithm. At the end of the training process, GP is used to build an ‘event detector’ that receives an unseen time-series and predicts the time-slot where the target event is expected to occur. Empirical evidence on artificially generated data and real-world data shows that the proposed framework significantly outperforms standard Radial Basis Function Networks, standard GP system, Gaussian Process regression, Linear regression, and Polynomial Regression.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62M10 Time series, auto-correlation, regression, etc. in statistics (GARCH)
62M20 Inference from stochastic processes and prediction
90C59 Approximation methods and heuristics in mathematical programming

Software:

SPOT
PDFBibTeX XMLCite
Full Text: DOI

References:

[3] Anderson, J. R.; Michalski, R. S.; Michalski, R. S.; Mitchell, T. M., Machine Learning: An Artificial Intelligence Approach, vol. 2 (1986), Morgan Kaufman · Zbl 0593.68060
[5] Brockwell, P.; Davis, R., Time Series: Theory and Methods (2009), Springer-Verlag · Zbl 1169.62074
[6] Cao, H.; Kang, L.; Chen, Y.; Yu, J., Evolutionary modeling of systems of ordinary differential equations with genetic programming, Genet. Program. Evolvable Machines, 1, 4, 309-337 (2000) · Zbl 1060.68569
[7] Chatfield, C., The Analysis of Time Series. The Analysis of Time Series, Texts in Statistical Science (1996), Chapman & Hall: Chapman & Hall London[u.a.] · Zbl 0870.62068
[8] Chen, Y.; Yang, B.; Meng, Q.; Zhao, Y.; Abraham, A., Time-series forecasting using a system of ordinary differential equations, Inform. Sci., 181, 1, 106-114 (2011)
[9] Dohare, D.; Devi, V. S., Combination of similarity measures for time series classification using genetic algorithms, (IEEE Congress on Evolutionary Computation (2011), IEEE), 401-408
[10] Forouzanfar, M.; Doustmohammadi, A.; Hasanzadeh, S.; Shakouri G, H., Transport energy demand forecast using multi-level genetic programming, Appl. Energy, 91, 1, 496-503 (2012)
[13] Hetland, M.; Strom, P., Evolutionary rule mining in time series databases, Machine Learn., 58, 107-125 (2005) · Zbl 1073.68607
[14] Jackson, D., The performance of a selection architecture for genetic programming, (Proceedings of the 11th European Conference on Genetic Programming. Proceedings of the 11th European Conference on Genetic Programming, EuroGP 2008, LNCS, vol. 4971 (2008), Springer: Springer Naples), 170-181
[15] Kattan, A.; Agapitos, A.; Poli, R., Unsupervised problem decomposition using genetic programming, (Esparcia-Alczar, A. I.; Ekrt, A.; Silva, S.; Dignum, S.; Etaner-Uyar, A. S., EuroGP. EuroGP, Lecture Notes in Computer Science, vol. 6021 (2010), Springer), 122-133
[16] Kattan, A.; Al-Mulla, M.; Sepulveda, F.; Poli, R., Detecting localised muscle fatigue during isometric contraction using genetic programming, (Correia, A. D.; Rosa, A. C.; Madani, K., IJCCI (2009), INSTICC Press), 292-297
[17] Koza, J. R., Genetic Programming II: Automatic Discovery of Reusable Programs (1994), MIT Press: MIT Press Cambridge, Massachusetts · Zbl 0850.68160
[18] Langin, C., Introduction to data mining, Scalable Comput.: Practice Exp., 9, 4 (2008)
[19] Lee, G. Y., Genetic recursive regression for modeling and forecasting real-world chaotic time series, (Spector, L.; Langdon, W. B.; O’Reilly, U.-M.; Angeline, P. J., Advances in Genetic Programming 3 (1999), MIT Press: MIT Press Cambridge, MA, USA), 401-423, (Chapter 17)
[20] Manning, A., Monopsony in Motion: Imperfect Competition in Labor Markets (2003), Princeton Univ Pr
[21] Mendivil, S. G.; Castillo, O.; Melin, P., Optimization of artificial neural network architectures for time series prediction using parallel genetic algorithms, (Castillo, O.; Melin, P.; Kacprzyk, J.; Pedrycz, W., Soft Computing for Hybrid Intelligent Systems. Soft Computing for Hybrid Intelligent Systems, Studies in Computational Intelligence, vol. 154 (2008), Springer: Springer Berlin, Heidelberg), 387-399
[22] Moraglio, A.; Kattan, A., Geometric generalisation of surrogate model based optimisation to combinatorial spaces, (EvoCop. EvoCop, Lecture Notes in Computer Science (2011), Springer)
[23] Nogales, F.; Conejo, A., Electricity price forecasting through transfer function models, J. Oper. Res. Soc., 57, 4, 350-356 (2005) · Zbl 1086.91505
[25] Panyaworayan, W.; Wuetschner, G., Time series prediction using a recursive algorithm of a combination of genetic programming and constant optimization, (Barry, A. M., GECCO 2002: Proceedings of the Bird of a Feather Workshops, Genetic and Evolutionary Computation Conference (2002), AAAI: AAAI New York), 101-107
[26] Peacock, J. A., Two-dimensional goodness-of-fit testing in astronomy, Royal Astronom. Soc. Monthly Notices, 202, 615-627 (1983)
[28] Povinelli, R. J.; Feng, X., A new temporal pattern identification method for characterization and prediction of complex time series events, IEEE Trans. Knowl. Data Eng., 15, 2, 339-352 (2003)
[31] Ruiz, R., Review of “experimental methods for the analysis of optimization algorithms”, Thomas Bartz-Beielstein, Marco Chiarandini, Lu’s Paquete, Mike Preuss, Springer, 2010, Eur. J. Oper. Res., 214, 2, 453-456 (2011) · Zbl 1203.68003
[33] Tsang, E.; Yung, P.; Li, J., EDDIE-automation, a decision support tool for financial forecasting, Decis. Support Syst., 37, 4, 559-565 (2004)
[34] Wang, W.; Hidvégi, Z.; Whinston, A., Shill bidding in multi-round online auctions, (Proceedings of the 35th Annual Hawaii International Conference on System Sciences, 2002. Proceedings of the 35th Annual Hawaii International Conference on System Sciences, 2002, HICSS (2002), IEEE), 7
[35] Wei, W. W.S., Time Series Analysis - Univariate and Multivariate Methods (1989), Addison-Wesley
[36] Weiss, G. M.; Hirsh, H., Learning to predict rare events in categorical time-series data: technical report, (Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (1998), AAAI Press: AAAI Press Menlo Park, CA)
[38] Yang, F.; Li, M.; Huang, A.; Li, J., Forecasting time series with genetic programming based on least square method, J. Syst. Sci. Complexity, 27, 1, 117-129 (2014) · Zbl 1294.93087
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.