Continuous online sequence learning with an unsupervised neural network model. (English) Zbl 1474.68249

Summary: The ability to recognize and predict temporal sequences of sensory inputs is vital for survival in natural environments. Based on many known properties of cortical neurons, hierarchical temporal memory (HTM) sequence memory has recently been proposed as a theoretical framework for sequence learning in the cortex. In this letter, we analyze properties of HTM sequence memory and apply it to sequence learning and prediction problems with streaming data. We show that the model is able to continuously learn a large number of variable-order temporal sequences using an unsupervised Hebbian-like learning rule. The sparse temporal codes formed by the model can robustly handle branching temporal sequences by maintaining multiple predictions until there is sufficient disambiguating evidence. We compare the HTM sequence memory with other sequence learning algorithms on sequence prediction problems with both artificial and real-world data, including statistical methods (autoregressive integrated moving average), feedforward neural networks (time delay neural network and online sequential extreme learning machine), and recurrent neural networks (long short-term memory and echo state networks). The HTM model achieves accuracy comparable to these state-of-the-art algorithms. It also exhibits properties that are critical for sequence learning: continuous online learning, the ability to handle multiple predictions and branching sequences with high-order statistics, robustness to sensor noise and fault tolerance, and good performance without task-specific hyperparameter tuning. Therefore, the HTM sequence memory not only advances our understanding of how the brain may solve the sequence learning problem but is also applicable to real-world sequence learning problems involving continuous data streams.
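The variable-order behavior described in the summary (keeping multiple predictions for a branching sequence until higher-order context disambiguates) can be illustrated with a toy context-table predictor. This is a hypothetical n-gram sketch for illustration only, not the HTM algorithm; all names (`NGramPredictor`, the example symbols) are invented for the example:

```python
from collections import defaultdict

# Toy sketch of the property described above: a learner that keeps
# multiple predictions for a branching sequence until higher-order
# context disambiguates it. Not the HTM algorithm itself.
class NGramPredictor:
    def __init__(self, order):
        self.order = order
        # Maps a context tuple of `order` symbols to the set of symbols
        # that have followed that context in the training sequences.
        self.table = defaultdict(set)

    def learn(self, sequence):
        for i in range(len(sequence) - self.order):
            ctx = tuple(sequence[i:i + self.order])
            self.table[ctx].add(sequence[i + self.order])

    def predict(self, context):
        return self.table.get(tuple(context[-self.order:]), set())

# Two sequences share the subsequence "B C" but continue differently.
seq1 = ["A", "B", "C", "D"]
seq2 = ["X", "B", "C", "Y"]

low, high = NGramPredictor(order=1), NGramPredictor(order=3)
for s in (seq1, seq2):
    low.learn(s)
    high.learn(s)

print(sorted(low.predict(["C"])))             # ['D', 'Y'] -- ambiguous: both kept
print(sorted(high.predict(["A", "B", "C"])))  # ['D'] -- context disambiguates
```

Unlike this fixed-order table, the HTM model described in the paper learns such high-order contexts with sparse distributed codes and a Hebbian-like rule, but the input/output behavior on branching sequences is analogous.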


68T05 Learning and adaptive systems in artificial intelligence
68W27 Online algorithms; streaming algorithms
92B20 Neural networks for/in biological studies, artificial life and related topics
92C20 Neural biology
Full Text: DOI arXiv


[1] Abeles, M. (1982). Local cortical circuits: An electrophysiological study. Berlin: Springer.
[2] Ahmad, S., & Hawkins, J. (2016). How do neurons operate on sparse distributed representations? A mathematical theory of sparsity, neurons and active dendrites. arXiv:1601.00720
[3] Antic, S. D., Zhou, W. L., Moore, A. R., Short, S. M., & Ikonomu, K. D. (2010). The decade of the dendritic NMDA spike. J. Neurosci. Res., 88, 2991-3001.
[4] Ben Taieb, S., Bontempi, G., Atiya, A. F., & Sorjamaa, A. (2012). A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition. Expert Syst. Appl., 39(8), 7067-7083.
[5] Bishop, C. (2006). Pattern recognition and machine learning. Singapore: Springer. · Zbl 1107.68072
[6] Brea, J., Senn, W., & Pfister, J.-P. (2013). Matching recall and storage in sequence learning with spiking neural networks. J. Neurosci., 33(23), 9565-9575.
[7] Bridle, J. (1989). Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In F. Fogelman Soulié & J. Hérault (Eds.), Neurocomputing: Algorithms, architectures and applications (pp. 227-236). Berlin: Springer-Verlag.
[8] Brosch, M., & Schreiner, C. E. (2000). Sequence sensitivity of neurons in cat primary auditory cortex. Cereb. Cortex, 10(12), 1155-1167.
[9] Buxhoeveden, D. P. (2002). The minicolumn hypothesis in neuroscience. Brain, 125(5), 935-951.
[10] Clegg, B. A., Digirolamo, G. J., & Keele, S. W. (1998). Sequence learning. Trends Cogn. Sci., 2(8), 275-281.
[11] Crone, S. F., Hibon, M., & Nikolopoulos, K. (2011). Advances in forecasting with neural networks? Empirical evidence from the NN3 competition on time series prediction. Int. J. Forecast., 27(3), 635-660.
[12] Dietterich, T. G. (2002). Machine learning for sequential data: A review. In Proceedings of the Jt. IAPR Int. Work. Struct. Syntactic, Stat. Pattern Recognition (pp. 15-30). Berlin: Springer-Verlag. · Zbl 1073.68712
[13] Domingos, P., & Hulten, G. (2000). Mining high-speed data streams. In Proceedings of the Sixth ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining (pp. 71-80). New York: ACM Press.
[14] Durbin, J., & Koopman, S. J. (2012). Time series analysis by state space methods (2nd ed.). New York: Oxford University Press. · Zbl 1270.62120
[15] Fine, S., Singer, Y., & Tishby, N. (1998). The hierarchical hidden Markov model: Analysis and applications. Mach. Learn., 32(1), 41-62. · Zbl 0901.68178
[16] Földiák, P. (2002). Sparse coding in the primate cortex. In M. A. Arbib (Ed.), The handbook of brain theory and neural networks (2nd ed., pp. 1064-1068). Cambridge, MA: MIT Press.
[17] Gaber, M. M., Zaslavsky, A., & Krishnaswamy, S. (2005). Mining data streams. ACM SIGMOD Rec., 34(2), 18. · Zbl 1087.68557
[18] Gama, J. (2010). Knowledge discovery from data streams. Boca Raton, FL: Chapman and Hall/CRC. · Zbl 1230.68017
[19] Gavornik, J. P., & Bear, M. F. (2014). Learned spatiotemporal sequence recognition and prediction in primary visual cortex. Nat. Neurosci., 17, 732-737.
[20] Graves, A. (2012). Supervised sequence labelling with recurrent neural networks. New York: Springer. · Zbl 1235.68014
[21] Greff, K., Srivastava, R., Koutnik, J., Steunebrink, B. R., & Schmidhuber, J. (2015). LSTM: A search space Odyssey. arXiv:1503.04069
[22] Hawkins, J., & Ahmad, S. (2016). Why neurons have thousands of synapses: A theory of sequence memory in neocortex. Front. Neural Circuits, 10.
[23] Hawkins, J., Ahmad, S., & Dubinsky, D. (2011). Cortical learning algorithm and hierarchical temporal memory. Numenta white paper. http://numenta.org/resources/HTM_CorticalLearningAlgorithms.pdf
[24] Hebb, D. (1949). The organization of behavior: A neuropsychological theory. Sci. Educ., 44(1), 335.
[25] Henaff, M., Szlam, A., & Lecun, Y. (2016). Orthogonal RNNs and long-memory tasks. arXiv:1602.06662
[26] Hinton, G., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580 · Zbl 1318.68153
[27] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Comput., 9(8), 1735-1780.
[28] Huang, G.-B., Wang, D. H., & Lan, Y. (2011). Extreme learning machines: A survey. Int. J. Mach. Learn. Cybern., 2(2), 107-122.
[29] Huang, G.-B., Zhu, Q.-Y., & Siew, C.-K. (2006). Extreme learning machine: Theory and applications. Neurocomputing, 70, 489-501.
[30] Hyndman, R. J., & Athanasopoulos, G. (2013). Forecasting: Principles and practice. OTexts. https://www.otexts.org/fpp
[31] Hyndman, R. J., & Khandakar, Y. (2008). Automatic time series forecasting: The forecast package for R. J. Stat. Softw., 26(3).
[32] Igel, C., & Hüsken, M. (2003). Empirical evaluation of the improved Rprop learning algorithms. Neurocomputing, 50, 105-123. · Zbl 1006.68811
[33] Jaeger, H. (2002). Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the “echo state network” approach (GMD Rep. 159). Hanover: German National Research Center for Information Technology.
[34] Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667), 78-80.
[35] Kanerva, P. (1988). Sparse distributed memory. Cambridge, MA: MIT Press. · Zbl 0685.68009
[36] Lavin, A., & Ahmad, S. (2015). Evaluating real-time anomaly detection algorithms: The Numenta anomaly benchmark. In Proceedings of the 14th Int. Conf. Mach. Learn. Appl. Piscataway, NJ: IEEE.
[37] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
[38] Lee, M., Hwang, K., & Sung, W. (2014). Fault tolerance analysis of digital feedforward deep neural networks. In Proceedings of the 2014 IEEE Int. Conf. Acoust. Speech Signal Processing (pp. 5031-5035). Piscataway, NJ: IEEE.
[39] Liang, N.-Y., Huang, G.-B., Saratchandran, P., & Sundararajan, N. (2006). A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw., 17(6), 1411-1423.
[40] Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015). A critical review of recurrent neural networks for sequence learning. arXiv:1506.00019 [cs.LG]
[41] Lughofer, E., & Angelov, P. (2011). Handling drifts and shifts in on-line data streams with evolving fuzzy systems. Appl. Soft Comput., 11(2), 2057-2068.
[42] Major, G., Larkum, M. E., & Schiller, J. (2013). Active properties of neocortical pyramidal neuron dendrites. Annu. Rev. Neurosci., 36, 1-24.
[43] Massey, P. V., & Bashir, Z. I. (2007). Long-term depression: Multiple forms and implications for brain function. Trends Neurosci., 30(4), 176-184.
[44] Mauk, M. D., & Buonomano, D. V. (2004). The neural basis of temporal processing. Annu. Rev. Neurosci., 27, 307-340.
[45] McFarland, J. M., Cui, Y., & Butts, D. A. (2013). Inferring nonlinear neuronal computation based on physiologically plausible inputs. PLoS Comput. Biol., 9(7), e1003143.
[46] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv:1301.3781
[47] Mnatzaganian, J., Fokoué, E., & Kudithipudi, D. (2016). A mathematical formalization of hierarchical temporal memory’s spatial pooler. arXiv:1601.06116
[48] Moreira-Matias, L., Gama, J., Ferreira, M., Mendes-Moreira, J., & Damas, L. (2013). Predicting taxi-passenger demand using streaming data. IEEE Trans. Intell. Transp. Syst., 14(3), 1393-1402.
[49] Mountcastle, V. B. (1997). The columnar organization of the neocortex. Brain, 120 (Pt. 4), 701-722.
[50] Nikolić, D., Häusler, S., Singer, W., & Maass, W. (2009). Distributed fading memory for stimulus properties in the primary visual cortex. PLoS Biol., 7(12), e1000260.
[51] Olshausen, B. A., & Field, D. J. (2004). Sparse coding of sensory inputs. Curr. Opin. Neurobiol., 14, 481-487.
[52] Polsky, A., Mel, B. W., & Schiller, J. (2004). Computational subunits in thin dendrites of pyramidal cells. Nat. Neurosci., 7(6), 621-627.
[53] Ponulak, F., & Kasiński, A. (2010). Supervised learning in spiking neural networks with ReSuMe: Sequence learning, classification, and spike shifting. Neural Comput., 22(2), 467-510. · Zbl 1183.92018
[54] Purdy, S. (2016). Encoding data for HTM systems. arXiv:1602.05925
[55] Rabiner, L., & Juang, B. (1986). An introduction to hidden Markov models. IEEE ASSP Mag., 3(1), 4-16.
[56] Rao, R. P., & Sejnowski, T. J. (2001). Predictive learning of temporal sequences in recurrent neocortical circuits. In Proceedings of the Novartis Found. Symp., 239 (pp. 208-229; discussion 229-240).
[57] Sadato, N., Pascual-Leone, A., Grafman, J., Ibañez, V., Deiber, M. P., & Hallett, M. (1996). Activation of the primary visual cortex by Braille reading in blind subjects. Nature, 380(6574), 526-528.
[58] Sayed-Mouchaweh, M., & Lughofer, E. (2012). Learning in non-stationary environments: Methods and applications. New York: Springer. · Zbl 1451.68028
[59] Schaul, T., Bayer, J., Wierstra, D., Sun, Y., Felder, M., Sehnke, F., … Schmidhuber, J. (2010). PyBrain. J. Mach. Learn. Res., 11, 743-746.
[60] Schmidhuber, J. (2009). Simple algorithmic theory of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. Journal of the Society of Instrument and Control Engineers, 48(1), 21-32.
[61] Schmidhuber, J. (2014). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.
[62] Sejnowski, T., & Rosenberg, C. (1987). Parallel networks that learn to pronounce English text. J. Complex Syst., 1(1), 145-168. · Zbl 0655.68107
[63] Sharma, J., Angelucci, A., & Sur, M. (2000). Induction of visual orientation modules in auditory cortex. Nature, 404(6780), 841-847.
[64] Smith, S. L., Smith, I. T., Branco, T., & Häusser, M. (2013). Dendritic spikes enhance stimulus selectivity in cortical neurons in vivo. Nature, 503(7474), 115-120.
[65] Spruston, N. (2008). Pyramidal neurons: Dendritic structure and synaptic integration. Nat. Rev. Neurosci., 9(3), 206-221.
[66] Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, & K. Q. Weinberger (Eds.), Advances in neural information processing systems, 27 (pp. 3104-3112). Red Hook, NY: Curran.
[67] Tchernev, E. B., Mulvaney, R. G., & Phatak, D. S. (2005). Investigating the fault tolerance of neural networks. Neural Comput., 17(7), 1646-1664. · Zbl 1103.68731
[68] Tran, A. H., Yanushkevich, S. N., Lyshevski, S. E., & Shmerko, V. P. (2011). Design of neuromorphic logic networks and fault-tolerant computing. In Proceedings of the 2011 11th IEEE Int. Conf. Nanotechnology (pp. 457-462). Piscataway, NJ: IEEE.
[69] Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., & Lang, K. J. (1989). Phoneme recognition using time-delay neural networks. IEEE Trans. Acoust., 37(3), 328-339.
[70] Wang, X., & Han, M. (2014). Online sequential extreme learning machine with kernels for nonstationary time series prediction. Neurocomputing, 145, 90-97.
[71] Williams, R. J., & Peng, J. (1990). An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Comput., 2(4), 490-501.
[72] Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Comput., 1(2), 270-280.
[73] Xu, S., Jiang, W., Poo, M.-M., & Dan, Y. (2012). Activity recall in a visual cortical ensemble. Nat. Neurosci., 15(3), 449-455, S1-S2.
[74] Zito, K., & Svoboda, K. (2002). Activity-dependent synaptogenesis in the adult mammalian cortex. Neuron, 35(6), 1015-1017.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.