Lebanon, Guy; Zhao, Yang; Zhao, Yanjun
Modeling temporal text streams using the local multinomial model. (English) Zbl 1329.62244
Electron. J. Stat. 4, 566-584 (2010).

Summary: Temporal text data such as news feeds cannot be adequately modeled by standard \(n\)-grams, which correspond to multinomial or Markov chain models. Instead, we examine the application of local \(n\)-grams to modeling time-stamped documents. We derive the asymptotic bias and variance and consider the bandwidth selection problem. Experimental results are presented on news feeds and web search query logs.

MSC:
62G99 Nonparametric inference
62P99 Applications of statistics

Keywords: kernel smoothing; text modeling
Software: RCV1