Poisson random fields for dynamic feature models. (English) Zbl 1442.62070

Summary: We present the Wright-Fisher Indian buffet process (WF-IBP), a probabilistic model for time-dependent data assumed to have been generated by an unknown number of latent features. This model is suitable as a prior in Bayesian nonparametric feature allocation models in which the features underlying the observed data exhibit a dependency structure over time. More specifically, we establish a new framework for generating dependent Indian buffet processes, where the Poisson random field model from population genetics is used as a way of constructing dependent beta processes. Inference in the model is complex, and we describe a sophisticated Markov chain Monte Carlo algorithm for exact posterior simulation. We apply our construction to develop a nonparametric focused topic model for collections of time-stamped text documents and test it on the full corpus of NIPS papers published from 1987 to 2015.
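The core idea in the summary is that features behave like mutations in a population: they appear over time, their frequencies drift, and each observation at a given time picks up features according to the current frequencies, so allocations at nearby times are correlated. The sketch below is a rough, toy illustration of that idea only. It is not the authors' construction: it swaps the Poisson random field for a finite-population, discrete-generation Wright-Fisher model, and it does not attempt their MCMC inference. The function names (`poisson`, `wright_fisher_step`, `simulate_dynamic_features`) and parameters (`mutation_rate`, `pop_size`) are hypothetical and chosen for illustration.

```python
import math
import random


def poisson(rate, rng):
    """Draw a Poisson(rate) variate with Knuth's multiplication method (fine for small rates)."""
    limit = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def wright_fisher_step(freq, pop_size, rng):
    """One generation of neutral discrete Wright-Fisher resampling: Binomial(N, freq) / N."""
    copies = sum(rng.random() < freq for _ in range(pop_size))
    return copies / pop_size


def simulate_dynamic_features(num_steps, num_items, mutation_rate, pop_size, seed=0):
    """Hypothetical toy generator of time-dependent binary feature allocations.

    Each step: existing feature frequencies drift by Wright-Fisher resampling,
    Poisson(mutation_rate) new features enter at frequency 1/N, and each of
    num_items objects has feature k independently with probability equal to
    its current frequency.
    """
    rng = random.Random(seed)
    freqs = {}        # feature id -> current frequency in (0, 1]
    next_id = 0
    allocations = []  # per step: one set of active feature ids per item
    for _ in range(num_steps):
        # Drift; features that hit frequency 0 are lost and dropped.
        # Features that reach frequency 1 stay fixed (1.0 resamples to 1.0).
        freqs = {k: wright_fisher_step(x, pop_size, rng) for k, x in freqs.items()}
        freqs = {k: x for k, x in freqs.items() if x > 0.0}
        # New features (mutations) enter at the lowest non-zero frequency 1/N.
        for _ in range(poisson(mutation_rate, rng)):
            freqs[next_id] = 1.0 / pop_size
            next_id += 1
        # Bernoulli draws give this step's binary feature-allocation matrix.
        step = [{k for k, x in freqs.items() if rng.random() < x} for _ in range(num_items)]
        allocations.append(step)
    return allocations, freqs
```

For example, `simulate_dynamic_features(num_steps=50, num_items=10, mutation_rate=3.0, pop_size=100, seed=1)` returns 50 per-step allocations of 10 items each, plus the final feature frequencies. Because frequencies change only gradually between steps, the same features tend to recur in consecutive allocations, which is the kind of temporal dependency the WF-IBP is designed to capture in a principled nonparametric way.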

MSC:

62G05 Nonparametric estimation
60G55 Point processes (e.g., Poisson, Cox, Hawkes processes)
62P30 Applications of statistics in engineering and industry; control charts
65C05 Monte Carlo methods
Full text: arXiv