×

Cluster and feature modeling from combinatorial stochastic processes. (English) Zbl 1331.62124

Summary: One of the focal points of the modern literature on Bayesian nonparametrics has been the problem of clustering, or partitioning, where each data point is modeled as being associated with one and only one of some collection of groups called clusters or partition blocks. Underlying these Bayesian nonparametric models are a set of interrelated stochastic processes, most notably the Dirichlet process and the Chinese restaurant process. In this paper we provide a formal development of an analogous problem, called feature modeling, for associating data points with arbitrary nonnegative integer numbers of groups, now called features or topics. We review the existing combinatorial stochastic process representations for the clustering problem and develop analogous representations for the feature modeling problem. These representations include the beta process and the Indian buffet process as well as new representations that provide insight into the connections between these processes. We thereby bring the same level of completeness to the treatment of Bayesian nonparametric feature modeling that has previously been achieved for Bayesian nonparametric clustering.

MSC:

62F15 Bayesian inference
60G09 Exchangeability for stochastic processes
62H30 Classification and discrimination; cluster analysis (statistical aspects)
60C05 Combinatorial probability
60G57 Random measures
62G05 Nonparametric estimation

References:

[1] Adams, R. P., Ghahramani, Z. and Jordan, M. I. (2010). Tree-structured stick breaking for hierarchical data. Adv. Neural Inf. Process. Syst. 23 19-27.
[2] Aldous, D. J. (1985). Exchangeability and related topics. In École D’été de Probabilités de Saint-Flour , XIII- 1983. Lecture Notes in Math. 1117 1-198. Springer, Berlin. · Zbl 0562.60042
[3] Bertoin, J. (1996). Lévy Processes. Cambridge Tracts in Mathematics 121 . Cambridge Univ. Press, Cambridge. · Zbl 0861.60003
[4] Bertoin, J. (1999). Subordinators: Examples and Applications. In Lectures on Probability Theory and Statistics ( Saint-Flour , 1997). Lecture Notes in Math. 1717 1-91. Springer, Berlin. · Zbl 0955.60046 · doi:10.1007/978-3-540-48115-7_1
[5] Bertoin, J. (2000). Subordinators, Lévy processes with no negative jumps, and branching processes. Unpublished manuscript.
[6] Blackwell, D. and MacQueen, J. B. (1973). Ferguson distributions via Pólya urn schemes. Ann. Statist. 1 353-355. · Zbl 0276.62010 · doi:10.1214/aos/1176342372
[7] Blei, D. M. and Frazier, P. I. (2011). Distance dependent Chinese restaurant processes. J. Mach. Learn. Res. 12 2461-2488. · Zbl 1280.68157
[8] Blei, D. M., Griffiths, T. L. and Jordan, M. I. (2010). The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM 57 Art. 7, 30. · Zbl 1327.68187 · doi:10.1145/1667053.1667056
[9] Blei, D. M. and Jordan, M. I. (2006). Variational inference for Dirichlet process mixtures. Bayesian Anal. 1 121-143 (electronic). · Zbl 1331.62259 · doi:10.1214/06-BA104
[10] Bochner, S. (1955). Harmonic Analysis and the Theory of Probability . Univ. California Press, Berkeley and Los Angeles. · Zbl 0068.11702
[11] Broderick, T., Jordan, M. I. and Pitman, J. (2012). Beta processes, stick-breaking and power laws. Bayesian Anal. 7 439-475. · Zbl 1330.62218 · doi:10.1214/12-BA715
[12] Broderick, T., Pitman, J. and Jordan, M. I. (2013). Feature allocations, probability functions, and paintboxes. Bayesian Anal. · Zbl 1329.62278 · doi:10.1214/13-BA823
[13] Broderick, T., Mackey, L., Paisley, J. and Jordan, M. I. (2011). Combinatorial clustering and the beta negative binomial process. Available at . arXiv:1111.1802
[14] De Finetti, B. (1931). Funzione caratteristica di un fenomeno aleatorio. Atti della R. Academia Nazionale dei Lincei , Serie 6. 4 251-299. · JFM 57.0610.01
[15] Dunson, D. B. and Park, J.-H. (2008). Kernel stick-breaking processes. Biometrika 95 307-323. · Zbl 1437.62448 · doi:10.1093/biomet/asn012
[16] Escobar, M. D. (1994). Estimating normal means with a Dirichlet process prior. J. Amer. Statist. Assoc. 89 268-277. · Zbl 0791.62039 · doi:10.2307/2291223
[17] Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc. 90 577-588. · Zbl 0826.62021 · doi:10.2307/2291069
[18] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209-230. · Zbl 0255.62037 · doi:10.1214/aos/1176342360
[19] Freedman, D. A. (1965). Bernard Friedman’s urn. Ann. Math. Statist. 36 956-970. · Zbl 0138.12003 · doi:10.1214/aoms/1177700068
[20] Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6 721-741. · Zbl 0573.62030 · doi:10.1109/TPAMI.1984.4767596
[21] Gnedin, A. and Pitman, J. (2006). Exchangeable Gibbs partitions and Stirling triangles. J. Math. Sci. 138 5674-5685. · Zbl 1293.60010
[22] Griffiths, T. and Ghahramani, Z. (2006). Infinite latent feature models and the Indian buffet process. In Advances in Neural Information Processing Systems 18 (Y. Weiss, B. Schölkopf and J. Platt, eds.) 475-482. MIT Press, Cambridge, MA.
[23] Griffiths, T. L. and Ghahramani, Z. (2011). The Indian buffet process: An introduction and review. J. Mach. Learn. Res. 12 1185-1224. · Zbl 1280.62038
[24] Hansen, B. and Pitman, J. (1998). Prediction Rules for Exchangeable Sequences Related to Species Sampling. Technical Report 520, Univ. California, Berkeley. · Zbl 0944.62109 · doi:10.1016/S0167-7152(99)00109-1
[25] Hewitt, E. and Savage, L. J. (1955). Symmetric measures on Cartesian products. Trans. Amer. Math. Soc. 80 470-501. · Zbl 0066.29604 · doi:10.2307/1992999
[26] Hjort, N. L. (1990). Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Statist. 18 1259-1294. · Zbl 0711.62033 · doi:10.1214/aos/1176347749
[27] Hoppe, F. M. (1984). Pólya-like urns and the Ewens’ sampling formula. J. Math. Biol. 20 91-94. · Zbl 0547.92009 · doi:10.1007/BF00275863
[28] Ishwaran, H. and James, L. F. (2001). Gibbs sampling methods for stick-breaking priors. J. Amer. Statist. Assoc. 96 161-173. · Zbl 1014.62006 · doi:10.1198/016214501750332758
[29] Ishwaran, H. and Zarepour, M. (2000). Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika 87 371-390. · Zbl 0949.62037 · doi:10.1093/biomet/87.2.371
[30] Jordan, M. I., Ghahramani, Z., Jaakkola, T. S. and Saul, L. K. (1999). An introduction to variational methods for graphical models. Machine Learning 37 183-233. · Zbl 0945.68164 · doi:10.1023/A:1007665907178
[31] Kim, Y. (1999). Nonparametric Bayesian estimators for counting processes. Ann. Statist. 27 562-588. · Zbl 0980.62078 · doi:10.1214/aos/1018031207
[32] Kingman, J. F. C. (1967). Completely random measures. Pacific J. Math. 21 59-78. · Zbl 0155.23503 · doi:10.2140/pjm.1967.21.59
[33] Kingman, J. F. C. (1978). The representation of partition structures. J. London Math. Soc. (2) 18 374-380. · Zbl 0415.92009 · doi:10.1112/jlms/s2-18.2.374
[34] Kingman, J. F. C. (1993). Poisson Processes. Oxford Studies in Probability 3 . Oxford Univ. Press, New York. · Zbl 0771.60001
[35] Lee, J., Quintana, F. A., Müller, P. and Trippa, L. (2008). Defining predictive probability functions for species sampling models. Technical report. · Zbl 1331.62152 · doi:10.1214/12-STS407
[36] Li, W. and McCallum, A. (2006). Pachinko allocation: DAG-structured mixture models of topic correlations. In Proceedings of the 23 rd International Conference on Machine Learning 577-584. ACM, New York, NY.
[37] MacEachern, S. N. (1994). Estimating normal means with a conjugate style Dirichlet process prior. Comm. Statist. Simulation Comput. 23 727-741. · Zbl 0825.62053 · doi:10.1080/03610919408813196
[38] McCloskey, J. W. (1965). A model for the distribution of individuals by species in an environment. Ph.D. thesis, Michigan State Univ.
[39] McCullagh, P., Pitman, J. and Winkel, M. (2008). Gibbs fragmentation trees. Bernoulli 14 988-1002. · Zbl 1158.60373 · doi:10.3150/08-BEJ134
[40] Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Statist. 9 249-265.
[41] Paisley, J., Zaas, A., Woods, C. W., Ginsburg, G. S. and Carin, L. (2010). A stick-breaking construction of the beta process. In International Conference on Machine Learning . Haifa, Israel.
[42] Papaspiliopoulos, O. and Roberts, G. O. (2008). Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika 95 169-186. · Zbl 1437.62576 · doi:10.1093/biomet/asm086
[43] Patil, G. P. and Taillie, C. (1977). Diversity as a concept and its implications for random communities. In Proceedings of the 41 st Session of the International Statistical Institute ( New Delhi , 1977) 497-515. New Delhi. · Zbl 0516.92022
[44] Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102 145-158. · Zbl 0821.60047 · doi:10.1007/BF01213386
[45] Pitman, J. (1996). Some developments of the Blackwell-MacQueen urn scheme. In Statistics , Probability and Game Theory. Institute of Mathematical Statistics Lecture Notes-Monograph Series 30 245-267. IMS, Hayward, CA. · doi:10.1214/lnms/1215453576
[46] Pitman, J. (2003). Poisson-Kingman partitions. In Statistics and Science : A Festschrift for Terry Speed. Institute of Mathematical Statistics Lecture Notes-Monograph Series 40 1-34. IMS, Beachwood, OH. · doi:10.1214/lnms/1215091133
[47] Pitman, J. (2006). Combinatorial Stochastic Processes. Lecture Notes in Math. 1875 . Springer, Berlin. · Zbl 1103.60004 · doi:10.1007/b11601500
[48] Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab. 25 855-900. · Zbl 0880.60076 · doi:10.1214/aop/1024404422
[49] Pólya, G. (1930). Sur quelques points de la théorie des probabilités. Ann. Inst. H. Poincaré 1 117-161.
[50] Rogers, L. C. G. and Williams, D. (2000). Diffusions , Markov Processes , and Martingales. Vol. 1: Foundations . Cambridge Univ. Press, Cambridge. · Zbl 0949.60003
[51] Sethuraman, J. (1994). A constructive definition of Dirichlet priors. Statist. Sinica 4 639-650. · Zbl 0823.62007
[52] Teh, Y. W., Görür, D. and Ghahramani, Z. (2007). Stick-breaking construction for the indian buffet process. In Proceedings of the International Conference on Artificial Intelligence and Statistics 11.
[53] Thibaux, R. and Jordan, M. I. (2007). Hierarchical beta processes and the Indian buffet process. In Proceedings of the International Conference on Artificial Intelligence and Statistics 11.
[54] Walker, S. G. (2007). Sampling the Dirichlet mixture model with slices. Comm. Statist. Simulation Comput. 36 45-54. · Zbl 1113.62058 · doi:10.1080/03610910601096262
[55] Wolpert, R. L. and Ickstadt, K. (2004). Reflecting uncertainty in inverse problems: A Bayesian solution using Lévy processes. Inverse Problems 20 1759-1771. · Zbl 1077.45001 · doi:10.1088/0266-5611/20/6/004
[56] Zhou, M., Hannah, L., Dunson, D. and Carin, L. (2012). Beta-negative binomial process and Poisson factor analysis. In International Conference on Artificial Intelligence and Statistics .
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.