zbMATH — the first resource for mathematics

Bayesian prediction with multiple-samples information. (English) Zbl 1369.62116
Summary: The prediction of future outcomes of a random phenomenon is typically based on a certain number of “analogous” observations from the past. When observations are generated by multiple samples, a natural notion of analogy is partial exchangeability and the problem of prediction can be effectively addressed in a Bayesian nonparametric setting. Instead of confining ourselves to the prediction of a single future experimental outcome, as in most treatments of the subject, we aim at predicting features of an unobserved additional sample of any size. We first provide a structural property of prediction rules induced by partially exchangeable arrays, without assuming any specific nonparametric prior. Then we focus on a general class of hierarchical random probability measures and devise a simulation algorithm to forecast the outcome of \(m\) future observations, for any \(m \geq 1\). The theoretical result and the algorithm are illustrated by means of a real dataset, which also highlights the “borrowing strength” behavior across samples induced by the hierarchical specification.
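As an illustration of forecasting \(m\) future observations in one sample while sharing information across samples, the following is a minimal sketch, not the algorithm of the paper. It assumes a hierarchical Dirichlet process (one member of the hierarchical class, see Teh et al.) and simulates from its Chinese restaurant franchise urn scheme. It also assumes the latent table configuration is known, whereas in practice it is latent and would have to be sampled. The names `simulate_future`, `group_tables`, `dish_tables`, `theta`, `theta0`, and `new_dish` are hypothetical.

```python
import itertools
import random


def simulate_future(group_tables, dish_tables, m, theta, theta0, rng, new_dish):
    """Simulate m further observations in one sample (group).

    group_tables: list of [count, value] pairs, the tables of the target group
                  (assumed known here; latent in a real analysis).
    dish_tables:  dict mapping value -> number of tables serving it across ALL groups.
    theta, theta0: group-level and top-level concentration parameters (> 0).
    new_dish:     callable returning a fresh value never seen before.
    """
    tables = [list(t) for t in group_tables]  # work on copies of the inputs
    dishes = dict(dish_tables)
    draws = []
    for _ in range(m):
        n = sum(count for count, _ in tables)
        # Existing table with weight = its count, new table with weight theta.
        u = rng.random() * (n + theta)
        chosen = None
        for table in tables:
            if u < table[0]:
                chosen = table
                break
            u -= table[0]
        if chosen is None:
            # New table: its value comes from the top-level urn shared by all
            # groups, which is where information is borrowed across samples.
            total = sum(dishes.values())
            v = rng.random() * (total + theta0)
            value = None
            for d, k in dishes.items():
                if v < k:
                    value = d
                    break
                v -= k
            if value is None:
                value = new_dish()  # a value unseen in every group
            dishes[value] = dishes.get(value, 0) + 1
            chosen = [0, value]
            tables.append(chosen)
        chosen[0] += 1
        draws.append(chosen[1])
    return draws


rng = random.Random(0)
fresh = itertools.count()
draws = simulate_future(
    group_tables=[[3, "a"], [1, "b"]],        # target group has seen only "a" and "b"
    dish_tables={"a": 2, "b": 3, "c": 4},     # "c" was seen only in other groups
    m=10, theta=1.0, theta0=1.0, rng=rng,
    new_dish=lambda: f"new{next(fresh)}",
)
print(draws)
```

In this sketch the value "c" can appear among the simulated observations even though the target group never recorded it, because new tables draw from the shared top-level urn. Repeating the simulation many times and averaging a statistic of `draws` (for example, the number of values not previously seen in the target group) would give a Monte Carlo estimate of that feature of the additional sample.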

MSC:
62H12 Estimation in multivariate analysis
62F15 Bayesian inference
62G05 Nonparametric estimation
60G09 Exchangeability for stochastic processes
60G57 Random measures