The dependent Dirichlet process and related models. (English) Zbl 07474196

Summary: Standard regression approaches assume that a finite number of characteristics of the response distribution, such as location and scale, change as a (parametric or nonparametric) function of predictors. However, it is not always appropriate to assume a location/scale representation, in which the error distribution has the same shape over the whole predictor space. In applied research, the distribution of the responses often changes with predictors in ways that cannot reasonably be represented by a finite-dimensional functional form. This can seriously affect the answers to the scientific questions of interest, so more general approaches are needed, which motivates the study of fully nonparametric regression models. We review some of the main Bayesian approaches that have been employed to define probability models in which the complete response distribution may vary flexibly with predictors. We focus on developments based on modifications of the Dirichlet process, historically termed dependent Dirichlet processes, and on some of the extensions that have been proposed to tackle this general problem using nonparametric approaches.
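
As a rough illustration of the idea, and not a construction taken from the paper, the following Python sketch draws a truncated "single-weights" dependent Dirichlet process: stick-breaking weights are shared across predictor values, while each atom is a function of the predictor x. The linear atom functions, the normal kernel, the truncation level and all function names (`stick_breaking_weights`, `sample_ddp`, `conditional_density`) are assumptions made only for this example.

```python
import math
import random


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights w_h = v_h * prod_{l<h} (1 - v_l)."""
    weights, remaining = [], 1.0
    for h in range(n_atoms):
        # v_h ~ Beta(1, alpha); the last stick takes all remaining mass
        # so that the truncated weights sum to one.
        v = 1.0 if h == n_atoms - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def sample_ddp(alpha=1.0, n_atoms=50, seed=0):
    """One truncated draw: shared weights plus predictor-dependent atoms."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, n_atoms, rng)
    # Assumed (hypothetical) atom processes: random lines
    # theta_h(x) = a_h + b_h * x, stored as (a_h, b_h) pairs.
    lines = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0)) for _ in range(n_atoms)]
    return weights, lines


def conditional_density(y, x, weights, lines, sigma=0.5):
    """Density of y given x for a normal-kernel mixture over the draw."""
    norm_const = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for w, (a, b) in zip(weights, lines):
        z = (y - (a + b * x)) / sigma
        total += w * math.exp(-0.5 * z * z) / norm_const
    return total


weights, lines = sample_ddp()
```

Evaluating `conditional_density` on a grid of y values at several x values shows the whole shape of the response density (number of modes, skewness, spread) changing with the predictor, which a location/scale model cannot reproduce.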


62-XX Statistics

