A comparative review of dimension reduction methods in approximate Bayesian computation. (English) Zbl 1331.62123

Summary: Approximate Bayesian computation (ABC) methods make use of comparisons between simulated and observed summary statistics to overcome the problem of computationally intractable likelihood functions. As the practical implementation of ABC requires computations based on vectors of summary statistics, rather than full data sets, a central question is how to derive low-dimensional summary statistics from the observed data with minimal loss of information. In this article we provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the ABC literature. The methods are split into three nonmutually exclusive classes consisting of best subset selection methods, projection techniques and regularization. In addition, we introduce two new methods of dimension reduction. The first is a best subset selection method based on Akaike and Bayesian information criteria, and the second uses ridge regression as a regularization procedure. We illustrate the performance of these dimension reduction techniques through the analysis of three challenging models and data sets.
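As an illustration of the ideas in the summary, the following is a minimal sketch of rejection ABC followed by a ridge-penalized linear regression adjustment of the accepted parameter values. It is not the authors' implementation. The toy model (normal data with unknown mean), the uniform prior, the choice of summaries (sample mean, sample median and five deliberately uninformative noise statistics) and the names `simulate_summaries`, `abc_ridge`, `accept_frac` and `ridge_lambda` are all assumptions made for this example.

```python
import numpy as np


def simulate_summaries(theta, rng, n_obs=50, n_noise=5):
    """Toy simulator (assumed): data ~ Normal(theta, 1).

    Returns one row of summaries per parameter value: sample mean,
    sample median, and n_noise pure-noise statistics that carry no
    information about theta.
    """
    theta = np.atleast_1d(theta)
    data = rng.normal(theta[:, None], 1.0, size=(theta.size, n_obs))
    noise = rng.normal(size=(theta.size, n_noise))
    return np.column_stack([data.mean(axis=1), np.median(data, axis=1), noise])


def abc_ridge(s_obs, n_sims=5000, accept_frac=0.1, ridge_lambda=1.0, seed=0):
    """Rejection ABC, then a ridge-regression adjustment of accepted draws."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-5.0, 5.0, size=n_sims)   # draws from an assumed prior
    S = simulate_summaries(theta, rng)            # simulated summary statistics

    # Compare simulated and observed summaries on a standardized scale.
    scale = S.std(axis=0)
    dist = np.linalg.norm((S - s_obs) / scale, axis=1)
    keep = dist <= np.quantile(dist, accept_frac)
    theta_acc = theta[keep]
    X = (S[keep] - s_obs) / scale                 # summaries centred at s_obs

    # Ridge fit of theta on the summaries; centring leaves the intercept unpenalized.
    Xc = X - X.mean(axis=0)
    yc = theta_acc - theta_acc.mean()
    beta = np.linalg.solve(Xc.T @ Xc + ridge_lambda * np.eye(X.shape[1]), Xc.T @ yc)

    # Shift each accepted draw to where it would sit if its summaries equalled s_obs.
    theta_adj = theta_acc - X @ beta
    return theta_acc, theta_adj


# "Observed" summaries generated from the toy model with theta = 1.5.
s_obs = simulate_summaries(1.5, np.random.default_rng(1))[0]
theta_acc, theta_adj = abc_ridge(s_obs)
print("accepted sd:", theta_acc.std(), "adjusted sd:", theta_adj.std())
```

In this sketch the penalty mainly stabilizes the coefficients of the highly correlated mean and median summaries and shrinks those of the noise statistics; the value of `ridge_lambda` is fixed arbitrarily here and would in practice be chosen by a criterion such as cross-validation.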

MSC:

62F15 Bayesian inference
62J07 Ridge regression; shrinkage estimators (Lasso)
65C60 Computational problems in statistics (MSC2010)
62Pxx Applications of statistics

Software:

abc; pls; ismev; abctools
