Estimating large correlation matrices for international migration. (English) Zbl 1405.62235

Summary: The United Nations is the major organization producing and regularly updating probabilistic population projections for all countries. International migration is a critical component of such projections, and between-country correlations are important for forecasts of regional aggregates. However, in the data we consider there are 200 countries and only 12 data points, each one corresponding to a five-year time period. Thus a \(200\times200\) correlation matrix must be estimated on the basis of 12 data points. Using Pearson correlations produces many spurious correlations. We propose a maximum a posteriori estimator for the correlation matrix with an interpretable informative prior distribution. The prior serves to regularize the correlation matrix, shrinking a priori untrustworthy elements towards zero. Our estimated correlation structure improves projections of net migration for regional aggregates, producing narrower projections of migration for Africa as a whole and wider projections for Europe. A simulation study confirms that our estimator outperforms both the Pearson correlation matrix and a simple shrinkage estimator when estimating a sparse correlation matrix.
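The dimensionality problem described in the summary can be illustrated with a small simulation. The sketch below is not the authors' maximum a posteriori estimator. It only contrasts the Pearson correlation matrix with a simple linear shrinkage toward the identity, using the dimensions quoted above (12 periods, 200 countries). The simulated data, the helper names, and the shrinkage weight of 0.5 are all assumptions chosen for illustration.

```python
import numpy as np


def pearson_corr(data):
    """Sample Pearson correlation matrix; rows are time periods, columns are countries."""
    return np.corrcoef(data, rowvar=False)


def shrink_toward_identity(corr, weight):
    """Simple linear shrinkage: scale off-diagonal entries by (1 - weight), keep a unit diagonal."""
    p = corr.shape[0]
    return (1.0 - weight) * corr + weight * np.eye(p)


def offdiag(mat):
    """Return the strictly upper-triangular entries as a flat array."""
    rows, cols = np.triu_indices(mat.shape[0], k=1)
    return mat[rows, cols]


rng = np.random.default_rng(0)
n_periods, n_countries = 12, 200  # dimensions quoted in the summary

# Hypothetical data: every series is independent, so the true correlation matrix is the identity.
data = rng.standard_normal((n_periods, n_countries))

sample = pearson_corr(data)
shrunk = shrink_toward_identity(sample, weight=0.5)  # weight is an arbitrary illustrative choice

truth = np.eye(n_countries)

# Share of country pairs whose sample correlation exceeds 0.5 in magnitude despite zero true correlation.
spurious_share = np.mean(np.abs(offdiag(sample)) > 0.5)

# Frobenius-norm distance to the true correlation matrix.
err_sample = np.linalg.norm(sample - truth)
err_shrunk = np.linalg.norm(shrunk - truth)

# With far fewer periods than countries, the sample correlation matrix is rank deficient.
rank_sample = np.linalg.matrix_rank(sample)

print(f"share of |r| > 0.5 among truly uncorrelated pairs: {spurious_share:.3f}")
print(f"Frobenius error, Pearson: {err_sample:.2f}; shrunk: {err_shrunk:.2f}")
print(f"rank of Pearson matrix: {rank_sample} (dimension {n_countries})")
```

In this toy setting the truth is exactly sparse, so any shrinkage of the off-diagonal entries helps. The estimator in the paper instead uses an informative prior to decide which entries to shrink, which this sketch does not attempt to reproduce.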


62P25 Applications of statistics to social sciences
62H20 Measures of association (correlation, canonical correlation, etc.)


spcov; glasso

