People born in the Middle East but residing in the Netherlands: invariant population size estimates and the role of active and passive covariates. (English) Zbl 1454.62508

Summary: Including covariates in loglinear models of population registers improves population size estimates for two reasons. First, it makes it possible to take into account heterogeneity of inclusion probabilities across the levels of a covariate; second, it allows the estimated population to be subdivided by the levels of the covariates, giving insight into the characteristics of individuals who are not included in any of the registers. Whether marginalizing the full table of registers by covariates over one or more covariates leaves the population size estimate invariant is intimately related to collapsibility of contingency tables [S. Asmussen and D. Edwards, Biometrika 70, 567–578 (1983; Zbl 0549.62041)]. We show that, with information from two registers, population size invariance is equivalent to the simultaneous collapsibility of each margin consisting of one register and the covariates. We give a short path characterization of the loglinear model that describes when marginalizing over a covariate leads to different population size estimates. Covariates that are collapsible are called passive, to distinguish them from covariates that are not collapsible, which are termed active. We argue that it can be useful to include passive covariates in the estimation model, because they allow a finer description of the population in terms of these covariates. As an example, we discuss estimating the size of the population of people born in the Middle East but residing in the Netherlands.
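A minimal numerical sketch of the active/passive distinction, using hypothetical counts rather than the paper's data or fitting procedure. With two registers A and B and one covariate X, the loglinear model in which A and B are conditionally independent given X fits the observed cells of each level x exactly, and its estimate of the people missed by both registers is n10x·n01x/n11x (the Lincoln–Petersen estimate per level). Collapsing over X applies the same formula once to the summed table. The counts below are expected counts constructed so that A and B are independent within each level of X; the function names are illustrative only.

```python
def missed_by_both(n11, n10, n01):
    """Estimate of the unobserved (0, 0) cell when A and B are independent: n10 * n01 / n11."""
    return n10 * n01 / n11


def population_estimates(strata):
    """strata maps each level x of X to its observed counts (n11, n10, n01).

    Returns (N-hat stratified by X, N-hat after collapsing over X, N-hat per level of X).
    """
    observed = sum(sum(cells) for cells in strata.values())
    # Stratified: one estimate per level of X; this also subdivides the population by X.
    per_level = {x: sum(cells) + missed_by_both(*cells) for x, cells in strata.items()}
    stratified = sum(per_level.values())
    # Collapsed: add up the cells over all levels of X, then estimate once.
    n11, n10, n01 = (sum(cells[i] for cells in strata.values()) for i in range(3))
    collapsed = observed + missed_by_both(n11, n10, n01)
    return stratified, collapsed, per_level


# Hypothetical expected counts for 1000 people per level of X, with A and B independent
# within each level. "Active": inclusion in both A and B depends on X (0.8 vs 0.2).
active = population_estimates({0: (640, 160, 160), 1: (40, 160, 160)})
# "Passive": only inclusion in A depends on X; inclusion in B is 0.5 at both levels.
passive = population_estimates({0: (400, 400, 100), 1: (100, 100, 400)})

print("active: ", active)
print("passive:", passive)
```

In the active case the stratified estimate recovers the constructed total of 2000, while collapsing over X gives only about 1470.6. In the passive case both give 2000, and the per-level estimates (1000 for each level of X) illustrate the finer description of the population that a passive covariate still provides.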


62P25 Applications of statistics to social sciences
62D05 Sampling theory, sample surveys




[1] Asmussen, S. and Edwards, D. (1983). Collapsibility and response variables in contingency tables. Biometrika 70 567-578. · Zbl 0549.62041
[2] Baker, S. (1990). A simple EM algorithm for capture-recapture data with categorical covariates (with discussion). Biometrics 46 1193-1197. · Zbl 1132.68649
[3] Bartolucci, F. and Forcina, A. (2001). Analysis of capture-recapture data with a Rasch-type model allowing for conditional dependence and multidimensionality. Biometrics 57 714-719. · Zbl 1209.62371
[4] Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. The MIT Press, Cambridge, MA. · Zbl 0332.62039
[5] Buckland, S. and Garthwaite, P. (1991). Quantifying precision of mark-recapture estimates using the bootstrap and related methods. Biometrics 47 255-268.
[6] Chao, A., Tsay, P. K., Lin, S. H., Shau, W. Y. and Chao, D. Y. (2001). The applications of capture-recapture models to epidemiological data. Stat. Med. 20 3123-3157.
[7] Cormack, R. (1989). Log-linear models for capture-recapture. Biometrics 45 395-413. · Zbl 0707.62244
[8] Fienberg, S. E. (1972). The multiple recapture census for closed populations and incomplete \(2^{k}\) contingency tables. Biometrika 59 591-603. · Zbl 0255.62048
[9] Fienberg, S., Johnson, M. and Junker, B. (1999). Classical multilevel and Bayesian approaches to population size estimation using multiple lists. J. Roy. Statist. Soc. Ser. A 162 383-406.
[10] Hessen, D. J. (2011). Loglinear representations of multivariate Bernoulli Rasch models. British J. Math. Statist. Psych. 64 337-354. · Zbl 1218.62127
[11] Hickman, L. J. and Suttorp, M. J. (2008). Are deportable aliens a unique threat to public safety? Comparing the recidivism of deportable and nondeportable aliens. Criminology & Public Policy 7 59-82.
[12] IWGDMF: International Working Group for Disease Monitoring and Forecasting (1995). Capture-recapture and multiple record systems estimation. Part I. History and theoretical development. American Journal of Epidemiology 142 1059-1068.
[13] Kim, S.-H. and Kim, S.-H. (2006). A note on collapsibility in DAG models of contingency tables. Scand. J. Stat. 33 575-590. · Zbl 1113.62070
[14] Little, R. J. A. and Rubin, D. B. (1987). Statistical Analysis with Missing Data. Wiley, New York. · Zbl 0665.62004
[15] Meng, X. L. and Rubin, D. B. (1991). IPF for contingency tables with missing data via the ECM algorithm. In Proceedings of the Statistical Computing Section of the American Statistical Association 244-247. Amer. Statist. Assoc., Washington, DC.
[16] Pollock, K. H. (2002). The use of auxiliary variables in capture-recapture modelling: An overview. J. Appl. Stat. 29 85-106. · Zbl 1346.62152
[17] Schafer, J. L. (1997a). Analysis of Incomplete Multivariate Data. Monographs on Statistics and Applied Probability 72. Chapman & Hall, London. · Zbl 0997.62510
[18] Schafer, J. (1997b). Imputation of missing covariates under a general linear mixed model. Dept. Statistics, Penn State Univ.
[19] Sutherland, J. M., Schwarz, C. J. and Rivest, L.-P. (2007). Multilist population estimation with incomplete and partial stratification. Biometrics 63 910-916. · Zbl 1146.62106
[20] Valente, P. (2010). Main results of the UNECE/UNSD survey on the 2010/2011 round of censuses in the UNECE region. Eurostat, Luxembourg.
[21] van der Heijden, P. G. M., Zwane, E. and Hessen, D. (2009). Structurally missing data problems in multiple list capture-recapture data. AStA Adv. Stat. Anal. 93 5-21. · Zbl 1379.62101
[22] van der Heijden, P. G. M., Whittaker, J., Cruyff, M., Bakker, B. and van der Vliet, R. (2012). Supplement to “People born in the Middle East but residing in the Netherlands: Invariant population size estimates and the role of active and passive covariates.” · Zbl 1454.62508
[23] Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Wiley, Chichester. · Zbl 0732.62056
[24] Zwane, E. N. and van der Heijden, P. G. M. (2007). Analysing capture-recapture data when some variables of heterogeneous catchability are not collected or asked in all registrations. Stat. Med. 26 1069-1089.
[25] Zwane, E., van der Pal, K. and van der Heijden, P. G. M. (2004). The multiple-record systems estimator when registrations refer to different but overlapping populations. Stat. Med. 23 2267-2281.