The essential role of pair matching in cluster-randomized experiments, with application to the Mexican Universal Health Insurance evaluation. (English) Zbl 1327.62061

Summary: A basic feature of many field experiments is that investigators are only able to randomize clusters of individuals – such as households, communities, firms, medical practices, schools or classrooms – even when the individual is the unit of interest. To recoup the resulting efficiency loss, some studies pair similar clusters and randomize treatment within pairs. However, many other studies avoid pairing, in part because of claims in the literature, echoed by clinical trials standards organizations, that this matched-pair, cluster-randomization design has serious problems. We argue that all such claims are unfounded. We also prove that the estimator recommended for this design in the literature is unbiased only in situations when matching is unnecessary; its standard error is also invalid. To overcome this problem without modeling assumptions, we develop a simple design-based estimator with much improved statistical properties. We also propose a model-based approach that includes some of the benefits of our design-based estimator as well as the estimator in the literature. Our methods also address individual-level noncompliance, which is common in applications but not allowed for in most existing methods. We show that from the perspective of bias, efficiency, power, robustness or research costs, and in large or small samples, pairing should be used in cluster-randomized experiments whenever feasible; failing to do so is equivalent to discarding a considerable fraction of one’s data. We develop these techniques in the context of a randomized evaluation we are conducting of the Mexican Universal Health Insurance Program.
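The design described in the summary, pairing similar clusters and then randomizing treatment within each pair, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: clusters are paired by sorting on a single hypothetical covariate, and the effect is summarized as a pair-size-weighted average of within-pair differences in cluster means. The function names `pair_and_randomize` and `weighted_pair_estimate` are hypothetical, and the sketch does not reproduce the authors' standard error or their treatment of noncompliance.

```python
import random


def pair_and_randomize(clusters, rng):
    """Pair clusters with adjacent covariate values, then flip a coin within each pair.

    clusters: list of (cluster_id, covariate) tuples.
    rng: a random.Random instance, passed in so the assignment is reproducible.
    Returns a list of (treated_id, control_id) tuples.
    With an odd number of clusters, the last one after sorting is left unpaired.
    """
    ordered = sorted(clusters, key=lambda c: c[1])
    pairs = []
    for i in range(0, len(ordered) - 1, 2):
        a, b = ordered[i][0], ordered[i + 1][0]
        # Each cluster in a pair has probability 1/2 of receiving treatment.
        treated, control = (a, b) if rng.random() < 0.5 else (b, a)
        pairs.append((treated, control))
    return pairs


def weighted_pair_estimate(pairs, outcomes):
    """Average the within-pair differences in cluster means, weighted by pair size.

    outcomes: dict mapping cluster_id to a list of individual-level outcomes.
    The weight for each pair is the total number of individuals in the pair
    (an illustrative choice, assumed here rather than taken from the summary).
    """
    weighted_sum = 0.0
    total_weight = 0
    for treated, control in pairs:
        y_t, y_c = outcomes[treated], outcomes[control]
        diff = sum(y_t) / len(y_t) - sum(y_c) / len(y_c)
        weight = len(y_t) + len(y_c)
        weighted_sum += weight * diff
        total_weight += weight
    return weighted_sum / total_weight


if __name__ == "__main__":
    rng = random.Random(0)
    clusters = [("a", 1.0), ("b", 5.0), ("c", 1.1), ("d", 5.2)]
    pairs = pair_and_randomize(clusters, rng)
    print(pairs)
```

Passing the random number generator in explicitly keeps the assignment auditable, which matters when the randomization itself is the basis for inference.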


62D05 Sampling theory, sample surveys
62P10 Applications of statistics to biology and medical sciences; meta analysis
91B30 Risk theory, insurance (MSC2010)

