Matching methods for causal inference: a review and a look forward. (English) Zbl 1328.62007

Summary: When estimating causal effects using observational data, it is desirable to replicate a randomized experiment as closely as possible by obtaining treated and control groups with similar covariate distributions. This goal can often be achieved by choosing well-matched samples of the original treated and control groups, thereby reducing bias due to the covariates. Since the 1970s, work on matching methods has examined how best to choose treated and control subjects for comparison. Matching methods are gaining popularity in fields such as economics, epidemiology, medicine and political science. However, until now the literature and related advice have been scattered across disciplines. Researchers who are interested in using matching methods (or developing methods related to matching) do not have a single place to turn to learn about past and current research. This paper provides a structure for thinking about matching methods and guidance on their use, coalescing the existing research (both old and new) and providing a summary of where the literature on matching methods is now and where it should be headed.
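The summary describes matching as selecting, from the original treated and control groups, subsamples whose covariate distributions are similar. As a minimal illustration only (not the paper's own procedure), the Python sketch below performs greedy 1:1 nearest-neighbor matching without replacement on a single scalar score, which could be one covariate or an estimated propensity score, and then compares a standardized mean difference before and after matching. The function names, the greedy processing order, the tie-breaking rule, and the pooled-SD formula are assumptions chosen for illustration; the review surveys many other distance measures and matching algorithms.

```python
import math
from statistics import mean, variance


def greedy_nearest_neighbor_match(treated, controls):
    """Hypothetical helper: greedy 1:1 matching without replacement.

    treated, controls: lists of scalar scores (assumed to be a single
    covariate or an estimated propensity score). Treated units are processed
    in list order; each takes the closest control not yet used, with ties
    broken by the lower control index. Returns (treated_idx, control_idx) pairs.
    """
    available = set(range(len(controls)))
    pairs = []
    for i, t in enumerate(treated):
        if not available:
            break  # more treated than controls: remaining treated stay unmatched
        j = min(available, key=lambda k: (abs(controls[k] - t), k))
        pairs.append((i, j))
        available.remove(j)
    return pairs


def standardized_difference(x_treated, x_control):
    """Difference in means divided by sqrt((s_t^2 + s_c^2) / 2).

    This pooled-SD version is one common definition, assumed here; it uses
    sample variances, so each group needs at least two values.
    """
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd


if __name__ == "__main__":
    # Made-up illustrative scores, not data from the paper.
    treated = [0.6, 0.7, 0.8]
    controls = [0.1, 0.2, 0.3, 0.58, 0.71, 0.83, 0.95]

    pairs = greedy_nearest_neighbor_match(treated, controls)
    matched_controls = [controls[j] for _, j in pairs]

    print("pairs:", pairs)
    print("SMD before matching:", round(standardized_difference(treated, controls), 3))
    print("SMD after matching: ", round(standardized_difference(treated, matched_controls), 3))
```

Greedy matching depends on the order in which treated units are processed; optimal matching instead minimizes the total distance over all pairs at once, which is one reason the choice of algorithm matters in practice.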


62-02 Research exposition (monographs, survey articles) pertaining to statistics
62K99 Design of statistical experiments


cem; DOS; rbounds; twang


[1] Abadie, A. and Imbens, G. W. (2006). Large sample properties of matching estimators for average treatment effects. Econometrica 74 235-267. · Zbl 1112.62042
[2] Abadie, A. and Imbens, G. W. (2009a). Bias corrected matching estimators for average treatment effects. Journal of Educational and Behavioral Statistics. To appear. Available at . · Zbl 1214.62031
[3] Abadie, A. and Imbens, G. W. (2009b). Matching on the estimated propensity score. Working Paper 15301, National Bureau of Economic Research, Cambridge, MA.
[4] Agodini, R. and Dynarski, M. (2004). Are experiments the only option? A look at dropout prevention programs. Review of Economics and Statistics 86 180-194.
[5] Althauser, R. and Rubin, D. (1970). The computerized construction of a matched sample. American Journal of Sociology 76 325-346.
[6] Augurzky, B. and Schmidt, C. (2001). The propensity score: A means to an end. Discussion Paper 271, Institute for the Study of Labor (IZA).
[7] Austin, P. C. (2007). The performance of different propensity score methods for estimating marginal odds ratios. Stat. Med. 26 3078-3094.
[8] Austin, P. (2009). Using the standardized difference to compare the prevalence of a binary variable between two groups in observational research. Comm. Statist. Simulation Comput. 38 1228-1234. · Zbl 1167.62473
[9] Austin, P. C. and Mamdani, M. M. (2006). A comparison of propensity score methods: A case-study illustrating the effectiveness of post-AMI statin use. Stat. Med. 25 2084-2106.
[10] Bang, H. and Robins, J. M. (2005). Doubly robust estimation in missing data and causal inference models. Biometrics 61 962-972. · Zbl 1087.62121
[11] Brookhart, M. A., Schneeweiss, S., Rothman, K. J., Glynn, R. J., Avorn, J. and Sturmer, T. (2006). Variable selection for propensity score models. American Journal of Epidemiology 163 1149-1156.
[12] Carpenter, R. (1977). Matching when covariables are normally distributed. Biometrika 64 299-307. · Zbl 0363.62056
[13] Chapin, F. (1947). Experimental Designs in Sociological Research. Harper, New York.
[14] Cochran, W. G. (1968). The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics 24 295-313.
[15] Cochran, W. G. and Rubin, D. B. (1973). Controlling bias in observational studies: A review. Sankhyā Ser. A 35 417-446. · Zbl 0291.62012
[16] Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences, 2nd ed. Erlbaum, Hillsdale, NJ. · Zbl 0747.62110
[17] Cornfield, J. (1959). Smoking and lung cancer: Recent evidence and a discussion of some questions. Journal of the National Cancer Institute 22 173-200.
[18] Crump, R., Hotz, V. J., Imbens, G. W. and Mitnik, O. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika 96 187-199. · Zbl 1163.62083
[19] Czajka, J. C., Hirabayashi, S., Little, R. and Rubin, D. B. (1992). Projecting from advance data using propensity modeling. J. Bus. Econom. Statist. 10 117-131.
[20] D’Agostino, Jr., R. B. and Rubin, D. B. (2000). Estimating and using propensity scores with partially missing data. J. Amer. Statist. Assoc. 95 749-759.
[21] Dehejia, R. H. and Wahba, S. (1999). Causal effects in nonexperimental studies: Re-evaluating the evaluation of training programs. J. Amer. Statist. Assoc. 94 1053-1062.
[22] Dehejia, R. H. and Wahba, S. (2002). Propensity score matching methods for non-experimental causal studies. Review of Economics and Statistics 84 151-161.
[23] Diamond, A. and Sekhon, J. S. (2006). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Working paper. Univ. California, Berkeley. Available at .
[24] Drake, C. (1993). Effects of misspecification of the propensity score on estimators of treatment effects. Biometrics 49 1231-1236.
[25] Frangakis, C. E. and Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics 58 21-29. · Zbl 1209.62288
[26] Glazerman, S., Levy, D. M. and Myers, D. (2003). Nonexperimental versus experimental estimates of earnings impacts. Annals of the American Academy of Political and Social Science 589 63-93.
[27] Greenland, S. (2003). Quantifying biases in causal models: Classical confounding vs collider-stratification bias. Epidemiology 14 300-306.
[28] Greenland, S. and Finkle, W. D. (1995). A critical look at methods for handling missing covariates in epidemiologic regression analyses. American Journal of Epidemiology 142 1255-1264.
[29] Greenland, S., Robins, J. M. and Pearl, J. (1999). Confounding and collapsibility in causal inference. Statist. Sci. 14 29-46. · Zbl 1059.62506
[30] Greenwood, E. (1945). Experimental Sociology: A Study in Method. King’s Crown Press, New York.
[31] Greevy, R., Lu, B., Silber, J. H. and Rosenbaum, P. (2004). Optimal multivariate matching before randomization. Biostatistics 5 263-275. · Zbl 1096.62078
[32] Gu, X. and Rosenbaum, P. R. (1993). Comparison of multivariate matching methods: Structures, distances, and algorithms. J. Comput. Graph. Statist. 2 405-420.
[33] Hansen, B. B. (2004). Full matching in an observational study of coaching for the SAT. J. Amer. Statist. Assoc. 99 609-618. · Zbl 1117.62349
[34] Hansen, B. B. (2008). The essential role of balance tests in propensity-matched observational studies: Comments on ‘A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003’ by Peter Austin, Statistics in Medicine. Stat. Med. 27 2050-2054.
[35] Hansen, B. B. (2008). The prognostic analogue of the propensity score. Biometrika 95 481-488. · Zbl 1437.62485
[36] Harder, V. S., Stuart, E. A. and Anthony, J. (2010). Propensity score techniques and the assessment of measured covariate balance to test causal associations in psychological research. Psychological Methods.
[37] Heckman, J. J., Ichimura, H. and Todd, P. (1997). Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme. Rev. Econom. Stud. 64 605-654. · Zbl 0887.90039
[38] Heckman, J. J., Ichimura, H., Smith, J. and Todd, P. (1998). Characterizing selection bias using experimental data. Econometrica 66 1017-1098. · Zbl 1055.62573
[39] Heckman, J. J., Ichimura, H. and Todd, P. (1998). Matching as an econometric evaluation estimator. Rev. Econom. Stud. 65 261-294. · Zbl 0908.90059
[40] Heller, R., Rosenbaum, P. and Small, D. (2009). Split samples and design sensitivity in observational studies. J. Amer. Statist. Assoc. 104 1090-1101. · Zbl 1388.62231
[41] Hill, J. L. and Reiter, J. P. (2006). Interval estimation for treatment effects using propensity score matching. Stat. Med. 25 2230-2256.
[42] Hill, J., Reiter, J. and Zanutto, E. (2004). A comparison of experimental and observational data analyses. In Applied Bayesian Modeling and Causal Inference From an Incomplete-Data Perspective (A. Gelman and X.-L. Meng, eds.). Wiley, Hoboken, NJ. · Zbl 05274804
[43] Hill, J., Rubin, D. B. and Thomas, N. (1999). The design of the New York School Choice Scholarship Program evaluation. In Research Designs: Inspired by the Work of Donald Campbell (L. Bickman, ed.) 155-180. Sage, Thousand Oaks, CA.
[44] Hirano, K., Imbens, G. W. and Ridder, G. (2003). Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71 1161-1189. · Zbl 1152.62328
[45] Ho, D. E., Imai, K., King, G. and Stuart, E. A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15 199-236.
[46] Holland, P. W. (1986). Statistics and causal inference. J. Amer. Statist. Assoc. 81 945-960. · Zbl 0607.62001
[47] Hong, G. and Raudenbush, S. W. (2006). Evaluating kindergarten retention policy: A case study of causal inference for multilevel observational data. J. Amer. Statist. Assoc. 101 901-910. · Zbl 1120.62347
[48] Horvitz, D. and Thompson, D. (1952). A generalization of sampling without replacement from a finite universe. J. Amer. Statist. Assoc. 47 663-685. · Zbl 0047.38301
[49] Hudgens, M. G. and Halloran, M. E. (2008). Toward causal inference with interference. J. Amer. Statist. Assoc. 103 832-842. · Zbl 1471.62507
[50] Iacus, S. M., King, G. and Porro, G. (2009). CEM: Software for coarsened exact matching. J. Statist. Software 30 9. Available at .
[51] Imai, K. and van Dyk, D. A. (2004). Causal inference with general treatment regimes: Generalizing the propensity score. J. Amer. Statist. Assoc. 99 854-866. · Zbl 1117.62361
[52] Imai, K., King, G. and Stuart, E. A. (2008). Misunderstandings among experimentalists and observationalists in causal inference. J. Roy. Statist. Soc. Ser. A 171 481-502. · Zbl 05529657
[53] Imbens, G. W. (2000). The role of the propensity score in estimating dose-response functions. Biometrika 87 706-710. · Zbl 1120.62334
[54] Imbens, G. W. (2004). Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Economics and Statistics 86 4-29.
[55] Joffe, M. M. and Rosenbaum, P. R. (1999). Propensity scores. American Journal of Epidemiology 150 327-333.
[56] Joffe, M. M., Ten Have, T. R., Feldman, H. I. and Kimmel, S. E. (2004). Model selection, confounder control, and marginal structural models. Amer. Statist. 58 272-279.
[57] Kang, J. D. and Schafer, J. L. (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data. Statist. Sci. 22 523-539. · Zbl 1246.62073
[58] Keele, L. (2009). rbounds: An R package for sensitivity analysis with matched data. R package. Available at .
[59] King, G. and Zeng, L. (2006). The dangers of extreme counterfactuals. Political Analysis 14 131-159.
[60] Kurth, T., Walker, A. M., Glynn, R. J., Chan, K. A., Gaziano, J. M., Berger, K. and Robins, J. M. (2006). Results of multivariable logistic regression, propensity matching, propensity adjustment, and propensity-based weighting under conditions of nonuniform effect. American Journal of Epidemiology 163 262-270.
[61] Lechner, M. (2002). Some practical issues in the evaluation of heterogeneous labour market programmes by matching methods. J. Roy. Statist. Soc. Ser. A 165 59-82. · Zbl 1001.62534
[62] Lee, B., Lessler, J. and Stuart, E. A. (2009). Improving propensity score weighting using machine learning. Stat. Med. 29 337-346.
[63] Li, Y. P., Propert, K. J. and Rosenbaum, P. R. (2001). Balanced risk set matching. J. Amer. Statist. Assoc. 96 870-882. · Zbl 1047.62112
[64] Lu, B., Zanutto, E., Hornik, R. and Rosenbaum, P. R. (2001). Matching with doses in an observational study of a media campaign against drug abuse. J. Amer. Statist. Assoc. 96 1245-1253. · Zbl 1051.62113
[65] Lunceford, J. K. and Davidian, M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: A comparative study. Stat. Med. 23 2937-2960.
[66] Lunt, M., Solomon, D., Rothman, K., Glynn, R., Hyrich, K., Symmons, D. P., Sturmer, T., the British Society for Rheumatology Biologics Register and the British Society for Rheumatology Biologics Register Control Centre Consortium (2009). Different methods of balancing covariates leading to different effect estimates in the presence of effect modification. American Journal of Epidemiology 169 909-917.
[67] McCaffrey, D. F., Ridgeway, G. and Morral, A. R. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods 9 403-425.
[68] Ming, K. and Rosenbaum, P. R. (2001). A note on optimal matching with variable controls using the assignment algorithm. J. Comput. Graph. Statist. 10 455-463. · Zbl 04568633
[69] Morgan, S. L. and Harding, D. J. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods & Research 35 3-60.
[70] Potter, F. J. (1993). The effect of weight trimming on nonlinear survey estimates. In Proceedings of the Section on Survey Research Methods of American Statistical Association. Amer. Statist. Assoc., San Francisco, CA.
[71] Qu, Y. and Lipkovich, I. (2009). Propensity score estimation with missing values using a multiple imputation missingness pattern (MIMP) approach. Stat. Med. 28 1402-1414.
[72] Reinisch, J., Sanders, S., Mortensen, E. and Rubin, D. B. (1995). In utero exposure to phenobarbital and intelligence deficits in adult men. Journal of the American Medical Association 274 1518-1525.
[73] Ridgeway, G., McCaffrey, D. and Morral, A. (2006). twang: Toolkit for weighting and analysis of nonequivalent groups. Software for using matching methods in R. Available at .
[74] Robins, J. and Rotnitzky, A. (1995). Semiparametric efficiency in multivariate regression models with missing data. J. Amer. Statist. Assoc. 90 122-129. · Zbl 0818.62043
[75] Robins, J. M., Hernan, M. A. and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11 550-560. · Zbl 0647.62093
[76] Robins, J. M., Mark, S. and Newey, W. (1992). Estimating exposure effects by modelling the expectation of exposure conditional on confounders. Biometrics 48 479-495. · Zbl 0768.62099
[77] Rosenbaum, P. R. (1984). The consequences of adjustment for a concomitant variable that has been affected by the treatment. J. Roy. Statist. Soc. Ser. A 147 656-666.
[78] Rosenbaum, P. R. (1987a). Model-based direct adjustment. J. Amer. Statist. Assoc. 82 387-394. · Zbl 0622.62010
[79] Rosenbaum, P. R. (1987b). The role of a second control group in an observational study (with discussion). Statist. Sci. 2 292-316.
[80] Rosenbaum, P. R. (1991). A characterization of optimal designs for observational studies. J. Roy. Statist. Soc. Ser. B 53 597-610. · Zbl 0800.62465
[81] Rosenbaum, P. R. (1999). Choice as an alternative to control in observational studies (with discussion). Statist. Sci. 14 259-304. · Zbl 1059.62699
[82] Rosenbaum, P. R. (2002). Observational Studies, 2nd ed. Springer, New York. · Zbl 0985.62091
[83] Rosenbaum, P. R. (2010). Design of Observational Studies. Springer, New York. · Zbl 1308.62005
[84] Rosenbaum, P. R. and Rubin, D. B. (1983a). Assessing sensitivity to an unobserved binary covariate in an observational study with binary outcome. J. Roy. Statist. Soc. Ser. B 45 212-218.
[85] Rosenbaum, P. R. and Rubin, D. B. (1983b). The central role of the propensity score in observational studies for causal effects. Biometrika 70 41-55. · Zbl 0522.62091
[86] Rosenbaum, P. R. and Rubin, D. B. (1984). Reducing bias in observational studies using subclassification on the propensity score. J. Amer. Statist. Assoc. 79 516-524.
[87] Rosenbaum, P. R. and Rubin, D. B. (1985a). The bias due to incomplete matching. Biometrics 41 103-116. · Zbl 0607.62137
[88] Rosenbaum, P. R. and Rubin, D. B. (1985b). Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Amer. Statist. 39 33-38.
[89] Rosenbaum, P. R., Ross, R. N. and Silber, J. H. (2007). Minimum distance matched sampling with fine balance in an observational study of treatment for ovarian cancer. J. Amer. Statist. Assoc. 102 75-83. · Zbl 1284.62670
[90] Rubin, D. B. (1973a). Matching to remove bias in observational studies. Biometrics 29 159-184.
[91] Rubin, D. B. (1973b). The use of matched sampling and regression adjustment to remove bias in observational studies. Biometrics 29 185-203.
[92] Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66 688-701.
[93] Rubin, D. B. (1976a). Inference and missing data (with discussion). Biometrika 63 581-592. · Zbl 0344.62034
[94] Rubin, D. B. (1976b). Multivariate matching methods that are equal percent bias reducing, I: Some examples. Biometrics 32 109-120. · Zbl 0326.62043
[95] Rubin, D. B. (1979). Using multivariate matched sampling and regression adjustment to control bias in observational studies. J. Amer. Statist. Assoc. 74 318-328. · Zbl 0413.62047
[96] Rubin, D. B. (1980). Bias reduction using Mahalanobis metric matching. Biometrics 36 293-298. · Zbl 0463.62015
[97] Rubin, D. B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley, New York. · Zbl 1070.62007
[98] Rubin, D. B. (2001). Using propensity scores to help design observational studies: Application to the tobacco litigation. Health Services & Outcomes Research Methodology 2 169-188.
[99] Rubin, D. B. (2004). On principles for modeling propensity scores in medical research. Pharmacoepidemiology and Drug Safety 13 855-857.
[100] Rubin, D. B. (2006). Matched Sampling for Causal Inference. Cambridge Univ. Press, Cambridge. · Zbl 1118.62113
[101] Rubin, D. B. (2007). The design versus the analysis of observational studies for causal effects: Parallels with the design of randomized trials. Stat. Med. 26 20-36.
[102] Rubin, D. B. and Stuart, E. A. (2006). Affinely invariant matching methods with discriminant mixtures of proportional ellipsoidally symmetric distributions. Ann. Statist. 34 1814-1826. · Zbl 1246.62159
[103] Rubin, D. B. and Thomas, N. (1992a). Affinely invariant matching methods with ellipsoidal distributions. Ann. Statist. 20 1079-1093. · Zbl 0761.62065
[104] Rubin, D. B. and Thomas, N. (1992b). Characterizing the effect of matching using linear propensity score methods with normal distributions. Biometrika 79 797-809. · Zbl 0765.62098
[105] Rubin, D. B. and Thomas, N. (1996). Matching using estimated propensity scores, relating theory to practice. Biometrics 52 249-264. · Zbl 0881.62121
[106] Rubin, D. B. and Thomas, N. (2000). Combining propensity score matching with additional adjustments for prognostic covariates. J. Amer. Statist. Assoc. 95 573-585.
[107] Schafer, J. L. and Kang, J. D. (2008). Average causal effects from nonrandomized studies: A practical guide and simulated case study. Psychological Methods 13 279-313.
[108] Scharfstein, D. O., Rotnitzky, A. and Robins, J. M. (1999). Adjusting for non-ignorable drop-out using semiparametric non-response models. J. Amer. Statist. Assoc. 94 1096-1120. · Zbl 1072.62644
[109] Schneider, E. C., Zaslavsky, A. M. and Epstein, A. M. (2004). Use of high-cost operative procedures by Medicare beneficiaries enrolled in for-profit and not-for-profit health plans. The New England Journal of Medicine 350 143-150.
[110] Setoguchi, S., Schneeweiss, S., Brookhart, M. A., Glynn, R. J. and Cook, E. F. (2008). Evaluating uses of data mining techniques in propensity score estimation: A simulation study. Pharmacoepidemiology and Drug Safety 17 546-555.
[111] Shadish, W. R., Clark, M. and Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random and nonrandom assignments. J. Amer. Statist. Assoc. 103 1334-1344. · Zbl 1286.62013
[112] Smith, H. (1997). Matching with multiple controls to estimate treatment effects in observational studies. Sociological Methodology 27 325-353.
[113] Snedecor, G. W. and Cochran, W. G. (1980). Statistical Methods, 7th ed. Iowa State Univ. Press, Ames, IA. · Zbl 0727.62003
[114] Sobel, M. E. (2006). What do randomized studies of housing mobility demonstrate?: Causal inference in the face of interference. J. Amer. Statist. Assoc. 101 1398-1407. · Zbl 1171.62365
[115] Song, J., Belin, T. R., Lee, M. B., Gao, X. and Rotheram-Borus, M. J. (2001). Handling baseline differences and missing items in a longitudinal study of HIV risk among runaway youths. Health Services & Outcomes Research Methodology 2 317-329.
[116] Stuart, E. A. (2008). Developing practical recommendations for the use of propensity scores: Discussion of “A critical appraisal of propensity score matching in the medical literature between 1996 and 2003” by P. Austin. Stat. Med. 27 2062-2065.
[117] Stuart, E. A. and Green, K. M. (2008). Using full matching to estimate causal effects in non-experimental studies: Examining the relationship between adolescent marijuana use and adult outcomes. Developmental Psychology 44 395-406.
[118] Stuart, E. A. and Ialongo, N. S. (2009). Matching methods for selection of subjects for follow-up. Multivariate Behavioral Research.
[119] Wacholder, S. and Weinberg, C. R. (1982). Paired versus two-sample design for a clinical trial of treatments with dichotomous outcome: Power considerations. Biometrics 38 801-812.
[120] Weitzen, S., Lapane, K. L., Toledano, A. Y., Hume, A. L. and Mor, V. (2004). Principles for modeling propensity scores in medical research: A systematic literature review. Pharmacoepidemiology and Drug Safety 13 841-853.
[121] Zhao, Z. (2004). Using matching to estimate treatment effects: Data requirements, matching metrics, and Monte Carlo evidence. Review of Economics and Statistics 86 91-107.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.