Semiparametric efficiency in GMM models with auxiliary data. (English) Zbl 1133.62023

Summary: We study semiparametric efficiency bounds and efficient estimation of parameters defined through general moment restrictions with missing data. Identification relies on auxiliary data containing information about the distribution of the missing variables conditional on proxy variables that are observed in both the primary and the auxiliary database, when such distribution is common to the two data sets. The auxiliary sample can be independent of the primary sample, or can be a subset of it. For both cases, we derive bounds when the probability of missing data given the proxy variables is unknown, or known, or belongs to a correctly specified parametric family. We find that the conditional probability is not ancillary when the two samples are independent. For all cases, we discuss efficient semiparametric estimators. An estimator based on a conditional expectation projection is shown to require milder regularity conditions than one based on inverse probability weighting.
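The contrast between the two estimator types named in the summary can be illustrated with a toy sketch. It is not the paper's procedure: it assumes the parameter is a simple mean, the proxy is discrete, and the two samples are independent, and every variable name and the simulated data are hypothetical. With a discrete proxy and fully nonparametric cell estimates, the conditional expectation projection (CEP) and normalized inverse probability weighting (IPW) estimates coincide numerically; differences in regularity conditions of the kind the summary mentions arise only with continuous proxies and smoothed estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: the proxy X takes the values 0, 1, 2.
# Primary sample: X observed, Y missing. Auxiliary sample: (Y, X) observed.
x_primary = rng.choice(3, size=2000, p=[0.5, 0.3, 0.2])
x_aux = rng.choice(3, size=1000, p=[0.2, 0.3, 0.5])
# Y | X has the same distribution in both samples (the identifying assumption).
y_aux = 1.0 + 2.0 * x_aux + rng.normal(size=x_aux.size)

cells = np.arange(3)
n_p = np.array([(x_primary == c).sum() for c in cells])  # primary cell counts
n_a = np.array([(x_aux == c).sum() for c in cells])      # auxiliary cell counts

# CEP: estimate E[Y | X = c] in the auxiliary sample,
# then average it over the primary-sample distribution of X.
cond_mean = np.array([y_aux[x_aux == c].mean() for c in cells])
theta_cep = (n_p * cond_mean).sum() / n_p.sum()

# IPW: reweight auxiliary observations by the odds p(X) / (1 - p(X)),
# where p(X) is the pooled-sample probability of being in the primary sample.
p_hat = n_p / (n_p + n_a)
w = (p_hat / (1.0 - p_hat))[x_aux]
theta_ipw = (w * y_aux).sum() / w.sum()

# Target in this simulation: E[Y] under the primary distribution of X.
theta_true = 1.0 + 2.0 * (0.3 * 1 + 0.2 * 2)

print("naive auxiliary mean:", y_aux.mean())
print("CEP estimate:", theta_cep)
print("IPW estimate:", theta_ipw)
print("target:", theta_true)
```

The unweighted auxiliary mean targets the wrong population here because the proxy is distributed differently in the two samples; both corrections use the proxy to transport the auxiliary information to the primary population.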

MSC:

62G05 Nonparametric estimation
62F12 Asymptotic properties of parametric estimators
62G20 Asymptotic properties of nonparametric inference
62H12 Estimation in multivariate analysis
62D05 Sampling theory, sample surveys
