A penalized likelihood estimation approach to semiparametric sample selection binary response modeling.

*(English)* Zbl 1337.62170

Summary: Sample selection models are employed when an outcome of interest is observed for a restricted non-randomly selected sample of the population. We consider the case in which the response is binary and continuous covariates have a nonlinear relationship to the outcome. We introduce two statistical methods for the estimation of two binary regression models involving semiparametric predictors in the presence of non-random sample selection. This is achieved using a multiple-stage procedure, and a newly developed simultaneous equation estimation scheme. Both approaches are based on the penalized likelihood estimation framework. The problems of identification and inference are also discussed. The empirical properties of the proposed approaches are studied through a simulation study. The methods are then illustrated using data from the American National Election Study where the aim is to quantify public support for school integration. If non-random sample selection is neglected then the predicted probability of giving, for instance, a supportive response may be biased, an issue that can be tackled using the proposed tools.
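The bias mentioned at the end of the summary can be illustrated with a small simulation. The sketch below is not the authors' estimator: it only draws a binary outcome and a binary selection indicator from latent variables with correlated standard normal errors (the bivariate probit setup named in the keywords) and compares the true response probability with the naive estimate computed from the selected sample alone. The function name, intercepts (`0.2`, `-0.3`), correlation `rho` and sample size are hypothetical values chosen for illustration.

```python
import math
import random


def simulate_selection_bias(n=20000, rho=0.7, seed=0):
    """Return (population P(y=1), naive P(y=1) among selected units).

    Illustrative assumptions (not taken from the paper):
      selection: s = 1 if  0.2 + e1 > 0
      outcome:   y = 1 if -0.3 + e2 > 0
      (e1, e2) standard bivariate normal with correlation rho.
    """
    rng = random.Random(seed)
    y_total = 0        # count of y = 1 over the whole population
    n_selected = 0     # number of units with s = 1
    y_selected = 0     # count of y = 1 among selected units
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        # Build e2 so that corr(e1, e2) = rho and var(e2) = 1.
        e2 = rho * e1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        selected = (0.2 + e1) > 0
        y = (-0.3 + e2) > 0
        y_total += y
        if selected:
            n_selected += 1
            y_selected += y
    return y_total / n, y_selected / n_selected


# Correlated errors: the selected sample overstates P(y = 1).
p_true, p_naive = simulate_selection_bias(rho=0.7)
# Independent errors: selection is ignorable and the two estimates agree closely.
p0_true, p0_naive = simulate_selection_bias(rho=0.0)

print(f"rho=0.7: population {p_true:.3f}, selected sample {p_naive:.3f}")
print(f"rho=0.0: population {p0_true:.3f}, selected sample {p0_naive:.3f}")
```

When `rho` is zero the selection mechanism carries no information about the outcome, so the naive estimate is close to the population value; with a nonzero `rho` the two diverge, which is the situation a selection model is meant to correct.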

##### MSC:

62J07 | Ridge regression; shrinkage estimators (Lasso) |

62D05 | Sampling theory, sample surveys |

62G08 | Nonparametric regression and quantile regression |

62P20 | Applications of statistics to economics |

##### Keywords:

binary responses; bivariate probit; non-random sample selection; penalized regression spline

\textit{G. Marra} and \textit{R. Radice}, Electron. J. Stat. 7, 1432--1455 (2013; Zbl 1337.62170)

##### References:

[1] | T. Bärnighausen, J. Bor, S. Wandira-Kazibwe, and D. Canning. Correcting HIV prevalence estimates for survey nonparticipation using Heckman-type selection models. Epidemiology , 22:27-35, 2011. |

[2] | A. J. Berinsky. The two faces of public opinion. American Journal of Political Science , 43:1209-1230, 1999. |

[3] | W. J. Boyes, D. L. Hoffman, and S. A. Low. An econometric analysis of the bank credit scoring problem. Journal of Econometrics , 40:3-14, 1989. |

[4] | M. Bratti and A. Miranda. Endogenous treatment effects for count data models with endogenous participation or sample selection. Health Economics , 20:1090-1109, 2011. |

[5] | S. Chib and E. Greenberg. Semiparametric modeling and estimation of instrumental variable models. Journal of Computational and Graphical Statistics , 16:86-114, 2007. |

[6] | P. Craven and G. Wahba. Smoothing noisy data with spline functions. Numerische Mathematik , 31:377-403, 1979. · Zbl 0377.65007 |

[7] | G. Cuddeback, E. Wilson, J. G. Orme, and T. Combs-Orme. Detecting and statistically correcting sample selection bias. Journal of Social Service Research , 30:19-33, 2004. |

[8] | W. P. M. M. Van de Ven and B. M. S. Van Praag. The demand for deductibles in private health insurance: a probit model with sample selection. Journal of Econometrics , 17:229-252, 1981. |

[9] | J. A. Dubin and D. Rivers. Selection bias in linear regression, logit and probit models. Sociological Methods and Research , 18:360-390, 1990. |

[10] | Paul H. C. Eilers and Brian D. Marx. Flexible smoothing with B-splines and penalties. Statistical Science , 11(2):89-121, 1996. · Zbl 0955.62562 |

[11] | W. H. Greene. Econometric Analysis . Prentice Hall, New York, 2012. |

[12] | R. M. Groves, D. A. Dillman, J. L. Eltinge, and R. J. A. Little. Survey Nonresponse . Wiley, New York, 2001. · Zbl 0976.00027 |

[13] | C. Gu. Cross validating non-Gaussian data. Journal of Computational and Graphical Statistics , 1:169-179, 1992. |

[14] | C. Gu. Smoothing Spline ANOVA Models . London: Springer-Verlag, 2002. · Zbl 1051.62034 |

[15] | T. Hastie and R. Tibshirani. Varying-coefficient models. Journal of the Royal Statistical Society Series B , 55:757-796, 1993. · Zbl 0796.62060 |

[16] | J. J. Heckman. Sample selection bias as a specification error. Econometrica , 47:153-162, 1979. · Zbl 0392.62093 |

[17] | R. Klein and R. Spady. An efficient semiparametric estimator of the binary choice model. Econometrica , 61:387-421, 1993. · Zbl 0783.62100 |

[18] | L. F. Lee. Generalized econometric models with selectivity. Econometrica , 51:507-512, 1983. · Zbl 0516.62094 |

[19] | L. F. Lee. Tests for the bivariate normal distribution in econometric models with selectivity. Econometrica , 52:843-863, 1984. · Zbl 0557.62095 |

[20] | S. F. Leung and S. Yu. Collinearity and two-step estimation of sample selection models: problems, origins, and remedies. Computational Economics , 15:173-199, 2000. · Zbl 1013.91101 |

[21] | P. Li and M. A. Rahman. Bayesian analysis of multivariate sample selection models using Gaussian copulas. In D. M. Drukker, editor, Missing Data Methods: Cross-sectional Methods and Applications. Volume 27 of Advances in Econometrics , pages 269-288. Emerald Group Publishing Limited, 2011. |

[22] | G. S. Maddala. Limited Dependent and Qualitative Variables in Econometrics . Cambridge University Press, Cambridge, 1983. · Zbl 0527.62098 |

[23] | G. Marra and R. Radice. Estimation of a semiparametric recursive bivariate probit model in the presence of endogeneity. Canadian Journal of Statistics , 39:259-279, 2011. · Zbl 1219.62068 |

[24] | G. Marra and R. Radice. A flexible instrumental variable approach. Statistical Modelling , 11:581-603, 2011. · Zbl 1219.62068 |

[25] | G. Marra and R. Radice. Estimation of a regression spline sample selection model. Computational Statistics and Data Analysis , 2013. · Zbl 1337.62170 |

[26] | G. Marra and R. Radice. SemiParBIVProbit: Semiparametric Bivariate Probit Modelling , 2013. R package version 3.2-6. |

[27] | G. Marra and S. N. Wood. Coverage properties of confidence intervals for generalized additive model components. Scandinavian Journal of Statistics , 39:53-74, 2012. · Zbl 1246.62058 |

[28] | W. E. Miller, R. D. R. Kinder, S. J. Rosenstone, and National Election Studies. American National Election Study, 1992: Pre- and post-election survey [enhanced with 1990 and 1991 data]. Technical report, Inter-university Consortium for Political and Social Research [distributor], 1999. |

[29] | A. Miranda and S. Rabe-Hesketh. Maximum likelihood estimation of endogenous switching and sample selection models for binary, ordinal, and count variables. Stata Journal , 6:285-308, 2006. |

[30] | C. Montmarquette, S. Mahseredjian, and R. Houle. The determinants of university dropouts: a bivariate probability model with sample selection. Economics of Education Review , 20:475-484, 2001. |

[31] | J. Nocedal and S. J. Wright. Numerical Optimization . New York: Springer-Verlag, 2006. · Zbl 1104.65059 |

[32] | P. A. Puhani. The Heckman correction for sample selection and its critique. Journal of Economic Surveys , 14:53-68, 2000. |

[33] | R Development Core Team. R: A Language and Environment for Statistical Computing . R Foundation for Statistical Computing, Vienna, Austria, 2013. ISBN 3-900051-07-0. |

[34] | D. Ruppert, M. P. Wand, and R. J. Carroll. Semiparametric Regression . Cambridge University Press, New York, 2003. · Zbl 1038.62042 |

[35] | M. D. Smith. Modelling sample selection using Archimedean copulas. Econometrics Journal , 6:99-123, 2003. · Zbl 1037.62047 |

[36] | J. V. Terza. Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects. Journal of Econometrics , 84:129-154, 1998. · Zbl 1049.62516 |

[37] | F. Vella. Estimating models with sample selection bias: a survey. Journal of Human Resources , 33:127-169, 1998. |

[38] | S. Verba, K. L. Schlozman, and H. E. Brady. Voice and Equality: Civic Voluntarism in American Politics . Cambridge: Harvard University Press, 1995. |

[39] | G. Wahba. Bayesian ‘confidence intervals’ for the cross-validated smoothing spline. Journal of the Royal Statistical Society Series B , 45:133-150, 1983. · Zbl 0538.65006 |

[40] | M. Wiesenfarth and T. Kneib. Bayesian geoadditive sample selection models. Journal of the Royal Statistical Society Series C , 59:381-404, 2011. |

[41] | S. N. Wood. Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association , 99:673-686, 2004. · Zbl 1117.62445 |

[42] | S. N. Wood. Generalized Additive Models: An Introduction With R . London: Chapman & Hall/CRC, 2006. · Zbl 1087.62082 |

[43] | D. M. Zimmer and P. K. Trivedi. Using trivariate copulas to model sample selection and treatment effects: Application to family health care demand. Journal of Business and Economic Statistics , 24:63-76, 2006. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.