Automatic grouping using smooth-threshold estimating equations. (English) Zbl 1274.62470

Summary: Redundant statistical models are common in practical data analysis. The most widely investigated form of redundancy is the inclusion of irrelevant predictors, which is resolved by setting their coefficients to zero. It is also useful, however, to consider overlapping parameters, that is, parameters whose values are similar. Grouping, which treats a set of such parameters as a single parameter, yields a more parsimonious parameterization and increases estimation accuracy through dimension reduction.
The paper proposes a data-adaptive automatic grouping of parameters based on smooth thresholding, which simultaneously performs variable selection and can yield a sparse solution. The new procedure is applicable to several estimating-equation-based methods and is shown to possess the oracle property. No convex optimization is needed for its implementation. Numerical studies, including the large \(p\), small \(n\) setting, are performed. The proposed automatic grouping is applied to interaction modeling for the Ohio wheeze data and for credit scoring data.
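
The summary does not state the estimating equations themselves. As a rough illustration of why no convex optimization is needed, the sketch below assumes the variable-selection form of smooth thresholding for least squares, \((I - \Delta) X^{\top}(y - X\beta) - \Delta\beta = 0\) with \(\Delta = \mathrm{diag}(\delta_j)\) and \(\delta_j = \min(1, \lambda / |\tilde\beta_j|^{1+\gamma})\), where \(\tilde\beta\) is an initial least-squares estimate. This form, the function name `smooth_threshold_ls`, and the tuning values are assumptions made for illustration; the sketch covers only the sparsity part and does not implement the paper's grouping step.

```python
import numpy as np


def smooth_threshold_ls(X, y, lam, gamma=1.0):
    """Hypothetical sketch: smooth-threshold estimation for least squares.

    Assumed equation: (I - D) X^T (y - X b) - D b = 0, D = diag(delta),
    delta_j = min(1, lam / |b_init_j|^(1 + gamma)).
    Coordinates with delta_j == 1 are set exactly to zero; the remaining
    coordinates are obtained from one linear solve.
    """
    p = X.shape[1]
    # Initial estimate: ordinary least squares.
    b_init, *_ = np.linalg.lstsq(X, y, rcond=None)
    with np.errstate(divide="ignore", invalid="ignore"):
        delta = np.minimum(1.0, lam / np.abs(b_init) ** (1.0 + gamma))
    active = delta < 1.0  # coordinates that are not thresholded to zero
    beta = np.zeros(p)
    if active.any():
        Xa = X[:, active]
        da = delta[active]
        # Row j: (1 - d_j) Xa_j^T (y - Xa b) - d_j b_j = 0
        A = (1.0 - da)[:, None] * (Xa.T @ Xa) + np.diag(da)
        rhs = (1.0 - da) * (Xa.T @ y)
        beta[active] = np.linalg.solve(A, rhs)
    return beta, delta


# Simulated data (made up for this illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta_true = np.array([3.0, 0.0, 0.0, 2.0, 0.0])
y = X @ beta_true + 0.5 * rng.normal(size=200)

beta_hat, delta_hat = smooth_threshold_ls(X, y, lam=0.1, gamma=1.0)
print(beta_hat)
```

Under these assumptions the estimate comes from a single linear system on the active coordinates, and a coefficient is zeroed whenever its initial estimate is small relative to \(\lambda\). In practice \(\lambda\) and \(\gamma\) would have to be chosen by some criterion, which this sketch leaves out.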


62J07 Ridge regression; shrinkage estimators (Lasso)
62J10 Analysis of variance and covariance (ANOVA)
62P20 Applications of statistics to economics
62P05 Applications of statistics to actuarial sciences and financial mathematics



