# zbMATH — the first resource for mathematics

Dimension reduction for the conditional mean in regressions with categorical predictors. (English) Zbl 1042.62037
Summary: Consider the regression of a response $$Y$$ on a vector of quantitative predictors $${\mathbf X}$$ and a categorical predictor $$W$$. We describe a first method for reducing the dimension of $${\mathbf X}$$ without loss of information on the conditional mean $$E(Y\mid{\mathbf X}, W)$$ and without requiring a prespecified parametric model. The method, which allows for, but does not require, parametric versions of the subpopulation mean functions $$E(Y\mid{\mathbf X}, W= w)$$, includes a procedure for inference about the dimension of $${\mathbf X}$$ after reduction.
This work integrates previous studies on dimension reduction for the conditional mean $$E(Y\mid{\mathbf X})$$ in the absence of categorical predictors and dimension reduction for the full conditional distribution of $$Y\mid({\mathbf X}, W)$$. The methodology we describe may be particularly useful for constructing low-dimensional summary plots to aid in model-building at the outset of an analysis. Our proposals provide an often parsimonious alternative to the standard technique of modeling with interaction terms to adapt a mean function for different subpopulations determined by the levels of $$W$$. Examples illustrating this and other aspects of the development are presented.
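The contrast drawn in the summary, between fitting interaction terms for each level of $$W$$ and finding one low-dimensional reduction of $${\mathbf X}$$ shared by all subpopulations, can be illustrated with a small sketch. It is not the estimator developed in the paper. It assumes only that the OLS slope within each subpopulation points into a common subspace, stacks those slopes, and reads off the leading directions by a singular value decomposition. The function name `pooled_ols_directions` and the weighting by subpopulation proportion are assumptions made for the illustration.

```python
import numpy as np

def pooled_ols_directions(X, y, w, d):
    """Illustrative only: combine within-group OLS slopes into d directions.

    X : (n, p) quantitative predictors
    y : (n,)   response
    w : (n,)   categorical predictor (any hashable labels)
    d : number of directions to return
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)

    slopes = []
    for level in np.unique(w):
        mask = w == level
        # Centre within the subpopulation so the intercept drops out.
        Xg = X[mask] - X[mask].mean(axis=0)
        yg = y[mask] - y[mask].mean()
        beta, *_ = np.linalg.lstsq(Xg, yg, rcond=None)
        # Weight each slope by the square root of its subpopulation share
        # (an assumption of this sketch, not taken from the paper).
        slopes.append(np.sqrt(mask.mean()) * beta)

    B = np.column_stack(slopes)            # p x (number of levels)
    U, s, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :d], s                     # directions and all singular values
```

When the subpopulation mean functions depend on $${\mathbf X}$$ through the same single linear combination, the second singular value returned here should be close to zero, which suggests a one-dimensional summary plot rather than a model with a full set of interaction terms.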

##### MSC:
62G08 Nonparametric regression and quantile regression
62H05 Characterization and structure theory for multivariate probability distributions; copulas
62G09 Nonparametric statistical resampling methods
##### Keywords:
analysis of covariance; central subspace; graphics; OLS; SIR; PHD; SAVE; diabetes
##### Software:
ARC
##### References:
[1] Bentler, P. M. and Xie, J. (2000). Corrections to test statistics in principal Hessian directions. Statist. Probab. Lett. 47 381–389. · Zbl 1157.62412
[2] Bura, E. and Cook, R. D. (2001). Estimating the structural dimension of regressions via parametric inverse regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 393–410. · Zbl 0979.62041
[3] Chiaromonte, F. and Cook, R. D. (2002). Sufficient dimension reduction and graphics in regression. Ann. Inst. Statist. Math. 54 768–795. · Zbl 1047.62066
[4] Chiaromonte, F., Cook, R. D. and Li, B. (2002). Sufficient dimension reduction in regressions with categorical predictors. Ann. Statist. 30 475–497. · Zbl 1012.62036
[5] Cook, R. D. (1996). Graphics for regressions with a binary response. J. Amer. Statist. Assoc. 91 983–992. · Zbl 0882.62060
[6] Cook, R. D. (1998a). Regression Graphics. Wiley, New York. · Zbl 0903.62001
[7] Cook, R. D. (1998b). Principal Hessian directions revisited. J. Amer. Statist. Assoc. 93 84–100. · Zbl 0922.62057
[8] Cook, R. D. and Lee, H. (1999). Dimension reduction in binary response regression. J. Amer. Statist. Assoc. 94 1187–1200. · Zbl 1072.62619
[9] Cook, R. D. and Li, B. (2002). Dimension reduction for conditional mean in regression. Ann. Statist. 30 455–474. · Zbl 1012.62035
[10] Cook, R. D. and Weisberg, S. (1991). Discussion of “Sliced inverse regression for dimension reduction.” J. Amer. Statist. Assoc. 86 28–33. · Zbl 1353.62037
[11] Cook, R. D. and Weisberg, S. (1999). Applied Regression Including Computing and Graphics. Wiley, New York. · Zbl 0928.62045
[12] Eaton, M. L. and Tyler, D. E. (1994). The asymptotic distribution of singular values with applications to canonical correlations and correspondence analysis. J. Multivariate Anal. 50 238–264. · Zbl 0805.62020
[13] Fouladi, R. T. (1997). Type I error control of some covariance structure analysis technique under conditions of multivariate non-normality. Comput. Statist. Data Anal. 29 526–532.
[14] Li, K.-C. (1991). Sliced inverse regression for dimension reduction (with discussion). J. Amer. Statist. Assoc. 86 316–342. · Zbl 0742.62044
[15] Li, K.-C. (1992). On principal Hessian directions for data visualization and dimension reduction: Another application of Stein’s lemma. J. Amer. Statist. Assoc. 87 1025–1039. · Zbl 0765.62003
[16] Li, K.-C. and Duan, N. (1989). Regression analysis under link violation. Ann. Statist. 17 1009–1052. · Zbl 0753.62041
[17] Satterthwaite, F. E. (1941). Synthesis of variance. Psychometrika 6 309–316. · Zbl 0063.06742
[18] Smith, J. W., Everhart, J. E., Dickson, W. C., Knowler, W. C. and Johannes, R. S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proc. Twelfth Annual Symposium on Computer Applications in Medical Care 261–265. IEEE Computer Society Press, New York.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.