Projections of a general binary model on a logistic regression. (English) Zbl 1403.62136

Summary: We consider a general binary model in which the conditional probability of success given a vector of predictors \(\mathbf{X}\) equals \(q(\beta_1^T\mathbf{X},\ldots,\beta_k^T\mathbf{X})\), together with a family of possibly misspecified logistic regressions fitted to it. When \(\mathbf{X}\) satisfies the linearity condition, we show that their algebraic structure is uniquely determined and that the vector \(\beta^\ast\) corresponding to the Kullback-Leibler projection onto this family is a linear combination of \(\beta_1,\ldots,\beta_k\). This generalizes the known result of P. A. Ruud [Econometrica 51, 225–228 (1983; Zbl 0513.62071)] for \(k=1\), which states that the true and projected vectors are collinear. It also follows that the projected vector has the same direction as the first canonical vector, which justifies the frequent observation that a logistic fit yields well-performing classifiers even when misspecification is expected. In the special case of an additive binary model with multivariate normal predictors, where the response function \(q\) is a convex combination of univariate responses, we show that the variance of \(\beta^{\ast T}\mathbf{X}\) is not larger than the maximal variance of the projected linear combinations for the corresponding univariate problems. For a balanced additive logistic model it follows that the contribution of \(\beta_i\) to \(\beta^\ast\) is bounded by the corresponding coefficient in the convex representation of the response function \(q\).
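
The span property stated in the summary can be checked numerically. The following is a minimal simulation sketch, not the authors' method. It assumes standard normal predictors (which satisfy the linearity condition), \(k=2\), and an arbitrarily chosen non-logistic response \(q(s,t)=\tfrac12\sigma(2s)+\tfrac12\sigma(t^3)\). The vectors `beta1`, `beta2`, the sample size and the helper `fit_logistic` are all illustrative assumptions. The fitted logistic coefficient vector serves as a finite-sample proxy for \(\beta^\ast\).

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function via tanh.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def fit_logistic(X, y, n_iter=25):
    """Maximum likelihood logistic fit with intercept (Newton-Raphson).

    Hypothetical helper written for this illustration only.
    """
    Z = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Z @ w)
        grad = Z.T @ (y - p)
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z
        w += np.linalg.solve(hess, grad)
    return w[0], w[1:]

rng = np.random.default_rng(0)
n, dim = 200_000, 5

# Assumption: X ~ N(0, I), so the linearity condition holds.
X = rng.standard_normal((n, dim))

# Assumed true directions (k = 2); any two vectors would do.
beta1 = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
beta2 = np.array([0.0, 1.0, -1.0, 0.0, 0.0])

# Assumed response function q: a convex combination of two
# univariate responses, deliberately not of logistic form overall.
s, t = X @ beta1, X @ beta2
q = 0.5 * sigmoid(2.0 * s) + 0.5 * sigmoid(t ** 3)
y = (rng.random(n) < q).astype(float)

# Misspecified logistic fit; beta_star estimates the KL projection.
_, beta_star = fit_logistic(X, y)

# Least-squares decomposition of beta_star on span{beta1, beta2}.
B = np.column_stack([beta1, beta2])
coef, *_ = np.linalg.lstsq(B, beta_star, rcond=None)
residual = beta_star - B @ coef

print("beta_star:", np.round(beta_star, 3))
print("coefficients on (beta1, beta2):", np.round(coef, 3))
print("norm of component outside the span:", np.linalg.norm(residual))
```

Under these assumptions the component of `beta_star` outside the span of `beta1` and `beta2` should be of the order of the sampling error, while the last two coordinates, which neither true direction uses, should be close to zero.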


62J12 Generalized linear models (logistic models)
62J05 Linear regression; mixed models




[1] Li, K., Sliced inverse regression for dimension reduction, J. Amer. Statist. Assoc., 86, 414, 316-327 (1991) · Zbl 0742.62044
[2] Zheng, P.; Zhu, J., An integral transformation method for estimating central mean and central subspaces, J. Multivariate Anal., 101, 271-290 (2010) · Zbl 1177.62054
[3] Duan, N.; Li, K., Slicing regression: a link free method, Ann. Statist., 19, 505-530 (1991) · Zbl 0738.62070
[4] Jiang, B.; Liu, J., Variable selection for general index models via sliced inverse regression, Ann. Statist., 42, 1751-1786 (2014) · Zbl 1305.62234
[5] Ruud, P. A., Sufficient conditions for the consistency of maximum likelihood estimation despite misspecification of distribution in multinomial discrete choice models, Econometrica, 51, 1, 225-228 (1983) · Zbl 0513.62071
[6] Brillinger, D., A generalized linear model with ‘Gaussian’ regressor variables, (Festschrift for Erich Lehmann (1983)), 97-114
[7] Li, K.; Duan, N., Regression analysis under link violation, Ann. Statist., 17, 3, 1009-1052 (1989) · Zbl 0753.62041
[8] Kubkowski, M.; Mielniczuk, J., Active sets of predictors for misspecified logistic regression, Statistics, 51, 5, 1023-1045 (2017) · Zbl 1440.62259
[9] Cambanis, S.; Huang, S.; Simons, G., On the theory of elliptically contoured distributions, J. Multivariate Anal., 11, 368-385 (1981) · Zbl 0469.60019
[10] Hall, P.; Li, K., On almost linearity of low dimensional projection from high dimensional data, Ann. Statist., 21, 2, 867-889 (1993) · Zbl 0782.62065
[11] Chen, C.; Li, K., Can SIR be as popular as multiple linear regression?, Statist. Sinica, 8, 289-316 (1998) · Zbl 0897.62069
[12] Stein, C., Estimation of the mean of the multivariate normal distribution, Ann. Statist., 9, 1135-1151 (1981) · Zbl 0476.62035
[13] Gail, M.; Tan, W.; Piantadosi, S., Tests for no treatment effect in randomized clinical trials, Biometrika, 75, 57-64 (1988) · Zbl 0635.62108