## Multinomial logit models with implicit variable selection
*(English)* Zbl 1306.62169

Summary: The multinomial logit model is the most widely used model for unordered multi-category responses. Applications, however, are typically restricted to a few predictors, because in the high-dimensional case maximum likelihood estimates frequently do not exist. In this paper we develop a boosting technique, called multinomBoost, that performs variable selection and fits the multinomial logit model even when predictors are high-dimensional. Since in multi-category models the effect of one predictor variable is represented by several parameters, one has to distinguish between variable selection and parameter selection. A special feature of the approach is that, in contrast to existing approaches, it selects variables rather than parameters. The method can also distinguish between mandatory and optional predictors, and it adapts to metric, binary, nominal and ordinal predictors. Regularization within the algorithm allows the inclusion of nominal and ordinal variables with many categories; for ordinal predictors the order information is exploited. The performance of the boosting technique with respect to mean squared error, prediction error and the identification of relevant variables is investigated in a simulation study. The method is applied to the National Indonesia Contraceptive Prevalence Survey and to glass identification data. Results are also compared with the lasso approach, which selects parameters.
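The central idea of the summary, selecting whole variables rather than single parameters in a multi-category model, can be sketched as componentwise likelihood-based boosting. The following is a minimal illustration under our own assumptions, not the authors' multinomBoost implementation: the curvature term is a simplified diagonal approximation rather than the paper's exact penalized Fisher step, the selection criterion is a first-order likelihood-gain approximation, and the names (`multinom_boost`, `ridge`, `nu`) are ours.

```python
import numpy as np

def softmax(Z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def multinom_boost(X, y, K, n_steps=150, nu=0.1, ridge=4.0):
    """Componentwise likelihood-based boosting for a multinomial logit (sketch).

    Each step evaluates every candidate variable j via a ridge-penalized
    one-step update of ALL of its K-1 category-specific coefficients (the
    last category is the reference, fixed at zero) and commits only the
    variable with the largest approximate log-likelihood gain.  Whole
    variables, not single parameters, are therefore selected.
    """
    n, p = X.shape
    B = np.zeros((p, K - 1))             # one coefficient per variable and category
    Y = np.eye(K)[y]                     # one-hot responses, shape (n, K)
    picked = []
    for _ in range(n_steps):
        eta = np.column_stack([X @ B, np.zeros(n)])
        P = softmax(eta)
        R = Y[:, :K - 1] - P[:, :K - 1]  # score residuals for non-reference categories
        grad = X.T @ R                   # (p, K-1) gradient per variable and category
        W = P[:, :K - 1] * (1 - P[:, :K - 1])   # diagonal working weights
        denom = (X ** 2).T @ W + ridge           # penalized (approximate) curvature
        Delta = grad / denom                     # groupwise one-step updates
        gain = (grad * Delta).sum(axis=1)        # ~ likelihood gain per variable
        j = int(np.argmax(gain))
        B[j] += nu * Delta[j]                    # shrunken update of variable j only
        picked.append(j)
    return B, sorted(set(picked))

# Toy data: 6 predictors, 3 response categories, only the first two
# predictors carry signal -- boosting should pick them out.
rng = np.random.default_rng(0)
n, p, K = 400, 6, 3
X = rng.normal(size=(n, p))
B_true = np.zeros((p, K - 1))
B_true[0] = [2.0, -1.5]
B_true[1] = [-2.0, 1.0]
probs = softmax(np.column_stack([X @ B_true, np.zeros(n)]))
y = np.array([rng.choice(K, p=pr) for pr in probs])

B_hat, selected = multinom_boost(X, y, K)
```

Because each boosting step updates the full block of K-1 coefficients for one variable, a variable is either in or out of the model as a whole, which is the distinction the summary draws against parameter-wise selection such as the ordinary lasso.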

### MSC:

| Code  | Classification |
|-------|----------------|
| 62J07 | Ridge regression; shrinkage estimators (Lasso) |
| 62J12 | Generalized linear models (logistic models) |

### Keywords:

false alarm rate; hit rate; likelihood-based boosting; logistic regression; multinomial logit; penalization; side constraints; variable selection

\textit{F. M. Zahid} and \textit{G. Tutz}, Adv. Data Anal. Classif., ADAC 7, No. 4, 393--416 (2013; Zbl 1306.62169)

### References:

[1] Blake C, Keogh E, Merz CJ (1998) UCI repository of machine learning databases

[2] Bühlmann P (2006) Boosting for high-dimensional linear models. Ann Stat 34:559–583 · Zbl 1095.62077

[3] Bühlmann P, Hothorn T (2007) Boosting algorithms: regularization, prediction and model fitting (with discussion). Stat Sci 22:477–505 · Zbl 1246.62163

[4] Bühlmann P, Yu B (2003) Boosting with the L2 loss: regression and classification. J Am Stat Assoc 98:324–339 · Zbl 1041.62029

[5] Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Machine learning: proceedings of the thirteenth international conference, pp 148–156

[6] Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J Stat Softw 33:1–22

[7] Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting. Ann Stat 28:337–407 · Zbl 1106.62323

[8] Gertheiss J, Tutz G (2009) Penalized regression with ordinal predictors. Int Stat Rev 77:345–365

[9] Krishnapuram B, Carin L, Figueiredo MA, Hartemink AJ (2005) Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans Pattern Anal Mach Intell 27:957–968 · Zbl 05111576

[10] Lim TS, Loh WY, Shih YS (2000) A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Mach Learn 40:203–229 · Zbl 0969.68669

[11] Meier L, van de Geer S, Bühlmann P (2008) The group lasso for logistic regression. J R Stat Soc B 70:53–71 · Zbl 1400.62276

[12] Nyquist H (1991) Restricted estimation of generalized linear models. J Appl Stat 40:133–141 · Zbl 0825.62612

[13] Park MY, Hastie T (2007) L1-regularization path algorithm for generalized linear models. J R Stat Soc B 69:659–677

[14] Schapire RE (1990) The strength of weak learnability. Mach Learn 5:197–227

[15] Segerstedt B (1992) On ordinary ridge regression in generalized linear models. Commun Stat Theory Methods 21:2227–2246 · Zbl 0775.62185

[16] Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267–288 · Zbl 0850.62538

[17] Tutz G, Binder H (2006) Generalized additive modelling with implicit variable selection by likelihood-based boosting. Biometrics 62:961–971 · Zbl 1116.62075

[18] Tutz G, Binder H (2007) Boosting ridge regression. Comput Stat Data Anal 51:6044–6059 · Zbl 1330.62294

[19] Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc B 68:49–67 · Zbl 1141.62030

[20] Zahid FM, Tutz G (2013) Ridge estimation for multinomial logit models with symmetric side constraints. Comput Stat 28(3):1017–1034 · Zbl 1305.65087

[21] Zhao P, Rocha G, Yu B (2009) The composite absolute penalties family for grouped and hierarchical variable selection. Ann Stat 37:3468–3497 · Zbl 1369.62164

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.