Robust feature screening procedures for single and mixed types of data. (English) Zbl 07194333
Summary: Feature screening procedures aim to reduce the dimensionality of data with exponentially growing dimensions. Existing procedures all focus on a single type of predictor, either all continuous or all discrete. They cannot address mixed types of variables, outliers, or nonlinear trends. In this paper we first propose new feature screening procedures for different continuous/discrete combinations of response and predictor variables. They are based, respectively, on the marginal Spearman correlation, the marginal ANOVA test, the marginal Kruskal-Wallis test, the Kolmogorov-Smirnov test, the Mann-Whitney test, and smoothing spline modeling. Extensive simulation studies are performed to compare the new and existing procedures, with the aim of identifying the best robust screening procedure for each single type of data. We then combine these best screening procedures to form a robust feature screening procedure for mixed types of data. We demonstrate its robustness against outliers and model misspecification through simulation studies and a real example.
62F07 Statistical ranking and selection procedures
62G10 Nonparametric hypothesis testing
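
To illustrate the kind of marginal screening the summary describes for a continuous response with continuous predictors, here is a minimal sketch based on the marginal Spearman correlation. It is not the authors' implementation: the function names `rank` and `spearman_screen`, the simulated data, and the cutoff d = n / log(n) (a common convention in the screening literature) are all assumptions made for this example, and the rank helper assumes continuous data without ties.

```python
import numpy as np

def rank(v):
    # Ordinal ranks 1..n; assumes no ties (continuous data).
    r = np.empty(len(v), dtype=float)
    r[np.argsort(v)] = np.arange(1, len(v) + 1)
    return r

def spearman_screen(X, y, d):
    # Score each predictor by |Spearman correlation| with y,
    # computed as the Pearson correlation of the ranks.
    ry = rank(y)
    p = X.shape[1]
    scores = np.empty(p)
    for j in range(p):
        scores[j] = abs(np.corrcoef(rank(X[:, j]), ry)[0, 1])
    # Keep the indices of the d highest-scoring predictors.
    keep = np.argsort(-scores)[:d]
    return keep, scores

# Hypothetical data: nonlinear signal in columns 0 and 1, heavy-tailed noise.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.standard_normal((n, p))
y = np.exp(X[:, 0]) + 2 * X[:, 1] ** 3 + rng.standard_t(df=2, size=n)

d = int(n / np.log(n))  # assumed cutoff, not taken from the paper
keep, scores = spearman_screen(X, y, d)
```

Because the score depends only on ranks, it is unchanged by monotone transformations of the response and is not dominated by a few extreme observations, which is the sense in which such a screen is robust to outliers and nonlinear monotone trends.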
