A weight function method for selection of proteins to predict an outcome using protein expression data. (English) Zbl 1461.62195
Summary: Many feature selection methods are available in the literature for removing unwanted features before modelling. Existing techniques suffer from poor reproducibility because the training and validation datasets are selected at random. In this study, we propose a new resampling approach for feature selection, which helps resolve this drawback. The method assigns a weight to every feature in the dataset, and candidate features are selected by applying a cut-off to the feature weight. In the illustrated example, the method selects ten features from a set of 254. The selected features are used to develop a predictive model with a predictive accuracy of 92.3%, measured as the area under the ROC curve. The results show that the method can successfully select the relevant features, which yields an excellent predictive model compared with commonly used L1, L2, and elastic net regularisation.
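The summary does not state how the feature weights are computed, so the following Python sketch is only an illustration of the general idea of resampling, weighting, and applying a cut-off. It assumes a hypothetical weight definition: the fraction of random training splits in which a feature ranks among the top `top_k` by absolute correlation with the outcome. The function names, the parameters (`n_resamples`, `top_k`, `train_frac`, `cutoff`), and the synthetic data are all assumptions made for this example; none of them come from the paper.

```python
import numpy as np


def feature_weights(X, y, n_resamples=200, top_k=10, train_frac=0.7, seed=0):
    """Hypothetical weight: share of random training splits in which a
    feature ranks among the top_k by |correlation| with the outcome."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    n_train = int(train_frac * n)
    counts = np.zeros(p)
    for _ in range(n_resamples):
        # Draw a random training subset without replacement.
        idx = rng.choice(n, size=n_train, replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)
        ys = y[idx] - y[idx].mean()
        # Absolute Pearson correlation of each feature with the outcome;
        # a zero denominator is mapped to a score of zero.
        denom = np.sqrt((Xs ** 2).sum(axis=0) * (ys ** 2).sum())
        score = np.abs(Xs.T @ ys) / np.where(denom > 0, denom, np.inf)
        # Count the top_k highest-scoring features in this resample.
        counts[np.argsort(score)[-top_k:]] += 1
    return counts / n_resamples


def select_features(weights, cutoff=0.5):
    """Keep the features whose weight reaches the cut-off."""
    return np.flatnonzero(weights >= cutoff)


# Synthetic data only (not the protein expression data of the paper):
# 254 features, of which the first ten are shifted by the binary outcome.
rng = np.random.default_rng(1)
n, p = 120, 254
y = rng.integers(0, 2, size=n).astype(float)
X = rng.normal(size=(n, p))
X[:, :10] += 2.0 * y[:, None]

weights = feature_weights(X, y)
selected = select_features(weights, cutoff=0.5)
```

Because the weight averages over many random splits rather than relying on a single one, the selected set depends far less on any particular training/validation partition, which is the reproducibility concern the summary raises. The cut-off of 0.5 is arbitrary here and would need to be chosen for the data at hand.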
62P10 Applications of statistics to biology and medical sciences; meta analysis
62M20 Inference from stochastic processes and prediction
62D05 Sampling theory, sample surveys
92C40 Biochemistry, molecular biology
glmnet; pROC; rda
