Effect of heavy tails on ultra high dimensional variable ranking methods. (English) Zbl 1257.62057

Summary: Contemporary problems involving sparse, high-dimensional feature selection are becoming rapidly more challenging through substantial increases in dimension. This places ever more stress on methods for analysis, since the effects of even moderately heavy-tailed feature distributions become more significant as the number of features diverges. Data transformations have a significant role to play, reducing noise and enabling an increase in dimension, and for this reason they are increasingly used.
We examine the performance of a typical transformation of this type, and study the extent to which it preserves the main attributes that lead to reliable feature selection. We show, both numerically and theoretically, that in the presence of heavy-tailed data such a transformation can dramatically increase the dimension for which effective variable selection is possible, from a low-degree polynomial function of the sample size to one that is exponentially large.
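The summary does not name the transformation it studies. As a rough illustration only, the sketch below assumes a rank transformation (each feature value is replaced by its scaled rank) followed by ranking features on absolute marginal correlation with the response. The function names `rank_transform`, `abs_correlation` and `rank_features`, and the toy data, are hypothetical and are not taken from the paper.

```python
import math


def rank_transform(values):
    """Replace each value by its rank scaled into (0, 1).

    Assumed transformation for illustration; ties are broken by position.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scaled = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scaled[i] = rank / (n + 1)
    return scaled


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant sequence carries no ranking information
    return abs(sxy) / math.sqrt(sxx * syy)


def rank_features(features, response, transform=None):
    """Return feature indices ordered by decreasing absolute correlation."""
    scores = []
    for j, column in enumerate(features):
        if transform is not None:
            column = transform(column)
        scores.append((abs_correlation(column, response), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores]


# Toy data: feature 0 is perfectly monotone in the response but has one
# extreme observation; feature 1 is well behaved but only roughly monotone.
response = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 1e6],
    [1.0, 3.0, 2.0, 5.0, 4.0, 6.0],
]

print(rank_features(features, response))                            # [1, 0]
print(rank_features(features, response, transform=rank_transform))  # [0, 1]
```

In this toy case the single extreme value pulls the raw correlation of feature 0 down to about 0.65, below the 0.89 of feature 1. After the rank transformation, feature 0 has correlation 1 and is ranked first. This shows only the mechanism by which one heavy-tailed observation can distort a marginal ranking. It says nothing about the polynomial versus exponential dimension rates stated in the summary.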


62G32 Statistics of extreme values; tail inference
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology
92D10 Genetics and epigenetics
65C60 Computational problems in statistics (MSC2010)

