## Detecting rare and faint signals via thresholding maximum likelihood estimators. (English) Zbl 1392.62163

Summary: Motivated by the analysis of RNA sequencing (RNA-seq) data for genes differentially expressed across multiple conditions, we consider detecting rare and faint signals in high-dimensional response variables. We address the signal detection problem under a general framework, which includes generalized linear models for count-valued responses as special cases. We propose a test statistic that carries out a multi-level thresholding on maximum likelihood estimators (MLEs) of the signals, based on a new Cramér-type moderate deviation result for multidimensional MLEs. Based on the multi-level thresholding test, a multiple testing procedure is proposed for signal identification. Numerical simulations and a case study on maize RNA-seq data are conducted to demonstrate the effectiveness of the proposed approaches on signal detection and identification.
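The summary describes the test only in words. The sketch below illustrates the general idea of multi-level thresholding on MLE-based statistics. It is a simplified illustration under stated assumptions, not the authors' statistic. It assumes the following:

- Each of the p components carries a one-dimensional signal.
- T_i is the squared standardized MLE (a Wald-type statistic), approximately chi-square with 1 degree of freedom under the null.
- Components are independent.
- The threshold has the form λ_p(s) = 2 s log p, evaluated over a grid of levels s in (0, 1). This form is borrowed from thresholding tests for high-dimensional means such as Zhong, Chen and Xu (2013) in the reference list.

The paper itself treats multidimensional MLEs, allows dependence, and relies on its own moderate deviation result, none of which is reproduced here. All function names are hypothetical.

```python
import math
import random


def _std_normal_pdf(t):
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def _std_normal_upper_tail(t):
    # P(Z > t) for Z ~ N(0, 1)
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def null_moments(lam):
    """Mean and variance of (T - 1) * 1{T > lam} when T = Z**2 ~ chi-square(1)."""
    t = math.sqrt(lam)
    phi, tail = _std_normal_pdf(t), _std_normal_upper_tail(t)
    mean = 2.0 * t * phi
    # E[(T - 1)^2 * 1{T > lam}], from Gaussian tail integrals of z^2 and z^4
    second = 2.0 * t**3 * phi + 2.0 * t * phi + 4.0 * tail
    return mean, second - mean**2


def standardized_level_statistic(t_stats, s):
    """Thresholded sum at level s, centred and scaled by its null moments."""
    p = len(t_stats)
    lam = 2.0 * s * math.log(p)  # assumed threshold lambda_p(s) = 2 s log p
    l_s = sum(t - 1.0 for t in t_stats if t > lam)
    mean, var = null_moments(lam)
    # Variance p * var assumes independent components (a simplification).
    return (l_s - p * mean) / math.sqrt(p * var)


def multilevel_statistic(t_stats, levels):
    """Maximum of the standardized statistics over a grid of threshold levels."""
    return max(standardized_level_statistic(t_stats, s) for s in levels)


if __name__ == "__main__":
    rng = random.Random(0)
    p = 2000
    z_null = [rng.gauss(0.0, 1.0) for _ in range(p)]
    # Hypothetical sparse alternative: 20 of 2000 components shifted by 5.
    z_alt = [z + 5.0 if i < 20 else z for i, z in enumerate(z_null)]
    levels = [0.1 * k for k in range(1, 10)]
    print("null:", multilevel_statistic([z * z for z in z_null], levels))
    print("alt: ", multilevel_statistic([z * z for z in z_alt], levels))
```

Maximizing over several levels avoids committing to a single threshold. Low levels of s favour many faint signals, and high levels favour a few stronger ones. Calibrating a real test would also require the null distribution of the maximum, which this sketch does not attempt.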

### MSC:

62H15 Hypothesis testing in multivariate analysis

62G20 Asymptotic properties of nonparametric inference

62G32 Statistics of extreme values; tail inference

### Software:

DESeq; QuasiSeq

### References:

[1] Anders, S. and Huber, W. (2010). Differential expression analysis for sequence count data. Genome Biol. 11 R106.

[2] Arias-Castro, E., Candès, E. J. and Plan, Y. (2011). Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Statist. 39 2533-2556. · Zbl 1231.62136

[3] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B. Stat. Methodol. 57 289-300. · Zbl 0809.62014

[4] Bradley, R. C. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probab. Surv. 2 107-144. · Zbl 1189.60077

[5] Chen, S. X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Statist. 38 808-835. · Zbl 1183.62095

[6] Delaigle, A., Hall, P. and Jin, J. (2011). Robustness and accuracy of methods for high dimensional data analysis based on Student’s $t$-statistic. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 283-301. · Zbl 1411.62222

[7] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962-994. · Zbl 1092.62051

[8] Fan, Y., Jin, J. and Yao, Z. (2013). Optimal classification in sparse Gaussian graphic model. Ann. Statist. 41 2537-2571. · Zbl 1294.62061

[9] Fan, J. and Song, R. (2010). Sure independence screening in generalized linear models with NP-dimensionality. Ann. Statist. 38 3567-3604. · Zbl 1206.68157

[10] Genovese, C. R. and Wasserman, L. (2006). Exceedance control of the false discovery proportion. J. Amer. Statist. Assoc. 101 1408-1417. · Zbl 1171.62338

[11] Goeman, J. J., van Houwelingen, H. C. and Finos, L. (2011). Testing against a high-dimensional alternative in the generalized linear model: Asymptotic type I error control. Biometrika 98 381-390. · Zbl 1215.62068

[12] Guo, B. and Chen, S. X. (2016). Tests for high dimensional generalized linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 78 1079-1102. · Zbl 1414.62328

[13] Hall, P. and Jin, J. (2008). Properties of higher criticism under strong dependence. Ann. Statist. 36 381-402. · Zbl 1139.62049

[14] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686-1732. · Zbl 1189.62080

[15] Inglot, T. and Kallenberg, W. C. M. (2003). Moderate deviations of minimum contrast estimators under contamination. Ann. Statist. 31 852-879. · Zbl 1028.62012

[16] Ingster, Yu. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6 47-69. · Zbl 0878.62005

[17] Jensen, J. L. and Wood, A. T. A. (1998). Large deviation and other results for minimum contrast estimators. Ann. Inst. Statist. Math. 50 673-695. · Zbl 0954.62070

[18] Ji, P. and Jin, J. (2012). UPS delivers optimal phase diagram in high-dimensional variable selection. Ann. Statist. 40 73-103. · Zbl 1246.62160

[19] Lehmann, E. L. (1959). Testing Statistical Hypotheses. Wiley, New York. · Zbl 0089.14102

[20] Lund, S. P., Nettleton, D., McCarthy, D. J. and Smyth, G. K. (2012). Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Stat. Appl. Genet. Mol. Biol. 11 Art. 8. · Zbl 1296.92187

[21] McCulloch, C. E., Searle, S. R. and Neuhaus, J. M. (2008). Generalized, Linear, and Mixed Models, 2nd ed. Wiley, Hoboken, NJ. · Zbl 1165.62050

[22] Paschold, A., Larson, N. B., Marcon, C., Schnable, J. C., Yeh, C. T., Lanz, C., Nettleton, D., Piepho, H.-P., Schnable, P. S. and Hochholdinger, F. (2017). Non-syntenic genes drive highly dynamic complementation of gene expression in maize hybrids. Plant Cell. To appear.

[23] Petrov, V. V. (1995). Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Clarendon Press, London. · Zbl 0826.60001

[24] Qiu, Y., Chen, S. X. and Nettleton, D. (2018). Supplement to “Detecting rare and faint signals via thresholding maximum likelihood estimators.” DOI:10.1214/17-AOS1574SUPP. · Zbl 1392.62163

[25] Robinson, M. D. and Smyth, G. K. (2007). Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23 2881-2887.

[26] Robinson, M. D. and Smyth, G. K. (2008). Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9 321-332. · Zbl 1143.62312

[27] Saulis, L. and Statulevičius, V. A. (1991). Limit Theorems for Large Deviations. Mathematics and Its Applications (Soviet Series) 73. Kluwer Academic, Dordrecht.

[28] van de Geer, S., Bühlmann, P., Ritov, Y. and Dezeure, R. (2014). On asymptotically optimal confidence regions and tests for high-dimensional models. Ann. Statist. 42 1166-1202. · Zbl 1305.62259

[29] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.

[30] Zhang, C.-H. and Zhang, S. S. (2014). Confidence intervals for low dimensional parameters in high dimensional linear models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 217-242. · Zbl 1411.62196

[31] Zhong, P.-S. and Chen, S. X. (2011). Tests for high-dimensional regression coefficients with factorial designs. J. Amer. Statist. Assoc. 106 260-274. · Zbl 1396.62110

[32] Zhong, P.-S., Chen, S. X. and Xu, M. (2013). Tests alternative to higher criticism for high-dimensional means under sparsity and column-wise dependence. Ann. Statist. 41 2820-2851. · Zbl 1294.62128