Bayes multiple decision functions. (English) Zbl 1336.62040
Summary: This paper deals with the problem of simultaneously making many (\(M\)) binary decisions based on one realization of a random data matrix \(\mathbf{X}\). \(M\) is typically large, and \(\mathbf{X}\) will usually have \(M\) rows, one associated with each of the \(M\) decisions to be made, but for each row the data may be low dimensional. Such problems arise in many practical areas: in the biological and medical sciences, where the available dataset comes from microarrays or other high-throughput technology and the goal is to decide which among many genes are relevant with respect to some phenotype of interest; in the engineering and reliability sciences; in astronomy; in education; and in business. A Bayesian decision-theoretic approach to this problem is implemented, with the overall loss function being a cost-weighted linear combination of Type I and Type II loss functions. The class of loss functions considered allows for use of the false discovery rate (FDR), false nondiscovery rate (FNR), and missed discovery rate (MDR) in assessing the quality of the decisions. Through this Bayesian paradigm, the Bayes multiple decision function (BMDF) is derived and an efficient algorithm to obtain the optimal Bayes action is described. In contrast to many works in the literature where the rows of the matrix \(\mathbf{X}\) are assumed to be stochastically independent, we allow a dependent data structure with the associations obtained through a class of frailty-induced Archimedean copulas. In particular, non-Gaussian dependent data structures, which are typical of failure-time data, can be accommodated. The numerical implementation of the determination of the Bayes optimal action is facilitated through sequential Monte Carlo techniques. The theory developed could also be extended to the problems of multiple hypothesis testing, multiple classification and prediction, and high-dimensional variable selection.
The proposed procedure is illustrated for the simple versus simple hypotheses setting and for the composite hypotheses setting through simulation studies. The procedure is also applied to a subset of a microarray data set from a colon cancer study.
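
As an illustration of the kind of loss described in the summary, the following minimal sketch computes the realized false discovery, false nondiscovery, and missed discovery proportions for a vector of binary decisions and combines them into a cost-weighted loss. It assumes the conventional definitions of these proportions (whose expectations are the FDR, FNR, and MDR); the function names, the cost parameters `c1` and `c2`, and the `type2` switch are hypothetical and are not taken from the paper.

```python
import numpy as np

def error_proportions(decisions, truth):
    """Return (FDP, FNP, MDP) for binary decisions against the binary truth.

    decisions[m] = 1 means a discovery is declared for row m;
    truth[m] = 1 means the alternative actually holds for row m.
    Denominators are floored at 1 so that empty sets give a proportion of 0.
    """
    d = np.asarray(decisions, dtype=bool)
    t = np.asarray(truth, dtype=bool)
    false_discoveries = np.sum(d & ~t)    # discoveries where the null holds
    missed_discoveries = np.sum(~d & t)   # non-discoveries where the alternative holds
    fdp = false_discoveries / max(d.sum(), 1)
    fnp = missed_discoveries / max((~d).sum(), 1)
    mdp = missed_discoveries / max(t.sum(), 1)
    return fdp, fnp, mdp

def weighted_loss(decisions, truth, c1=1.0, c2=1.0, type2="mdp"):
    """Cost-weighted linear combination of a Type I and a Type II loss.

    c1 and c2 are illustrative costs (assumed, not from the paper); the
    Type II component is the MDP by default and the FNP otherwise.
    """
    fdp, fnp, mdp = error_proportions(decisions, truth)
    type2_loss = mdp if type2 == "mdp" else fnp
    return c1 * fdp + c2 * type2_loss

if __name__ == "__main__":
    decisions = [1, 1, 0, 0, 1]
    truth = [1, 0, 0, 1, 1]
    print(error_proportions(decisions, truth))
    print(weighted_loss(decisions, truth, c1=1.0, c2=2.0))
```

In a Bayesian treatment the truth vector is unknown, so a loss of this form would be averaged over the posterior distribution of the truth given \(\mathbf{X}\), and the Bayes action is the decision vector minimizing that posterior expected loss. The sketch covers only the loss evaluation, not the paper's algorithm for finding the minimizer.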

62C25 Compound decision problems in statistical decision theory
62C10 Bayesian problems; characterization of Bayes procedures
62J15 Paired and multiple comparisons; multiple testing
62P10 Applications of statistics to biology and medical sciences; meta analysis