Siegmund, David; Yakir, Benjamin; Zhang, Nancy R.
Detecting simultaneous variant intervals in aligned sequences. (English) Zbl 1223.62166
Ann. Appl. Stat. 5, No. 2A, 645-668 (2011).

Summary: Given a set of aligned sequences of independent noisy observations, we are concerned with detecting intervals where the mean values of the observations change simultaneously in a subset of the sequences. The intervals of changed means are typically short relative to the length of the sequences, the subset where the change occurs, the “carriers,” can be relatively small, and the sizes of the changes can vary from one sequence to another. This problem is motivated by the scientific problem of detecting inherited copy number variants in aligned DNA samples. We suggest a statistic based on the assumption that for any given interval of changed means there is a given fraction of samples that carry the change. We derive an analytic approximation for the false positive error probability of a scan, which is shown by simulations to be reasonably accurate. We show that the new method usually improves on methods that analyze a single sample at a time and on our earlier multi-sample method, which is most efficient when the carriers form a large fraction of the set of sequences. The proposed procedure is also shown to be robust with respect to the assumed fraction of carriers of the changes.

Cited in 27 Documents

MSC:
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology
65C60 Computational problems in statistics (MSC2010)

Keywords: scan statistics; change-point detection; segmentation; DNA copy number

Software: PennCNV