Approximate $$p$$-values for local sequence alignments. (English) Zbl 1105.62377

Summary: Assume that two sequences from a finite alphabet are optimally aligned under a scoring system that rewards similarities through a general scoring scheme and penalizes gaps (insertions and deletions). Under the assumptions that the letters within each sequence are independent and identically distributed and that the two sequences are independent of each other, approximate $$p$$-values are obtained for the optimal local alignment score when either (i) the number of gaps is bounded by a fixed constant, or (ii) the gap initiation cost is sufficiently large. In the latter case, the approximation can be written in the same form as in the well-known case of ungapped alignments.
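As a minimal illustrative sketch of the objects in the summary, the Python below computes an optimal local alignment score with affine gap costs (the Smith-Waterman recursion in Gotoh's form) and evaluates the classical ungapped (Karlin-Altschul) approximation P(M >= x) ≈ 1 - exp(-K m n e^(-λx)), where λ is the positive root of Σ p_a q_b e^(λ s(a,b)) = 1. The helper names, the gap convention (a gap of length k costs gap_open + gap_extend·(k-1)), and the value of K are assumptions made for illustration. The sketch computes λ only for the ungapped case and does not compute K; it does not reproduce the paper's constants for the gapped regime.

```python
import math
from itertools import product


def local_alignment_score(x, y, score, gap_open, gap_extend):
    """Optimal local alignment score (Smith-Waterman recursion, Gotoh's affine gaps).

    Assumed convention: a gap of length k costs gap_open + gap_extend * (k - 1).
    """
    neg_inf = float("-inf")
    m, n = len(x), len(y)
    # H[i][j]: best local alignment score ending at x[i-1], y[j-1] (0 = start afresh)
    # E[i][j]: best score ending with y[j-1] set against a gap
    # F[i][j]: best score ending with x[i-1] set against a gap
    H = [[0.0] * (n + 1) for _ in range(m + 1)]
    E = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    F = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    best = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            H[i][j] = max(0.0, H[i - 1][j - 1] + score(x[i - 1], y[j - 1]),
                          E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def ungapped_lambda(p, q, score):
    """Positive root lam of sum_{a,b} p[a] q[b] exp(lam * s(a, b)) = 1 (ungapped case).

    Requires a negative expected score and at least one positive score.
    """
    terms = [(pa * qb, score(a, b)) for (a, pa), (b, qb) in product(p.items(), q.items())]
    if sum(w * s for w, s in terms) >= 0 or max(s for w, s in terms if w > 0) <= 0:
        raise ValueError("need a negative mean score and some positive score")

    def f(lam):
        return sum(w * math.exp(lam * s) for w, s in terms) - 1.0

    # f is convex with f(0) = 0 and f'(0) < 0, so it is negative just right of 0
    lo, hi = 0.0, 1.0
    while f(hi) < 0:  # expand until the positive root is bracketed
        hi *= 2.0
    for _ in range(200):  # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gumbel_pvalue(x, m, n, lam, K):
    """Approximate P(max local score >= x) as 1 - exp(-K m n exp(-lam x))."""
    return -math.expm1(-K * m * n * math.exp(-lam * x))


if __name__ == "__main__":
    def match(a, b):
        return 1.0 if a == b else -1.0

    uniform = {c: 0.25 for c in "ACGT"}
    s = local_alignment_score("ACGTTGCA", "ACGTAGCA", match, gap_open=3.0, gap_extend=1.0)
    lam = ungapped_lambda(uniform, uniform, match)  # equals ln 3 for this scoring
    print(s, lam, gumbel_pvalue(s, 8, 8, lam, K=0.1))  # K = 0.1 is a placeholder
```

For the gapped regime, the summary states only that the approximation takes the same form. Here `lam` and `K` are therefore plain inputs to `gumbel_pvalue` and would have to come from the paper's analysis or from simulation.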

MSC:

62M99 Inference from stochastic processes
62P10 Applications of statistics to biology and medical sciences; meta analysis
92C40 Biochemistry, molecular biology