A phase transition for the score in matching random sequences allowing deletions. (English) Zbl 0809.62008

The paper examines a sequence matching problem (known as string matching in the computer science literature) involving the optional alignment score for contiguous subsequences, rewarding matches and penalizing for deletions and mismatches. This score is used by biologists comparing pairs of DNA or protein sequences. It is proved that for two sequences of length \(n\), as \(n \to \infty\), there is a phase transition between linear growth in \(n\), when the penalty parameters are small, and logarithmic growth in \(n\), when the penalties are large.
The results are valid for independent sequences with iid or Markov letters. The phase transition is also established for more general scoring schemes allowing general letter-to-letter alignment penalties and block deletion penalties. A general method for applying the bounded increments martingale method to Lipschitz functionals of marked processes is given.
Reviewer: V.P.Gupta (Jaipur)


62E20 Asymptotic distribution theory in statistics
62P10 Applications of statistics to biology and medical sciences; meta analysis
62P99 Applications of statistics
Full Text: DOI