Phasing of 2-SNP genotypes based on non-random mating model. (English) Zbl 1155.92311
Alexandrov, Vassil N. (ed.) et al., Computational science – ICCS 2006. 6th international conference, Reading, UK, May 28–31, 2006. Proceedings, Part II. Berlin: Springer (ISBN 3-540-34381-4/pbk). Lecture Notes in Computer Science 3992, 767-774 (2006).
Summary: Emerging microarray technologies allow genotyping of long genome sequences, resulting in a huge amount of data. A key challenge is to provide an accurate phasing of very long single nucleotide polymorphism (SNP) sequences. In this paper we explore phasing of genotypes with 2 SNPs adjusted to the non-random mating model and then apply it to the haplotype inference of complete genotypes using maximum spanning trees. The runtime of the algorithm is \(O(nm(n+m))\), where \(n\) and \(m\) are the number of genotypes and SNPs, respectively. The proposed phasing algorithm (2SNP) can be used for comparatively accurate phasing of a large number of very long genome sequences. On datasets across 79 regions from HapMap, 2SNP is several orders of magnitude faster than GERBIL and PHASE while matching them in quality, measured by the number of correctly phased genotypes and by single-site and switching errors. For example, 2SNP requires 41 s on a 2 GHz Pentium 4 processor to phase 30 genotypes with 1381 SNPs (ENm010.7p15:2 data from HapMap), whereas GERBIL and PHASE require more than a week of runtime and make no fewer errors than 2SNP. The 2SNP software is publicly available at http://alla.cs.gsu.edu/~software/2SNP.
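To make the 2-SNP step in the summary concrete, the following is a minimal sketch and not the authors' implementation. It assumes a genotype encoding of 0 and 1 for the two homozygous states and 2 for heterozygous, and it decides between the cis (00/11) and trans (01/10) resolutions of a double heterozygote with a plain product-of-counts rule under random mating. The paper's adjustment for non-random mating is not reproduced here, and the function names `resolve_pair` and `phase_two_snps` are hypothetical.

```python
from collections import Counter

def resolve_pair(g1, g2):
    """Return the two haplotypes of a genotype restricted to two SNPs,
    or None when both sites are heterozygous (the only ambiguous case).
    Assumed encoding: 0 / 1 = homozygous, 2 = heterozygous."""
    if g1 == 2 and g2 == 2:
        return None
    a = (0, 1) if g1 == 2 else (g1, g1)
    b = (0, 1) if g2 == 2 else (g2, g2)
    return [(a[0], b[0]), (a[1], b[1])]

def phase_two_snps(genotypes, i, j):
    """Choose cis (00/11) or trans (01/10) for double heterozygotes at
    SNPs i and j, using haplotype counts from unambiguous genotypes.
    Illustrative rule only: compares count products, with no
    non-random mating correction."""
    counts = Counter()
    for g in genotypes:
        haps = resolve_pair(g[i], g[j])
        if haps is not None:
            counts.update(haps)
    cis = counts[(0, 0)] * counts[(1, 1)]
    trans = counts[(0, 1)] * counts[(1, 0)]
    # The absolute difference serves as a crude confidence score.
    return ("cis" if cis >= trans else "trans"), abs(cis - trans)

if __name__ == "__main__":
    sample = [[0, 0], [1, 1], [0, 0], [2, 2]]
    print(phase_two_snps(sample, 0, 1))
```

One plausible way to connect this to the spanning-tree step, again as an assumption, is to use the confidence score as the weight of the edge between SNPs i and j, so that a maximum spanning tree keeps the most reliable pairwise phasings.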
For the entire collection see [Zbl 1107.68003].
92C40 Biochemistry, molecular biology
65Y20 Complexity and performance of numerical algorithms
92D10 Genetics and epigenetics
05C90 Applications of graph theory