Stochastic local search for large-scale instances of the haplotype inference problem by pure parsimony.

*(English)*Zbl 1151.68389Summary: Haplotype Inference is a challenging problem in bioinformatics that consists in inferring the basic genetic constitution of diploid organisms on the basis of their genotype. This information allows researchers to perform association studies for the genetic variants involved in diseases and the individual responses to therapeutic agents. A notable approach to the problem is to encode it as a combinatorial problem (under certain hypotheses, such as the pure parsimony criterion) and to solve it using off-the-shelf combinatorial optimization techniques. The main methods applied to Haplotype Inference are either simple greedy heuristic or exact methods (Integer Linear Programming, Semidefinite Programming, SAT and pseudo-boolean encoding) that, at present, are adequate only for moderate size instances. In this paper, we present and discuss an approach based on the combination of local search metaheuristics and a reduction procedure based on an analysis of the problem structure. Some relevant design issues are first described, then a family of local search metaheuristics is defined to tackle the Haplotype Inference. Results on common Haplotype Inference benchmarks show that the approach achieves a good trade-off between solution quality and execution time.

##### MSC:

68P10 | Searching and sorting |

68T05 | Learning and adaptive systems in artificial intelligence |

68W05 | Nonnumerical algorithms |

90C05 | Linear programming |

92D10 | Genetics and epigenetics |