×

Sequencing from compomers: The puzzle. (English) Zbl 1153.91367

Summary: The board game Fragmind\(^{\text{TM}}\) poses the following problem: The player has to reconstruct an (unknown) string \(s\) over the alphabet \(\Sigma\). To this end, the game reports the following information to the player, for every character \(x \in \Sigma\): First, the string s is cleaved wherever the character \(x\) is found in \(s\). Second, every resulting fragment \(y\) is scrambled by a random permutation so that the only information left is how many times \(y\) contains each character \(\sigma \in \Sigma\). These scrambled fragments are then reported to the player.
Clearly, distinct strings can show identical cleavage patterns for all cleavage characters. In fact, even short strings of length \(30+\) usually have non-unique cleavage patterns. To this end, we introduce a generalization of the game setup called Sequencing from Compomers. We also generate those fragments of \(s\) that contain up to \(k\) uncleaved characters \(x\), for some small and fixed threshold \(k\). This modification dramatically increases the length of strings that can be uniquely reconstructed. We show that it is NP-hard to decide whether there exists some string compatible with the input data, but we also present a branch-and-bound runtime heuristic to find all such strings: The input data are transformed into subgraphs of the de Bruijn graph, and we search for walks in these subgraphs simultaneously.
The above problem stems from the analysis of mass spectrometry data from base-specific cleavage of DNA sequences, and gives rise to a completely new approach to DNA de-novo sequencing.

MSC:

91A43 Games involving graphs
92D20 Protein sequences, DNA sequences
05A05 Permutations, words, matrices
68Q25 Analysis of algorithms and problem complexity
PDFBibTeX XMLCite
Full Text: DOI