Joint correction of cross-talk and peak spreading in DNA electropherograms
Tonazzini A., Bedini L.
Blind source separation
In automated DNA sequencing, the final algorithmic phase, referred to as basecalling, translates four time signals, each a sequence of peaks (the electropherogram), into the corresponding sequence of bases. The most popular basecaller, Phred, detects the peaks using heuristics and is very efficient when the peaks are well separated and fairly regular in spread, amplitude and spacing. Unfortunately, in practice the data are subject to several degradations, particularly near the end of the sequence. The most frequent are peak superposition, peak merging and signal leakage, resulting in secondary peaks. Under these conditions, the experiment must be repeated and human intervention is required. Recently, there have been attempts to provide methodological foundations for the problem and to use statistical models to solve it. In this paper, we propose exploiting a priori information and Bayesian estimation to remove the degradations and recover the signals in an impulsive form that makes the task of basecalling straightforward.
Source: RECOMB 2006. The 10th Annual International Conference on Research in Computational Molecular Biology, Venice, 01-04/04/2006