2023
Journal article  Open Access

Matchtigs: minimum plain text representation of k-mer sets

Schmidt S., Khan S., Alanko J. N., Pibiri G. E., Tomescu A. I.

k-mer sets  Plain text compression  Graph algorithm  Sequence analysis  Genomic sequences  Minimum-cost flow  Chinese postman problem 

We propose a polynomial algorithm computing a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, with only a minor runtime increase, we shrink the representation by up to 59% over unitigs and 26% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in downstream applications, as it speeds up SSHash-Lite queries by up to 4.26× over unitigs and 2.10× over previous work.

Source: Genome biology (Online) 24 (2023). doi:10.1186/s13059-023-02968-z

Publisher: BioMed Central Ltd., London , Regno Unito


Metrics



Back to previous page
BibTeX entry
@article{oai:it.cnr:prodotti:484067,
	title = {Matchtigs: minimum plain text representation of k-mer sets},
	author = {Schmidt S. and Khan S. and Alanko J. N. and Pibiri G. E. and Tomescu A. I.},
	publisher = {BioMed Central Ltd., London , Regno Unito},
	doi = {10.1186/s13059-023-02968-z},
	journal = {Genome biology (Online)},
	volume = {24},
	year = {2023}
}

MobiDataLab
Labs for prototyping future Mobility Data sharing cloud solutions

SAFEBIO
Safe and Complete Algorithms for Bioinformatics


OpenAIRE