2014
Conference article  Restricted

Bicriteria data compression: efficient and usable

Farruggia A., Ferragina P., Venturini R.

Data compression  E.4 CODING AND INFORMATION THEORY 

Lempel-Ziv's LZ77 algorithm is the de facto choice for compressing massive datasets (see, e.g., Snappy in BigTable, Lz4 in Cassandra) because its algorithmic structure is flexible enough to guarantee very fast decompression speed at reasonable compressed-space occupancy. Recent theoretical results have shown how to design a bit-optimal LZ77-compressor which minimizes the compressed size, and how to deploy it in order to design a bicriteria data compressor, namely an LZ77-compressor which trades compressed-space occupancy against decompression time in a smooth and principled way. Preliminary experiments were promising but raised many algorithmic and engineering questions which have to be addressed in order to turn these algorithmic results into an effective and practical tool. In this paper we address these issues by first designing a novel bit-optimal LZ77-compressor which is simple, cache-aware and asymptotically optimal. We benchmark our approach by investigating several algorithmic and implementation issues over many dataset types and sizes, and against an ample class of classic (LZ-based, PPM-based and BWT-based) as well as engineered compressors (Snappy, Lz4, and Lzma2). We conclude by noting that our novel bicriteria LZ77-compressor improves on the state of the art among the fast (de)compressors Snappy and Lz4.
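
As a rough illustration of why LZ77 decompression tends to be fast, the sketch below decodes a sequence of phrases using only appends and back-reference copies. The phrase format (`("lit", bytes)` and `("copy", distance, length)`) and the function name `lz77_decode` are assumptions made for this example; they are not the encoding or the implementation used in the paper.

```python
def lz77_decode(phrases):
    """Decode a list of LZ77-style phrases into bytes.

    Assumed, illustrative phrase format (not the paper's encoding):
      ("lit", data)              -> append the literal bytes `data`
      ("copy", distance, length) -> copy `length` bytes starting
                                    `distance` bytes back in the output
    """
    out = bytearray()
    for phrase in phrases:
        if phrase[0] == "lit":
            out += phrase[1]
        else:
            _, distance, length = phrase
            if distance <= 0 or distance > len(out):
                raise ValueError("back-reference points outside the decoded text")
            start = len(out) - distance
            # Copy byte by byte so that overlapping copies
            # (distance < length) reproduce repeated runs correctly.
            for i in range(length):
                out.append(out[start + i])
    return bytes(out)


# Hypothetical parsing of the text "abababcaba".
phrases = [("lit", b"ab"), ("copy", 2, 4), ("lit", b"c"), ("copy", 7, 3)]
decoded = lz77_decode(phrases)
```

The same text generally admits many valid parsings. Which copies are chosen affects both the number of bits needed to encode distances and lengths and the memory-access pattern at decoding time, which is the trade-off a bicriteria compressor is meant to control.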

Source: ESA 2014 - Algorithms. 22th Annual European Symposium, pp. 406–417, Wroclaw, Poland, 8-10 September 2014


BibTeX entry
@inproceedings{oai:it.cnr:prodotti:305202,
	title = {Bicriteria data compression: efficient and usable},
	author = {Farruggia A. and Ferragina P. and Venturini R.},
	doi = {10.1007/978-3-662-44777-2_34},
	booktitle = {ESA 2014 - Algorithms. 22th Annual European Symposium, pp. 406–417, Wroclaw, Poland, 8-10 September 2014},
	year = {2014}
}