2014
Journal article  Open Access

Fast compressed tries through path decompositions

Grossi R., Ottaviano G.

Monotone perfect hash functions  String dictionaries  Succinct data structures  FOS: Computer and information sciences  Computer Science - Data Structures and Algorithms  Theoretical Computer Science  Tries  Data Structures and Algorithms (cs.DS) 

Tries are popular data structures for storing a set of strings, where common prefixes are represented by common root-to-node paths. More than 50 years of usage have produced many variants and implementations to overcome some of their limitations.We explore new succinct representations of path-decomposed tries and experimentally evaluate the corresponding reduction in space usage and memory latency, comparing with the state of the art.We study the following applications: compressed string dictionary and monotone minimal perfect hash for strings. In compressed string dictionary, we obtain data structures that outperform other state-of-the-art compressed dictionaries in space efficiency while obtaining predictable query times that are competitive with data structures preferred by the practitioners. On real-world datasets, our compressed tries obtain the smallest space (except for one case) and have the fastest lookup times, whereas access times are within 20% slower than the best-known solutions. In monotone minimal perfect hash for strings, our compressed tries perform several times faster than other trie-based monotone perfect hash functions while occupying nearly the same space. On real-world datasets, our tries are approximately 2 to 5 times faster than previous solutions, with a space occupancy less than 10% larger. Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; E.1 [Data Structures]: Trees General Terms: Algorithms, Experimentation, Performance.

Source: ACM journal of experimental algorithmics 19 (2014). doi:10.1145/2656332

Publisher: ACM, New York , Stati Uniti d'America


[1] A. Acharya, H. Zhu, and K. Shen. Adaptive algorithms for cache-efficient trie search. In M. T. Goodrich and C. C. McGeoch, editors, Algorithm Engineering and Experimentation, International Workshop ALENEX '99, Baltimore, MD, USA, January 15-16, 1999, Selected Papers, volume 1619 of Lecture Notes in Computer Science, pages 296-311. Springer, 1999.
[2] AOL search data. http://www.gregsadetsky.com/aol-data/.
[3] D. Arroyuelo, R. Cánovas, G. Navarro, and K. Sadakane. Succinct trees in practice. In ALENEX, pages 84-97, 2010.
[4] D. Belazzougui, P. Boldi, R. Pagh, and S. Vigna. Monotone minimal perfect hashing: searching a sorted table with O(1) accesses. In SODA, pages 785-794, 2009.
[5] D. Belazzougui, P. Boldi, R. Pagh, and S. Vigna. Theory and practise of monotone minimal perfect hashing. In ALENEX, pages 132-144, 2009.
[6] M. A. Bender, M. Farach-Colton, and B. C. Kuszmaul. Cache-oblivious string B-trees. In S. Vansummeren, editor, Proceedings of the Twenty-Fifth ACM SIGACT-SIGMODSIGART Symposium on Principles of Database Systems, June 26-28, 2006, Chicago, Illinois, USA, pages 233-242. ACM, 2006.
[7] D. Benoit, E. D. Demaine, J. I. Munro, R. Raman, V. Raman, and S. S. Rao. Representing trees of higher degree. Algorithmica, 43(4):275-292, 2005.
[8] D. K. Blandford and G. E. Blelloch. Compact dictionaries for variable-length keys and data with applications. ACM Transactions on Algorithms, 4(2), 2008.
[9] P. Boldi, B. Codenotti, M. Santini, and S. Vigna. Ubicrawler: A scalable fully distributed web crawler. Software: Practice & Experience, 34(8):711-726, 2004.
[10] N. R. Brisaboa, R. Cánovas, F. Claude, M. A. Martínez-Prieto, and G. Navarro. Compressed string dictionaries. In SEA, pages 136-147, 2011.
[11] G. S. Brodal and R. Fagerberg. Cache-oblivious string dictionaries. In SODA, pages 581-590. ACM Press, 2006.
[12] S.-Y. Chiu, W.-K. Hon, R. Shah, and J. S. Vitter. I/O-efficient compressed text indexes: From theory to practice. In J. A. Storer and M. W. Marcellin, editors, 2010 Data Compression Conference (DCC 2010), 24-26 March 2010, Snowbird, UT, USA, pages 426-434. IEEE Computer Society, 2010.
[13] D. R. Clark. Compact pat trees. PhD thesis, University of Waterloo, Waterloo, Ont., Canada, Canada, 1998. UMI Order No. GAXNQ-21335.
[14] F. Claude and G. Navarro. Fast and compact web graph representations. TWEB, 4(4), 2010.
[15] P. Elias. Efficient storage and retrieval by content and address of static files. Journal of the ACM (JACM), 21(2):246-260, 1974.
[16] R. Fano. On the number of bits required to implement an associative memory. Memorandum 61. Computer Structures Group, Project MAC, MIT, Cambridge, Mass., nd, 1971.
[17] P. Ferragina and R. Grossi. The string B-tree: A new data structure for string search in external memory and its applications. J. ACM, 46(2):236-280, 1999.
[18] P. Ferragina, R. Grossi, A. Gupta, R. Shah, and J. S. Vitter. On searching compressed string collections cache-obliviously. In PODS, pages 181-190, 2008.
[19] P. Ferragina, F. Luccio, G. Manzini, and S. Muthukrishnan. Compressing and indexing labeled trees, with applications. J. ACM, 57(1), 2009.
[20] R. Grossi and J. S. Vitter. Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput., 35(2):378-407, 2005.
[21] G. Jacobson. Space-efficient static trees and graphs. In FOCS, pages 549-554, 1989.
[22] D. E. Knuth. The Art of Computer Programming, Volume 4, Fascicle 1: Bitwise Tricks & Techniques; Binary Decision Diagrams. Addison-Wesley Professional, 12th edition, 2009.
[24] N. J. Larsson and A. Moffat. Offline dictionary-based compression. In Data Compression Conference, pages 296-305, 1999.
[25] J. I. Munro and V. Raman. Succinct representation of balanced parentheses, static trees and planar graphs. In FOCS, pages 118-126, 1997.
[26] J. I. Munro and V. Raman. Succinct representation of balanced parentheses and static trees. SIAM Journal on Computing, 31(3):762-776, June 2001.
[27] D. Okanohara and K. Sadakane. Practical entropy-compressed rank/select dictionary. In ALENEX, 2007.
[28] K. Sadakane and G. Navarro. Fully-functional succinct trees. In SODA, pages 134-149, 2010.
[29] D. D. Sleator and R. E. Tarjan. A data structure for dynamic trees. In STOC, pages 114-122, 1981.
[30] Sux: Implementing Succinct Data Structures. http://sux.dsi.unimi.it/.

Metrics



Back to previous page
BibTeX entry
@article{oai:it.cnr:prodotti:303200,
	title = {Fast compressed tries through path decompositions},
	author = {Grossi R. and Ottaviano G.},
	publisher = {ACM, New York , Stati Uniti d'America},
	doi = {10.1145/2656332 and 10.48550/arxiv.1111.5220},
	journal = {ACM journal of experimental algorithmics},
	volume = {19},
	year = {2014}
}