Document - Cache-oblivious peeling of random hypergraphs

2014

Conference article Open Access

Cache-oblivious peeling of random hypergraphs

Belazzougui D., Boldi P., Ottaviano G., Venturini R., Vigna S.

Perfect hash functions FOS: Computer and information sciences Cache-oblivious algorithms Computer Science - Data Structures and Algorithms Hypergraph algorithms Data Structures and Algorithms (cs.DS)

The computation of a peeling order in a randomly generated hypergraph is the most time- consuming step in a number of constructions, such as perfect hashing schemes, random r-SAT solvers, error-correcting codes, and approximate set encodings. While there exists a straightforward linear time algorithm, its poor I/O performance makes it impractical for hypergraphs whose size exceeds the available internal memory. We show how to reduce the computation of a peeling order to a small number of sequential scans and sorts, and analyze its I/O complexity in the cache-oblivious model. The resulting algorithm requires O(sort(n)) I/Os and O(n log n) time to peel a random hypergraph with n edges. We experimentally evaluate the performance of our implementation of this algorithm in a real- world scenario by using the construction of minimal perfect hash functions (MPHF) as our test case: our algorithm builds a MPHF of 7.6 billion keys in less than 21 hours on a single machine. The resulting data structure is both more space-efficient and faster than that obtained with the current state-of-the-art MPHF construction for large-scale key sets.

Source: DCC 2014 - Data Compression Conference, pp. 352–361, Snowbird, Utah, USA, 26-28 March 2014

Citations

[1] Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. Monotone minimal perfect hashing: Searching a sorted table with O(1) accesses. In SODA, pages 785-794, 2009.
[2] Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. Fast prefix search in little space, with applications. In ESA (1), pages 427-438, 2010.
[3] Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. Theory and practice of monotone minimal perfect hashing. ACM Journal of Experimental Algorithmics, 16, 2011.
[4] Djamal Belazzougui and Gonzalo Navarro. Alphabet-independent compressed text indexing. In ESA (1), pages 748-759, 2011.
[5] Djamal Belazzougui and Rossano Venturini. Compressed static functions with applications. In SODA, pages 229-240, 2013.
[6] Fabiano C. Botelho, Rasmus Pagh, and Nivio Ziviani. Practical perfect hashing in nearly optimal space. Inf. Syst., 38(1):108-131, 2013.
[7] Fabiano C. Botelho, Nicholas Wormald, and Nivio Ziviani. Cores of random r-partite hypergraphs. Information Processing Letters, 112(8-9):314 - 319, 2012.
[8] Gerth Stølting Brodal and Rolf Fagerberg. On the limits of cache-obliviousness. In STOC, pages 307-315, 2003.
[9] Denis Charles and Kumar Chellapilla. Bloomier filters: A second look. In ESA, pages 259-270, 2008.
[10] Bernard Chazelle, Joe Kilian, Ronitt Rubinfeld, and Ayellet Tal. The Bloomier filter: an efficient data structure for static support lookup tables. In SODA, pages 30-39, 2004.
[11] Zbigniew J. Czech, George Havas, and Bohdan S. Majewski. Perfect hashing. Theor. Comput. Sci., 182(1-2):1-143, 1997.
[12] Martin Dietzfelbinger, Andreas Goerdt, Michael Mitzenmacher, Andrea Montanari, Rasmus Pagh, and Michael Rink. Tight thresholds for cuckoo hashing via xorsat. In ICALP (1), pages 213-225, 2010.
[13] Peter Elias. Universal codeword sets and representations of the integers. IEEE Transactions on Information Theory, 21(2):194-203, 1975.
[14] Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. Cache-oblivious algorithms. ACM Transactions on Algorithms, 8(1):4, 2012.
[15] Michael T. Goodrich and Michael Mitzenmacher. Invertible Bloom lookup tables. In CCC, pages 792-799, 2011.
[16] Torben Hagerup and Torsten Tholey. Efficient minimal perfect hashing in nearly minimal space. In STACS, pages 317-326. 2001.
[17] G. Jacobson. Space-efficient static trees and graphs. In FOCS, pages 549-554, 1989.
[18] Jiayang Jiang, Michael Mitzenmacher, and Justin Thaler. Parallel peeling algorithms. CoRR, abs/1302.7014, 2013.
[19] Donald E. Knuth. The Art of Computer Programming. Pre-Fascicle 1A. Draft of Section 7.1.3: Bitwise Tricks and Techniques, 2007.
[20] Michael Luby, Michael Mitzenmacher, Mohammad Amin Shokrollahi, and Daniel A. Spielman. Efficient erasure correcting codes. IEEE Transactions on Information Theory, 47(2):569-584, 2001.
[21] Bohdan S. Majewski, Nicholas C. Wormald, George Havas, and Zbigniew J. Czech. A family of perfect hashing methods. Comput. J., 39(6):547-554, 1996.
[22] Kurt Mehlhorn. On the program size of perfect and universal hash functions. In FOCS, pages 170-175, 1982.
[23] Michael Mitzenmacher and George Varghese. Biff (Bloom filter) codes: Fast error correction for large data sets. In ISIT, pages 483-487, 2012.
[24] Michael Molloy. The pure literal rule threshold and cores in random hypergraphs. In SODA, pages 672-681, 2004.
[25] Michael Molloy. Cores in random hypergraphs and Boolean formulas. Random Structures and Algorithms, 27(1):124-135, 2005.

Metrics

Back to previous page

Cite as

BibTeX entry

@inproceedings{oai:it.cnr:prodotti:282742,
	title = {Cache-oblivious peeling of random hypergraphs},
	author = {Belazzougui D. and Boldi P. and Ottaviano G. and Venturini R. and Vigna S.},
	doi = {10.1109/dcc.2014.48 and 10.48550/arxiv.1312.0526},
	booktitle = {DCC 2014 - Data Compression Conference, pp. 352–361, Snowbird, Utah, USA, 26-28 March 2014},
	year = {2014}
}