2020
Article  Open Access

Compressed Indexes for Fast Search of Semantic Data

Pibiri G. E., Perego R., Venturini R.

Computational Theory and Mathematics  Computer Science - Information Retrieval  Computer Science Applications  Computer Science - Databases  RDF  Efficiency  Search  Information Systems  Triple Indexing  data structures  compression 

The sheer increase in volume of RDF data demands efficient solutions for the triple indexing problem, that is to devise a compressed data structure to compactly represent RDF triples by guaranteeing, at the same time, fast pattern matching operations. This problem lies at the heart of delivering good practical performance for the resolution of complex SPARQL queries on large RDF datasets. In this work, we propose a trie-based index layout to solve the problem and introduce two novel techniques to reduce its space of representation for improved effectiveness. The extensive experimental analysis, conducted over a wide range of publicly available real-world datasets, reveals that our best space/time trade-off configuration substantially outperforms existing solutions at the state-of-the-art, by taking 30 - 60% less space and speeding up query execution by a factor of 2-81× .

Source: IEEE transactions on knowledge and data engineering (Print) (2020): 1–11. doi:10.1109/TKDE.2020.2966609

Publisher: Institute of Electrical and Electronics Engineers,, New York, NY , Stati Uniti d'America


Citations

[1] Güneş Aluç, Olaf Hartig, M Tamer Özsu, and Khuzaima Daudjee. 2014. Diversified stress testing of RDF data management systems. In International Semantic Web Conference. Springer, 197-212.
[2] Sandra Álvarez-García, Nieves Brisaboa, Javier D Fernández, Miguel A Martínez-Prieto, and Gonzalo Navarro. 2015. Compressed vertical partitioning for eficient RDF management. Knowledge and Information Systems 44, 2 (2015), 439-474.
[3] Medha Atre, Vineet Chaoji, Mohammed J Zaki, and James A Hendler. 2010. Matrix Bit loaded: a scalable lightweight join query processor for RDF data. In Proceedings of the 19th international conference on World wide web. ACM, 41-50.
[4] Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: A nucleus for a web of open data. (2007), 722-735.
[5] Tim Berners-Lee, James Hendler, and Ora Lassila. 2001. The Semantic Web. Scientific American 284, 5 (2001), 34-43.
[6] Nieves R Brisaboa, Ana Cerdeira-Pena, Antonio Farina, and Gonzalo Navarro. 2015. A compact RDF store using sufix arrays. In International Symposium on String Processing and Information Retrieval. Springer, 103-115.
[7] Nieves R Brisaboa, Susana Ladra, and Gonzalo Navarro. 2009. k 2-trees for compact web graph representation. In International Symposium on String Processing and Information Retrieval. Springer, 18-30.
[8] DBLP. 2017. Computer Science Bibliography. http://www.rdfhdt.org/datasets. (2017).
[9] Peter Elias. 1974. Eficient Storage and Retrieval by Content and Address of Static Files. Journal of the ACM (JACM) 21, 2 (1974), 246-260.
[10] Robert Mario Fano. 1971. On the number of bits required to implement an associative memory. Memorandum 61, Computer Structures Group, MIT (1971).
[11] Antonio Fariña, Nieves R Brisaboa, Gonzalo Navarro, Francisco Claude, Ángeles S Places, and Eduardo Rodríguez. 2012. Word-based self-indexes for natural language text. ACM Transactions on Information Systems (TOIS) 30, 1 (2012), 1.
[12] Javier D Fernández, Miguel A Martínez-Prieto, and Claudio Gutierrez. 2010. Compact representation of large RDF data sets for publishing and exchange. In International Semantic Web Conference. Springer, 193-208.
[13] geonames.org. 2012. Geonames. http://www.rdfhdt.org/datasets. (2012).
[14] Google. 2013. Freebase Data Dumps. https://developers.google.com/freebase. (2013).
[15] Roberto Grossi, Ankur Gupta, and Jefrey Scott Vitter. 2003. High-order entropy-compressed text indexes. In Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms. 841-850.
[16] LUBM. 2019. Lehigh University Benchmark. http://swat.cse.lehigh.edu/projects/lubm. (2019).
[17] Miguel A Martínez-Prieto, Mario Arias Gallego, and Javier D Fernández. 2012. Exchange and consumption of huge RDF data. In Extended Semantic Web Conference. Springer, 437-452.
[18] Thomas Neumann and Gerhard Weikum. 2010. The RDF-3X engine for scalable management of RDF data. The VLDB JournalâĂŤThe International Journal on Very Large Data Bases 19, 1 (2010), 91-113.
[19] Thomas Neumann and Gerhard Weikum. 2010. x-RDF-3X: fast querying, high update rates, and consistency for RDF databases. Proceedings of the VLDB Endowment 3, 1-2 (2010), 256-263.


Back to previous page