2016
Conference article, Open Access

RDF graph summarization based on approximate patterns

Zneika M., Lucchese C., Vodislav D., Kotzinos D.

Keywords: federated query, approximate patterns, RDF query, RDF graph summary, Linked Open Data; [INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]

The Linked Open Data (LOD) cloud brings together information that is described in RDF and stored on the web in (possibly distributed) RDF Knowledge Bases (KBs). The data in these KBs are not necessarily described by a known schema, and it is often extremely time-consuming to query all the interlinked KBs in order to acquire the necessary information. Even when a KB's schema is known, we still need to know which parts of that schema are actually used. We address this problem by summarizing large RDF KBs with top-K approximate RDF graph patterns, which we transform into an RDF schema that describes the contents of the KB. This schema describes the KB accurately, even more accurately than an existing schema, because it captures the schema that is actually in use, i.e., the one that corresponds to the existing data. We also record the number of instances of each pattern, which allows a query to estimate the number of results it can expect. We can then query the RDF graph summary to determine whether the necessary information is present and, if it is present in significant numbers, whether the KB should be included in a federated query.
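The abstract outlines a pipeline: encode the KB, mine at most K approximate patterns, turn each pattern into a schema class annotated with its instance count, and query that summary before deciding whether the KB belongs in a federated query. The Python sketch below illustrates that pipeline under loudly stated assumptions. The encoding (each subject becomes a row of a binary matrix whose features are its predicates plus one feature per `rdf:type` value), the greedy gain function, the `max_missing` noise threshold, the toy triples, and the `ex:PatternN`, `ex:hasFeature` and `ex:instanceCount` vocabulary are all hypothetical. In particular, the greedy search is a simplified stand-in for approximate top-k pattern mining, not the authors' algorithm.

```python
from collections import defaultdict

# Hypothetical toy KB (subject, predicate, object); not data from the paper.
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"), ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "rdf:type", "foaf:Person"), ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "rdf:type", "foaf:Person"), ("ex:carol", "foaf:name", "Carol"),
    ("ex:carol", "foaf:knows", "ex:alice"),
    ("ex:dave", "rdf:type", "foaf:Person"), ("ex:dave", "foaf:name", "Dave"),
    ("ex:dave", "foaf:knows", "ex:carol"),
    ("ex:paper1", "rdf:type", "bibo:Article"), ("ex:paper1", "dc:title", "Summaries"),
    ("ex:paper1", "dc:creator", "ex:alice"),
    ("ex:paper2", "rdf:type", "bibo:Article"), ("ex:paper2", "dc:title", "Patterns"),
    ("ex:paper3", "rdf:type", "bibo:Article"), ("ex:paper3", "dc:title", "Schemas"),
    ("ex:paper3", "dc:creator", "ex:dave"),
]


def encode(triples):
    """Map each subject to its set of features: one row of a binary matrix."""
    rows = defaultdict(set)
    for s, p, o in triples:
        # Each rdf:type value becomes its own feature; other predicates are used as-is.
        rows[s].add(f"type={o}" if p == "rdf:type" else p)
    return dict(rows)


def mine_top_k(rows, k, max_missing=1):
    """Greedy stand-in for approximate top-k pattern mining (a toy, not the paper's miner).

    A row supports a candidate pattern if it uses at least one of the pattern's
    features and lacks at most `max_missing` of them (the tolerated noise).
    """
    candidates = sorted({frozenset(f) for f in rows.values()}, key=sorted)
    covered = set()  # (subject, feature) cells already explained by a chosen pattern
    patterns = []
    for _ in range(k):
        best, best_gain, best_support = None, 0, []
        for cand in candidates:
            support, gain = [], 0
            for s, feats in rows.items():
                hits, missing = cand & feats, cand - feats
                if hits and len(missing) <= max_missing:
                    support.append(s)
                    gain += sum((s, f) not in covered for f in hits)  # newly explained 1s
                    gain -= len(missing)  # 0s the pattern wrongly claims (false positives)
            if gain > best_gain:  # keep only patterns with strictly positive gain
                best, best_gain, best_support = cand, gain, support
        if best is None:
            break  # no remaining candidate improves the summary
        patterns.append((best, best_support))
        covered |= {(s, f) for s in best_support for f in best & rows[s]}
        candidates.remove(best)
    return patterns


def summarize(patterns):
    """Turn mined patterns into summary records annotated with instance counts."""
    return [
        {"class": f"ex:Pattern{i}", "features": set(feats), "instances": len(support)}
        for i, (feats, support) in enumerate(patterns, start=1)
    ]


def to_triples(summary):
    """Serialize the summary as RDF-style triples (the ex: vocabulary is hypothetical)."""
    triples = []
    for rec in summary:
        triples.append((rec["class"], "rdf:type", "rdfs:Class"))
        triples += [(rec["class"], "ex:hasFeature", f) for f in sorted(rec["features"])]
        triples.append((rec["class"], "ex:instanceCount", rec["instances"]))
    return triples


def estimate(summary, required):
    """Estimated number of resources that use every feature in `required`."""
    return sum(r["instances"] for r in summary if set(required) <= r["features"])


def include_in_federation(summary, required, min_results=1):
    """Query this KB only if the summary predicts enough matching resources."""
    return estimate(summary, required) >= min_results


if __name__ == "__main__":
    summary = summarize(mine_top_k(encode(TRIPLES), k=3))
    for triple in to_triples(summary):
        print(triple)
    # Approximate counts: ex:bob lacks foaf:knows but is counted, so this prints 4, not 3.
    print(estimate(summary, {"foaf:knows"}))
    print(include_in_federation(summary, {"dc:creator"}, min_results=5))  # False: 3 < 5
```

On this toy KB the search stops after two patterns even though `k=3`, because no remaining candidate has positive gain; K acts as an upper bound. Because patterns tolerate missing features, the counts are estimates rather than exact answers. The `min_results` threshold mirrors the abstract's idea of including a KB only when the information is present in significant numbers; the value used here is arbitrary.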

Source: 10th International Workshop on Information Search, Integration and Personalization (ISIP), pp. 69–87, Grand Forks, ND, USA, 1–2 October 2015



BibTeX entry
@inproceedings{oai:it.cnr:prodotti:424353,
	title = {RDF graph summarization based on approximate patterns},
	author = {Zneika M. and Lucchese C. and Vodislav D. and Kotzinos D.},
	doi = {10.1007/978-3-319-43862-7_4},
	booktitle = {10th International Workshop on Information Search, Integration and Personalization (ISIP), 1--2 October 2015},
	pages = {69--87},
	address = {Grand Forks, ND, USA},
	year = {2016}
}