Atzori C., Manghi P., Bardi A.
open science Scholarly communication Graph entity deduplication Deduplication duplicate identification OpenAIRE Big data
The OpenAIRE infrastructure populates a scholarly communication big graph interlinking metadata objects of publications, datasets, software, organizations, funders, and projects. In order to de-duplicate this graph, OpenAIRE has developed GDup, an integrated, scalable, general-purpose system for entity deduplication over big information graphs. GDup offers functionalities to realize a fully-fledged entity deduplication workflow over a generic input graph, inclusive of Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates to obtain an output disambiguated graph.
Source: e-science 2018 - 14th IEEE International Conference on e-Science (e-Science), pp. 372–373, Amsterdam, the Netherlands, 29 October - 01 November 2018
@inproceedings{oai:it.cnr:prodotti:402402,
  title     = {De-duplicating the {OpenAIRE} scholarly communication big graph},
  author    = {Atzori, C. and Manghi, P. and Bardi, A.},
  booktitle = {e-science 2018 - 14th IEEE International Conference on e-Science (e-Science)},
  pages     = {372--373},
  address   = {Amsterdam, the Netherlands},
  year      = {2018},
  doi       = {10.1109/escience.2018.00104},
  note      = {29 October - 01 November 2018. Additional DOIs: 10.5281/zenodo.1489139, 10.5281/zenodo.1489140}
}
10.1109/escience.2018.00104
10.5281/zenodo.1489139
10.5281/zenodo.1489140
OpenAIRE2020
Open Access Infrastructure for Research in Europe 2020
OpenAIRE-Advance
OpenAIRE Advancing Open Scholarship