Atzori C, Manghi P, Bardi A
open science Scholarly communication Graph entity deduplication Deduplication duplicate identification OpenAIRE Big data
The OpenAIRE infrastructure populates a scholarly communication big graph interlinking metadata objects of publications, datasets, software, organizations, funders, and projects. To de-duplicate this graph, OpenAIRE has developed GDup, an integrated, scalable, general-purpose system for entity deduplication over big information graphs. GDup offers the functionality needed to realize a fully-fledged entity deduplication workflow over a generic input graph, including Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates to obtain a disambiguated output graph.
@inproceedings{oai:it.cnr:prodotti:402402,
  title = {De-duplicating the OpenAIRE scholarly communication big graph},
  author = {Atzori, C and Manghi, P and Bardi, A},
  doi = {10.1109/escience.2018.00104 and 10.5281/zenodo.1489139 and 10.5281/zenodo.1489140},
  year = {2018}
}
Bibliographic record
Deposited version
10.1109/escience.2018.00104
10.5281/zenodo.1489139
10.5281/zenodo.1489140
OpenAIRE2020
Open Access Infrastructure for Research in Europe 2020
OpenAIRE-Advance
OpenAIRE Advancing Open Scholarship