Journal article  Open Access

The data-literature interlinking service: towards a common infrastructure for sharing data-article links

Burton A., Koers H., Manghi P., La Bruzzo S., Aryani A., Diepenbroek M., Schindler U.

scholexplorer; Scholarly communication; Data-literature links; research data; Data-publication interlinking; Library and Information Sciences; Information Systems; OpenAIRE; semantic relationships; RDA; scholix; Data citation

Purpose: Research data publishing is today widely regarded as crucial for reproducibility and proper assessment of scientific results, and as a way for researchers to get proper credit for sharing their data. However, several challenges need to be solved to fully realize its potential, one of them being the development of a global standard for links between research data and literature. Current linking solutions are mostly based on bilateral, ad hoc agreements between publishers and data centers. These operate in silos, so that content cannot be readily combined to deliver a network graph connecting research data and literature in a comprehensive and reliable way. The Research Data Alliance (RDA) Publishing Data Services Working Group (PDS-WG) aims to address this issue of fragmentation by bringing together different stakeholders to agree on a common infrastructure for sharing links between datasets and literature. The paper aims to discuss these issues.
Design/methodology/approach: This paper presents the joint effort of the RDA PDS-WG and the OpenAIRE infrastructure toward enabling a common infrastructure for exchanging data-literature links by realizing and operating the Data-Literature Interlinking (DLI) Service. The DLI Service populates and provides access to a graph of dataset-literature links (at the time of writing close to five million, and growing) collected from a variety of major data centers, publishers, and research organizations.
Findings: To achieve its objectives, the Service proposes an interoperable exchange data model and format, based on which it collects and publishes links, thereby offering the opportunity to validate such a common approach on real-case scenarios, with real providers and consumers. Feedback from these actors will drive continuous refinement of both the data model and the exchange format, supporting the further development of the Service to become an essential part of a universal, open, cross-platform, cross-discipline solution for collecting and sharing dataset-literature links.
Originality/value: This realization of the DLI Service is the first technical, cross-community, and collaborative effort in the direction of establishing a common infrastructure for facilitating the exchange of dataset-literature links. As a result of its operation and underlying community effort, a new activity, named Scholix, has been initiated involving technology-level stakeholders such as DataCite and CrossRef.

Source: Program (Lond., 1966) 51 (2017): 75–100. doi:10.1108/PROG-06-2016-0048

Publisher: Emerald, Bradford, United Kingdom



Projects (via OpenAIRE)

Open Access Infrastructure for Research in Europe 2020

RDA Europe – the European plug-in to the global Research Data Alliance (RDA)

RDA Europe
Research Data Alliance - Europe 3

Research Data Alliance Europe

BibTeX entry
	title = {The data-literature interlinking service: towards a common infrastructure for sharing data-article links},
	author = {Burton A. and Koers H. and Manghi P. and La Bruzzo S. and Aryani A. and Diepenbroek M. and Schindler U.},
	publisher = {Emerald, Bradford, United Kingdom},
	doi = {10.1108/prog-06-2016-0048 and 10.5281/zenodo.3776087 and 10.5281/zenodo.3776086},
	journal = {Program (Lond., 1966)},
	volume = {51},
	pages = {75–100},
	year = {2017}