A primer on open science-driven repository platforms Bardi A., Manghi P., Mannocci A., Ottonello E., Pavone G. Following Open Science mandates, institutions and communities increasingly demand repositories with native support for publishing scientific literature together with research data, software, and other research products. Such repositories may be thematic or general-purpose and are deeply integrated with the scholarly communication ecosystem to ensure versioning, persistent identifiers, data curation, usage stats, and so on. Identifying the most suitable off-the-shelf repository platform is often a non-trivial task as the choice depends on functional requirements, programming and technical skills, and infrastructure resources. This work analyses four state-of-the-art Open Source repository platforms, namely Dryad, Dataverse, DSpace, and InvenioRDM, from both a functional and a software perspective. This work intends to provide an overview serving as a primer for choosing repository platform solutions in different application scenarios. Moreover, this paper highlights how these platforms reacted to some key Open Science demands, moving away from the original and old-fashioned concept of a repository serving as a static container of files and metadata. Source: Metadata and Semantic Research, edited by Garoufallou E., Vlachidis A., pp. 222–234, 2023 DOI: 10.1007/978-3-031-39141-5_19 Project(s): OpenAIRE Nexus
(Semi)automated disambiguation of scholarly repositories Baglioni M., Mannocci A., Pavone G., De Bonis M., Manghi P. The full exploitation of scholarly repositories is pivotal in modern Open Science, and scholarly repository registries are kingpins in enabling researchers and research infrastructures to list and search for suitable repositories. However, since multiple registries exist, repository managers are keen on registering the repositories they manage multiple times to maximise their traction and visibility across different research communities, disciplines, and applications. These multiple registrations ultimately lead to information fragmentation and redundancy on the one hand and, on the other, force registries' users to juggle multiple registries, profiles and identifiers describing the same repository. Such problems are known to registries, which claim equivalence between repository profiles whenever possible by cross-referencing their identifiers across different registries. However, as we will see, this "claim set" is far from complete and, therefore, many replicas slip under the radar, possibly creating problems downstream. In this work, we combine such claims to create duplicate sets and extend them with the results of an automated clustering algorithm run over repository metadata descriptions. Then we manually validate our results to produce an "as accurate as possible" de-duplicated dataset of scholarly repositories. Source: IRCDL 2023 - 19th conference on Information and Research Science Connecting to Digital and Library Science, pp. 47–59, Bari, Italy, 23-24/02/2023 Project(s): OpenAIRE Nexus
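The step of combining equivalence claims with clustering results into duplicate sets can be pictured as computing connected components over pairs of repository profile identifiers. The following is only a minimal sketch of that idea using a union-find structure; it is not the authors' implementation, and the function name `duplicate_sets` and the identifier strings are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def duplicate_sets(pairs: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Group identifiers into duplicate sets from pairwise equivalences.

    Each pair states that two profiles describe the same repository,
    whether it comes from a registry claim or from a clustering run.
    """
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        # Register unseen identifiers as their own root, then walk up
        # to the root while shortening the path (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets

    groups: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    # Only sets with more than one profile are duplicates.
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical identifiers, for illustration only.
claims = [("re3data:1", "fairsharing:9"), ("fairsharing:9", "opendoar:4")]
clustered = [("roar:7", "opendoar:4"), ("re3data:2", "roar:8")]
sets_found = duplicate_sets(claims + clustered)
```

Feeding claims and clustering output through the same function means a clustering pair can bridge two claim sets that no registry had linked, which is the kind of extension the abstract describes before manual validation.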
A novel curated scholarly graph connecting textual and data publications Irrera O., Mannocci A., Manghi P., Silvello G. In the last decade, scholarly graphs became fundamental to storing and managing scholarly knowledge in a structured and machine-readable way. Methods and tools for discovery and impact assessment of science rely on such graphs and their quality to serve scientists, policymakers, and publishers. Since research data became very important in scholarly communication, scholarly graphs started including dataset metadata and their relationships to publications. Such graphs are the foundations for Open Science investigations, data-article publishing workflows, discovery, and assessment indicators. However, due to the heterogeneity of practices (FAIRness is indeed in the making), they often lack the complete and reliable metadata necessary to perform accurate data analysis; e.g., dataset metadata is inaccurate, author names are not uniform, and the semantics of the relationships is unknown, ambiguous or incomplete.
This work describes an open and curated scholarly graph we built and published as a training and test set for data discovery, data connection, author disambiguation, and link prediction tasks. Overall the graph contains 4,047 publications, 5,488 datasets, 22 software, 21,561 authors; 9,692 edges interconnect publications to datasets and software and are labeled with semantics that outline whether a publication is citing, referencing, documenting, supplementing another product.
To ensure high-quality metadata and semantics, we relied on the information extracted from PDFs of the publications and the datasets and software webpages to curate and enrich node metadata and edge semantics. To the best of our knowledge, this is the first published resource that includes publications and datasets with manually validated and curated metadata. Source: ACM journal of data and information quality (Online) 15 (2023). doi:10.1145/3597310 DOI: 10.1145/3597310 Project(s): OpenAIRE Nexus
Will open science change authorship for good? Towards a quantitative analysis Mannocci A., Irrera O., Manghi P. Authorship of scientific articles has profoundly changed from early science until now.
If once upon a time a paper was authored by a handful of authors, scientific collaborations are much more prominent on average nowadays.
As authorship (and citation) is essentially the primary reward mechanism according to the traditional research evaluation frameworks, it has turned out to be a rather hot-button topic from which a significant portion of academic disputes stems.
However, the novel Open Science practices could be an opportunity to disrupt such dynamics and diversify the credit of the different scientific contributors involved in the diverse phases of the lifecycle of the same research effort.
In fact, a paper and research data (or software) contextually published could exhibit different authorship to give credit to the various contributors right where it feels most appropriate.
We argue that this can be computationally analysed by taking advantage of the wealth of information in modern Open Science Graphs.
Such a study can pave the way to better understand the dynamics and patterns of authorship in linked literature, research data and software, and how they evolved over the years. Source: IRCDL 2022 - 18th Italian Research Conference on Digital Libraries, Padua, Italy, 24-25/02/2022 Project(s): OpenAIRE Nexus
InfraScience research activity report 2021 Artini M., Assante M., Atzori C., Baglioni M., Bardi A., Bove P., Candela L., Casini G., Castelli D., Cirillo R., Coro G., De Bonis M., Debole F., Dell'Amico A., Frosini L., La Bruzzo S., Lazzeri E., Lelii L., Manghi P., Mangiacrapa F., Mangione D., Mannocci A., Ottonello E., Pagano P., Panichi G., Pavone G., Piccioli T., Sinibaldi F., Straccia U. InfraScience is a research group of the National Research Council of Italy - Institute of Information Science and Technologies (CNR - ISTI) based in Pisa, Italy. This report documents the research activity performed by this group in 2021 to highlight the major results. In particular, the InfraScience group addressed research challenges characterising Data Infrastructures, eScience, and Intelligent Systems. The group activity is pursued by closely connecting research and development and by promoting and supporting open science. In fact, the group is leading the development of two large scale infrastructures for Open Science, i.e. D4Science and OpenAIRE.
During 2021 InfraScience members contributed to the publishing of 25 papers, to the research and development activities of 18 research projects (15 funded by EU), to the organization of conferences and training events, and to several working groups and task forces. Source: ISTI Annual report, 2022 DOI: 10.32079/isti-ar-2022/001 Project(s): ARIADNEplus , Blue Cloud , PerformFISH , EOSC-Pillar , DESIRA , EOSC Future , EOSCsecretariat.eu , EcoScope , RISIS 2 , OpenAIRE-Advance , OpenAIRE Nexus , SoBigData-PlusPlus
BIP! scholar: a service to facilitate fair researcher assessment Vergoulis T., Chatzopoulos S., Vichos K., Kanellos I., Mannocci A., Manola N., Manghi P. In recent years, assessing the performance of researchers has become a burden due to the extensive volume of the existing research output. As a result, evaluators often end up relying heavily on a selection of performance indicators like the h-index. However, over-reliance on such indicators may result in reinforcing dubious research practices, while overlooking important aspects of a researcher's career, such as their exact role in the production of particular research works or their contribution to other important types of academic or research activities (e.g., production of datasets, peer reviewing). In response, a number of initiatives that attempt to provide guidelines towards fairer research assessment frameworks have been established. In this work, we present BIP! Scholar, a Web-based service that offers researchers the opportunity to set up profiles that summarise their research careers taking into consideration well-established guidelines for fair research assessment, facilitating the work of evaluators who want to be more compliant with the respective practices. Source: JCDL'22 - 22nd ACM/IEEE Joint Conference on Digital Libraries, Cologne, Germany, 20-24/06/2022 DOI: 10.1145/3529372.3533296 DOI: 10.48550/arxiv.2205.03152 Project(s): OpenAIRE Nexus
Sci-K 2022 - International Workshop on Scientific Knowledge: Representation, Discovery, and Assessment Manghi P., Mannocci A., Osborne F., Sacharidis D., Salatino A., Vergoulis T. In this paper we present the 2nd edition of the Scientific Knowledge: Representation, Discovery, and Assessment (Sci-K 2022) workshop. Sci-K aims to explore innovative solutions and ideas for the generation of approaches, data models, and infrastructures (e.g., knowledge graphs) for supporting, directing, monitoring and assessing the scientific knowledge and progress. This edition is also a reflection point as the community is seeking alternative solutions to the now-defunct Microsoft Academic Graph (MAG). Source: WWW 2022 - The ACM Web Conference 2022, pp. 735–738, Lyon, France (Online), 25-29/04/2022 DOI: 10.1145/3487553.3524883
"Knock Knock! Who's There?" A study on scholarly repositories' availability Mannocci A., Baglioni M., Manghi P. Scholarly repositories are the cornerstone of modern open science, and their availability is vital for enacting its practices. To this end, scholarly registries such as FAIRsharing, re3data, OpenDOAR and ROAR give them presence and visibility across different research communities, disciplines, and applications by assigning an identifier and persisting their profiles with summary metadata. Alas, like any other resource available on the Web, scholarly repositories, be they tailored for literature, software or data, are quite dynamic and can be frequently changed, moved, merged or discontinued. Therefore, their references are prone to link rot over time, and their availability often boils down to whether the homepage URLs indicated in authoritative repository profiles within scholarly registries respond or not. For this study, we harvested the content of four prominent scholarly registries and resolved over 13 thousand unique repository URLs. By performing a quantitative analysis on such an extensive collection of repositories, this paper aims to provide a global snapshot of their availability, which bewilderingly is far from granted.Source: TPDL 2022 - 26th International Conference on Theory and Practice of Digital Libraries, pp. 306–312, Padua, Italy, 20-23/09/2022 DOI: 10.1007/978-3-031-16802-4_26 Project(s): OpenAIRE Nexus Metrics:
Open Science and authorship of supplementary material. Evidence from a research community Mannocci A., Irrera O., Manghi P. While, in early science, most of the papers were authored by a handful of scientists, modern science is characterised by more extensive collaborations, and the average number of authors per article has increased across many disciplines (Baethge, 2008; Cronin, 2001; Fernandes & Monteiro, 2017; Frandsen & Nicolaisen, 2010; Wren et al., 2007). Indeed, in some fields of science (e.g., High Energy Physics), it is not infrequent to encounter hundreds or thousands of authors co-participating in the same piece of research. Such intricate collaboration patterns make it difficult to establish a correct relationship between contributor and scientific contribution and hence get an accurate and fair reward during research evaluation (Brand, Allen, Altman, Hlava, & Scott, 2015; Vasilevsky et al., 2021; Vergoulis et al., 2022). Thus, as widely known, scientific authorship tends to be a rather hot-button topic in academia, as roughly one-fifth of academic disputes among authors stem from this (Dance, 2012). Open Science, however, has the potential to disrupt such traditional mechanisms by injecting into the "academic market" new kinds of "currency" for credit attribution, merit and impact assessment (Mooney & Newton, 2012; Silvello, 2018). To this end, the new practices of supplementary research data (and software) deposition and citation could be perceived as an opportunity to diversify the attribution portfolio and eventually give credit to the different contributors involved in the diverse phases of the lifecycle within the same research endeavour (Bierer, Crosas, & Pierce, 2017; Brand et al., 2015). 
While, on the one hand, it is known that authors' ordering tells little or nothing about authors' roles and contributions (Kosmulski, 2012), on the other hand, we argue that variations of any kind in author sets of paired publications and supplementary material can be indicative. Although the actual reason behind such a variation is unclear, the presence of a fracture between the publication and research data realms might suggest once more that current practices for research assessment and reward should be revised and updated to capture such peculiarities as well. In (Mannocci, Irrera, & Manghi, 2022), we argue that modern Open Science Graphs (OSGs) can be used to analyse whether this is the case or not and understand if the opportunity has been seized already. By offering extensive metadata descriptions of literature, research data, software, and their semantic relations, OSGs constitute a fertile ground to analyse this phenomenon computationally and thus study the emergence of significant patterns. As a preliminary study, in this paper, we conduct a focused analysis on a subset of publications with supplementary material drawn from the European Marine Science (MES) research community. The results are promising and suggest our hypothesis is worth exploring further. Indeed, in 702 cases out of 3,075 (22.83%), there are substantial variations between the authors participating in the publication and the authors participating in the supplementary dataset (or software), thus posing the premises for a longitudinal, large-scale analysis of the phenomenon. Source: STI 2022 - 26th International Conference on Science, Technology and Innovation Indicators, Granada, Spain, 7-9/09/2022 DOI: 10.5281/zenodo.6975411 Project(s): OpenAIRE Nexus
Data model description of the OpenAIRE Research Graph La Bruzzo S. F., Artini M., Atzori C., Bardi A., Baglioni M., De Bonis M., Mannocci A., Manghi P., Pavone G. The OpenAIRE Graph (formerly known as the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide, key to fostering Open Science and establishing its practices in daily research activities. Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back into the hands of the scientific community.
Imagine a vast collection of research products all linked together, contextualized, and openly available. For the past years, OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products such as articles, datasets, software, and other research products, entities like organizations, funders, funding streams, projects, communities, and data sources. This technical Report describes the public data model adopted by the OpenAIRE Graph. Source: ISTI Technical Report, ISTI-2022-TR/031, 2022 DOI: 10.32079/isti-tr-2022/031
OpenAIRE Research Graph: aggregation workflow La Bruzzo S. F., Artini M., Atzori C., Bardi A., Baglioni M., De Bonis M., Dell'Amico A., Mannocci A., Manghi P., Pavone G. The OpenAIRE Graph (formerly the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide. It is key in fostering Open Science and establishing its practices in daily research activities. Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back into the hands of the scientific community. OpenAIRE collects metadata records from more than 70K scholarly communication sources worldwide, including Open Access institutional repositories, data archives, and journals. All the metadata records (i.e., descriptions of research products) are put together in a data lake with records from Crossref, Unpaywall, ORCID, ROR, and information about projects provided by national and international funders. This technical Report describes the main Aggregation Workflow to orchestrate the data aggregation and the implemented mapping from some of the main datasources into the OpenAIRE research graph data model. Source: ISTI Technical Report, ISTI-2022-TR/033, 2022 DOI: 10.32079/isti-tr-2022/033 Project(s): OpenAIRE-Advance , OpenAIRE Nexus
OpenAIRE Research Graph deduplication workflow La Bruzzo S. F., Artini M., Atzori C., Bardi A., Baglioni M., De Bonis M., Mannocci A., Manghi P., Pavone G. The OpenAIRE aggregation workflow can collect metadata records from different providers about the same scholarly work. Each metadata record can carry different information because, for example, some providers are not aware of links to projects, keywords, or other details. Another typical case is when OpenAIRE collects one metadata record from a repository about a pre-print and another from a journal about the published article. To provide correct statistics, OpenAIRE must identify those cases and "merge" the two metadata records so that the scholarly work is counted only once in the statistics OpenAIRE produces. This technical Report describes the Deduplication workflow and technique adopted to deduplicate the OpenAIRE Graph. Source: ISTI Technical Report, ISTI-2022-TR/032, 2022 DOI: 10.32079/isti-tr-2022/032 Project(s): OpenAIRE-Connect , OpenAIRE Nexus
InfraScience research activity report 2022 Artini M., Assante M., Atzori C., Baglioni M., Bardi A., Bove P., Candela L., Casini G., Castelli D., Cirillo R., Coro G., De Bonis M., Debole F., Dell'Amico A., Frosini L., La Bruzzo S., Lelii L., Manghi P., Mangiacrapa F., Mangione D., Mannocci A., Ottonello E., Pagano P., Panichi G., Pavone G., Piccioli T., Sinibaldi F., Straccia U., Zoppi F. InfraScience is a research group of the National Research Council of Italy - Institute of Information Science and Technologies (CNR - ISTI) based in Pisa, Italy. This report documents the research activity performed by this group in 2022 to highlight the major results. In particular, the InfraScience group addressed research challenges characterising Data Infrastructures, e-Science, and Intelligent Systems. The group activity is pursued by closely connecting research and development and by promoting and supporting open science. In fact, the group is leading the development of two large scale infrastructures for Open Science, i.e. D4Science and OpenAIRE. During 2022 InfraScience members contributed to the publishing of several papers, to the research and development activities of 18 research projects (15 funded by EU), to the organization of conferences and training events, and to several working groups and task forces. Source: ISTI Annual reports, 2022 Project(s): ARIADNEplus , Blue Cloud , EOSC-Pillar , DESIRA , EOSC Future , RISIS 2 , TAILOR , SoBigData-PlusPlus
Reflections on the misuses of ORCID iDs Baglioni M., Mannocci A., Manghi P., Atzori C., Bardi A., La Bruzzo S. Since 2012, the "Open Researcher and Contributor Identification Initiative" (ORCID) has been successfully running a worldwide registry, with the aim of unequivocally pinpointing researchers and the body of knowledge they contributed to. In practice, ORCID clients, e.g., publishers, repositories, and CRIS systems, make sure their metadata can refer to iDs in the ORCID registry to associate authors and their work unambiguously. However, the ORCID infrastructure still suffers from several "service misuses", which put its very mission at risk and should therefore be identified and tackled. In this paper, we classify and qualitatively document such misuses, originating from both users (researchers and organisations) of the ORCID registry and the ORCID clients. We conclude by providing an outlook and a few recommendations aimed at improving the exploitation of the ORCID infrastructure. Source: IRCDL 2021 - 17th Italian Research Conference on Digital Libraries, pp. 117–125, Online conference, 18-19/02/2021 Project(s): OpenAIRE-Advance
BIP! DB: a dataset of impact measures for scientific publications Vergoulis T., Kanellos I., Atzori C., Mannocci A., Chatzopoulos S., La Bruzzo S., Manola N., Manghi P. The growth rate of the number of scientific publications is constantly increasing, creating important challenges in the identification of valuable research and in various scholarly data management applications, in general. In this context, measures which can effectively quantify the scientific impact could be invaluable. In this work, we present BIP! DB, an open dataset that contains a variety of impact measures calculated for a large collection of more than 100 million scientific publications from various disciplines. Source: WWW 2021 - Companion of the World Wide Web Conference, pp. 456–460, Online conference, 13/04/2021 DOI: 10.1145/3442442.3451369 DOI: 10.48550/arxiv.2101.12001 Project(s): OpenAIRE-Advance , OpenAIRE Nexus
OpenAIRE research graph: dumps for research communities and initiatives Manghi P., Atzori C., Bardi A., Baglioni M., Schirrwagen J., Dimitropoulos H., La Bruzzo S., Foufoulas I., Lohden A., Backer A., Mannocci A., Horst M., Czerniak A., Kiatropoulou K., Kokogiannaki A., De Bonis M., Artini M., Ottonello E., Lempesis A., Ioannidis A., Summan F. This dataset contains dumps of the OpenAIRE Research Graph containing metadata records relevant for the research communities and initiatives collaborating with OpenAIRE. Each dataset is a tar file containing gzip files with one json per line. Each json is compliant to the schema available at DOI: 10.5281/zenodo.3974226 DOI: 10.5281/zenodo.3974604 Project(s): RISIS 2 , BE OPEN , OpenAIRE-Advance
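The dump layout described in this entry (a tar file holding gzip files, each with one JSON document per line) lends itself to streaming rather than full extraction. The following is a minimal sketch assuming only that layout; the function name `iter_records` and the file name `dump.tar` are hypothetical, and the fields of each record depend on the published schema, which is not reproduced here.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per line from each gzip member of a tar."""
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            # Skip directories and anything that is not a gzip file.
            if not member.isfile() or not member.name.endswith(".gz"):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decompress on the fly and read the member line by line.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

A caller could then loop with `for record in iter_records("dump.tar"): ...` and keep only the records of interest, so that memory use stays bounded by a single line at a time.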
InfraScience Research Activity Report 2020 Artini M., Assante M., Atzori C., Baglioni M., Bardi A., Candela L., Casini G., Castelli D., Cirillo R., Coro G., Debole F., Dell'Amico A., Frosini L., La Bruzzo S., Lazzeri E., Lelii L., Manghi P., Mangiacrapa F., Mannocci A., Pagano P., Panichi G., Piccioli T., Sinibaldi F., Straccia U. InfraScience is a research group of the National Research Council of Italy - Institute of Information Science and Technologies (CNR - ISTI) based in Pisa, Italy. This report documents the research activity performed by this group in 2020 to highlight the major results. In particular, the InfraScience group addressed research challenges characterising Data Infrastructures, e-Science, and Intelligent Systems. The group activity is pursued by closely connecting research and development and by promoting and supporting open science. In fact, the group is leading the development of two large scale infrastructures for Open Science, i.e. D4Science and OpenAIRE. During 2020 InfraScience members contributed to the publishing of 30 papers, to the research and development activities of 12 research projects (11 funded by EU), to the organization of conferences and training events, and to several working groups and task forces. Source: ISTI Annual Report, ISTI-2021-AR/002, pp.1–20, 2021 DOI: 10.32079/isti-ar-2021/002 Project(s): ARIADNEplus , Blue Cloud , PerformFISH , EOSC-Pillar , DESIRA , EOSCsecretariat.eu , RISIS 2 , TAILOR , I-GENE , MOVING , OpenAIRE-Advance , SoBigData-PlusPlus
We can make a better use of ORCID: five observed misapplications Baglioni M., Manghi P., Mannocci A., Bardi A. Since 2012, the "Open Researcher and Contributor ID" organisation (ORCID) has been successfully running a worldwide registry, with the aim of "providing a unique, persistent identifier for individuals to use as they engage in research, scholarship, and innovation activities". Any service in the scholarly communication ecosystem (e.g., publishers, repositories, CRIS systems, etc.) can contribute to a non-ambiguous scholarly record by including, during metadata deposition, referrals to iDs in the ORCID registry.
The OpenAIRE Research Graph is a scholarly knowledge graph that aggregates both records from the ORCID registry and publication records with ORCID referrals from publishers and repositories worldwide to yield research impact monitoring and Open Science statistics. Graph data analytics revealed "anomalies" due to ORCID registry "misapplications", caused by wrong ORCID referrals and misexploitation of the ORCID registry. Albeit these affect just a minority of ORCID records, they inevitably affect the quality of the ORCID infrastructure and may fuel the rise of detractors and scepticism about the service.
In this paper, we classify and qualitatively document such misapplications, identifying five ORCID registrant-related and ORCID referral-related anomalies to raise awareness among ORCID users. We describe the current countermeasures taken by ORCID and, where applicable, provide recommendations. Finally, we elaborate on the importance of a community-steered Open Science infrastructure and the benefits this approach has brought and may bring to ORCID. Source: Data science journal 20 (2021): 1–12. doi:10.5334/dsj-2021-038 DOI: 10.5334/dsj-2021-038 Project(s): OpenAIRE-Connect
Detection, analysis, and prediction of research topics with scientific knowledge graphs Salatino A., Mannocci A., Osborne F. Analysing research trends and predicting their impact on academia and industry is crucial to gain a deeper understanding of the advances in a research field and to inform critical decisions about research funding and technology adoption. In the last years, we saw the emergence of several publicly-available and large-scale Scientific Knowledge Graphs fostering the development of many data-driven approaches for performing quantitative analyses of research trends. This chapter presents an innovative framework for detecting, analysing, and forecasting research topics based on a large-scale knowledge graph characterising research articles according to the research topics from the Computer Science Ontology. We discuss the advantages of a solution based on a formal representation of topics and describe how it was applied to produce bibliometric studies and innovative tools for analysing and predicting research dynamics. Source: Predicting the Dynamics of Research Impact, edited by Manolopoulos Y., Vergoulis T., pp. 225–252, 2021 DOI: 10.1007/978-3-030-86668-6_11