61 result(s)
2025 Other Restricted
InfraScience research activity report 2024
Angioni S., Artini M., Assante M., Atzori C., Baglioni M., Bardi A., Bosio C., Bove P., Calanducci A., Candela L., Casini G., Castelli D., Cirillo R., Coro G., De Bonis M., Debole F., Dell'Amico A., Frosini L., Ibrahim Ahmed, La Bruzzo S., Lelii L., Manghi P., Mangiacrapa F., Mangione D., Mannocci A., Molinaro E., Oliviero A., Pagano P., Panichi G., Teresa M. T., Pavone G., Peccerillo B., Piccioli T., Procaccini M., Straccia U., Vannini G. L., Versienti L.
InfraScience is a research group within the Institute of Information Science and Technologies (ISTI) of the National Research Council of Italy (CNR), based in Pisa. This activity report outlines the group's research achievements and initiatives throughout 2024. InfraScience focused its efforts on key challenges in the areas of Data Infrastructures, e-Science, and Intelligent Systems, maintaining a strong synergy between research and development and a firm commitment to open science principles. In 2024, the group played a leading role in the development and evolution of two major Open Science infrastructures: D4Science and OpenAIRE. InfraScience researchers contributed significantly to the scientific community through the publication of peer-reviewed papers, active participation in EU-funded research projects, organization of international conferences and training activities, and engagement in various working groups and task forces. This report highlights these contributions and underscores the group's ongoing dedication to advancing open, collaborative, and impactful science.
DOI: 10.32079/isti-ar-2025/001

See at: CNR IRIS Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2024 Conference article Open Access
FDup framework: a general-purpose solution for efficient entity deduplication of record collections
De Bonis M., Atzori C., La Bruzzo S., Manghi P.
Deduplication is a technique aimed at identifying and resolving duplicate metadata records in a collection, with a special focus on the performance of the approach. This paper describes FDup (Flat Collections Deduper), a general-purpose software framework supporting a complete deduplication workflow to manage big data record collections: metadata record data model definition, identification of candidate duplicates, identification of duplicates. FDup brings two main innovations: first, it delivers a full deduplication framework in a single easy-to-use software package based on the Apache Spark Hadoop framework, where developers can customize the optimal and parallel workflow steps of blocking, sliding windows, and similarity matching function via an intuitive configuration file; second, it introduces a novel approach to improve performance, beyond the known techniques of “blocking” and “sliding window”, by introducing a smart similarity-matching function T-match. T-match is engineered as a decision tree that drives the comparisons of the fields of two records as branches of predicates and allows for successful or unsuccessful early exit strategies. The efficacy of the approach is proved by experiments performed over big data collections of metadata records in the OpenAIRE Graph, a known open-access knowledge base in scholarly communication.
Source: CEUR WORKSHOP PROCEEDINGS, vol. 3741, pp. 624-632. Villasimius, Italy, 23-26/06/2024
Project(s): FAIRCORE4EOSC via OpenAIRE

See at: ceur-ws.org Open Access | CNR IRIS Open Access | CNR IRIS Restricted
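
The FDup abstract above describes T-match as a decision tree whose predicates compare record fields and allow early exits. The following is only an illustrative sketch of that idea, not FDup's actual code or configuration format: the `Node` class, the `tree_match` function, the three predicates, and the field names `doi`, `title`, and `year` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Record = Dict[str, str]


@dataclass
class Node:
    """One predicate of the tree. A missing branch means 'stop here'."""
    predicate: Callable[[Record, Record], bool]
    on_true: Optional["Node"] = None   # None: early exit, records match
    on_false: Optional["Node"] = None  # None: early exit, records differ


def tree_match(root: Node, a: Record, b: Record) -> bool:
    """Walk the tree until a branch is missing, then return the verdict."""
    node = root
    while True:
        if node.predicate(a, b):
            if node.on_true is None:
                return True
            node = node.on_true
        else:
            if node.on_false is None:
                return False
            node = node.on_false


# Hypothetical field predicates (assumed field names, not an FDup schema).
def same_doi(a: Record, b: Record) -> bool:
    return bool(a.get("doi")) and a.get("doi", "").lower() == b.get("doi", "").lower()


def same_title(a: Record, b: Record) -> bool:
    def norm(r: Record) -> str:
        return " ".join(r.get("title", "").lower().split())
    return norm(a) != "" and norm(a) == norm(b)


def same_year(a: Record, b: Record) -> bool:
    return a.get("year") == b.get("year")


# Equal DOIs exit early with a match; otherwise titles must agree before
# the year is compared, and a title mismatch exits early with no match.
TREE = Node(same_doi, on_false=Node(same_title, on_true=Node(same_year)))
```

In this sketch the cheap, decisive comparison (the DOI) sits at the root, so many pairs are resolved without evaluating the remaining predicates.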


2024 Dataset Open Access
OpenAIRE Graph Dataset v8.0.0 (July 2024)
Manghi P., Atzori C., Bardi A., Baglioni M., Dimitropoulos H., La Bruzzo S., Foufoulas I., Mannocci A., Horst M., Iatropoulou K., Kokogiannaki A., De Bonis M., Artini M., Lempesis A., Ioannidis A., Manola N., Principe P., Vergoulis T., Chatzopoulos S.
The OpenAIRE Graph is a large and rich collection of open and linked scholarly records from trusted data sources, such as journals, repositories, and registries. It aims to foster Open Science practices and enable the scientific community to discover, monitor, and evaluate science. The Graph is cleaned, deduplicated, enriched, and full-text mined to generate statistics and insights. The Graph is accessible via various services, such as OpenAIRE MONITOR, EXPLORE, ScholeXplorer (Scholix API for the retrieval of literature-data links), search APIs, and snapshots in JSON format updated every six months. The Graph data are openly available under a CC-BY license for third parties to reuse and create added-value services. The documentation is available at: https://graph.openaire.eu
DOI: 10.5281/zenodo.12819872
Project(s): FAIRCORE4EOSC via OpenAIRE, SciLake via OpenAIRE, EOSC Beyond via OpenAIRE, GraspOS via OpenAIRE, OSTrails via OpenAIRE

See at: CNR IRIS Open Access | zenodo.org Open Access | CNR IRIS Restricted


2023 Other Open Access
InfraScience research activity report 2023
Artini M., Assante M., Atzori C., Baglioni M., Bardi A., Bosio C., Bove P., Calanducci A., Candela L., Casini G., Castelli D., Cirillo R., Coro G., De Bonis M., Debole F., Dell'Amico A., Frosini L., Ibrahim A. S. T., La Bruzzo S., Lelii L., Manghi P., Mangiacrapa F., Mangione D., Mannocci A., Molinaro E., Pagano P., Panichi G., Paratore M. T., Pavone G., Piccioli T., Sinibaldi F., Straccia U., Vannini G. L.
InfraScience is a research group of the National Research Council of Italy - Institute of Information Science and Technologies (CNR - ISTI) based in Pisa, Italy. This report documents the research activity performed by this group in 2023 to highlight the major results. In particular, the InfraScience group engaged in research challenges characterising Data Infrastructures, e-Science, and Intelligent Systems. The group activity is pursued by closely connecting research and development and by promoting and supporting open science. In fact, the group is leading the development of two large-scale infrastructures for Open Science, i.e. D4Science and OpenAIRE. During 2023 InfraScience members contributed to the publishing of several papers, to the research and development activities of several research projects (primarily funded by the EU), to the organization of conferences and training events, and to several working groups and task forces.
DOI: 10.32079/isti-ar-2023/002
Project(s): Blue Cloud via OpenAIRE, EOSC Future via OpenAIRE, TAILOR via OpenAIRE

See at: CNR IRIS Open Access | CNR IRIS Restricted


2022 Other Open Access
Open Science repository platforms
Manghi P, Artini M, La Bruzzo S, Ottonello E, Pavone G
Institutional and thematic repositories today play a key role in scholarly communication and more broadly in scientific workflows. Many institutions and communities have set the ambitious goal of providing an open access repository for their community of users. However, given the number of expectations from their users, choosing the right solution is often non-trivial. Some platforms may be served out-of-the-box, to be put in operation after straightforward configurations, but are in general less customizable to adhere to specific functional, non-functional, or contextual needs. Other platforms may instead be extremely customizable and flexible but require skilled personnel for their adaptation and deployment. This report performs an analysis of existing state-of-the-art Open Source repository solutions from the functional, operational, and software perspectives. As a result of the analysis, it will factor out the pros and cons of such solutions and identify typical scenarios of adoption.
DOI: 10.32079/isti-tr-2022/009
Project(s): OpenAIRE Nexus via OpenAIRE

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Other Open Access
Bioschemas data sources aggregation to OpenAIRE Research Graph
Ottonello E, Artini M, La Bruzzo S, Pavone G
In this report we propose an extended Hadoop-based aggregator for the harvesting of Bioschemas data sources. In this aggregator, the downloaded data will be processed according to the consolidated data flow: the original contents will be mapped onto an internal representation that will make them eligible to be integrated in the OpenAIRE Research Graph.
DOI: 10.32079/isti-tr-2022/010
Project(s): EOSC Future via OpenAIRE, OpenAIRE Nexus via OpenAIRE

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Other Open Access
InfraScience research activity report 2021
Artini M, Assante M, Atzori C, Baglioni M, Bardi A, Bove P, Candela L, Casini G, Castelli D, Cirillo R, Coro G, De Bonis M, Debole F, Dell'Amico A, Frosini L, La Bruzzo S, Lazzeri E, Lelii L, Manghi P, Mangiacrapa F, Mangione D, Mannocci A, Ottonello E, Pagano P, Panichi G, Pavone G, Piccioli T, Sinibaldi F, Straccia U
InfraScience is a research group of the National Research Council of Italy - Institute of Information Science and Technologies (CNR - ISTI) based in Pisa, Italy. This report documents the research activity performed by this group in 2021 to highlight the major results. In particular, the InfraScience group confronted research challenges characterising Data Infrastructures, e-Science, and Intelligent Systems. The group activity is pursued by closely connecting research and development and by promoting and supporting open science. In fact, the group is leading the development of two large-scale infrastructures for Open Science, i.e. D4Science and OpenAIRE. During 2021 InfraScience members contributed to the publishing of 25 papers, to the research and development activities of 18 research projects (15 funded by the EU), to the organization of conferences and training events, and to several working groups and task forces.
DOI: 10.32079/isti-ar-2022/001
Project(s): ARIADNEplus via OpenAIRE, Blue Cloud via OpenAIRE, PerformFISH via OpenAIRE, EOSC-Pillar via OpenAIRE, DESIRA via OpenAIRE, EOSC Future via OpenAIRE, EOSCsecretariat.eu via OpenAIRE, EcoScope via OpenAIRE, RISIS 2 via OpenAIRE, OpenAIRE-Advance via OpenAIRE, OpenAIRE Nexus via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Software Metadata Only Access
dnet-dedup framework
Artini M., Atzori C., Bardi A., Baglioni M., De Bonis M., Dell'Amico A., La Bruzzo S. F., Mannocci A., Manghi P.
The GDup Software enables an integrated, scalable, general-purpose system for entity deduplication over big information graphs. GDup supports practitioners with the functionalities needed to realize a fully-fledged entity deduplication workflow over a generic input graph, including Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates to obtain an output disambiguated graph. GDup is today one of the core components of the OpenAIRE infrastructure production system, monitoring Open Science trends on behalf of the European Commission.
Project(s): OpenAIRE-Advance via OpenAIRE, OpenAIRE Nexus via OpenAIRE

See at: github.com Restricted | CNR IRIS Restricted


2022 Software Metadata Only Access
Scholexplorer-API
La Bruzzo Sf
The Scholix API allows clients to run REST queries over the Scholexplorer index in order to fetch links matching given criteria. In the current version, clients can search for: links whose source object has a given PID or PID type; links whose source object has been published by a given data source ("data source as publisher"); links that were collected from a given data source ("data source as provider").
Project(s): OpenAIRE-Advance via OpenAIRE, OpenAIRE Nexus via OpenAIRE

See at: github.com Restricted | CNR IRIS Restricted


2022 Other Open Access
Data model description of the OpenAIRE Research Graph
La Bruzzo Sf, Artini M, Atzori C, Bardi A, Baglioni M, De Bonis M, Mannocci A, Manghi P, Pavone G
The OpenAIRE Graph (formerly known as the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide, key to fostering Open Science and establishing its practices in daily research activities. Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back into the hands of the scientific community. Imagine a vast collection of research products all linked together, contextualized, and openly available. Over the past years, OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products such as articles, datasets, software, and other research products, and entities like organizations, funders, funding streams, projects, communities, and data sources. This technical report describes the public data model adopted by the OpenAIRE Graph.
DOI: 10.32079/isti-tr-2022/031

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Other Open Access
OpenAIRE Research Graph: aggregation workflow
La Bruzzo Sf, Artini M, Atzori C, Bardi A, Baglioni M, De Bonis M, Dell'Amico A, Mannocci A, Manghi P, Pavone G
The OpenAIRE Graph (formerly the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide. It is key in fostering Open Science and establishing its practices in daily research activities. Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back into the hands of the scientific community. OpenAIRE collects metadata records from more than 70K scholarly communication sources worldwide, including Open Access institutional repositories, data archives, and journals. All the metadata records (i.e., descriptions of research products) are put together in a data lake with records from Crossref, Unpaywall, ORCID, ROR, and information about projects provided by national and international funders. This technical report describes the main aggregation workflow that orchestrates the data aggregation and the implemented mapping from some of the main data sources into the OpenAIRE Research Graph data model.
DOI: 10.32079/isti-tr-2022/033
Project(s): OpenAIRE-Advance via OpenAIRE, OpenAIRE Nexus via OpenAIRE

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Other Open Access
OpenAIRE Research Graph deduplication workflow
La Bruzzo Sf, Artini M, Atzori C, Bardi A, Baglioni M, De Bonis M, Mannocci A, Manghi P, Pavone G
The OpenAIRE aggregation workflow can collect metadata records from different providers about the same scholarly work. Each metadata record can carry different information because, for example, some providers are not aware of links to projects, keywords, or other details. Another typical case is when OpenAIRE collects one metadata record from a repository about a pre-print and another from a journal about the published article. To provide correct statistics, OpenAIRE must identify those cases and "merge" the two metadata records so that the scholarly work is counted only once in the statistics OpenAIRE produces. This technical report describes the deduplication workflow and technique adopted to deduplicate the OpenAIRE Graph.
DOI: 10.32079/isti-tr-2022/032
Project(s): OpenAIRE-Connect via OpenAIRE, OpenAIRE Nexus via OpenAIRE

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted
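
The record above explains that duplicate metadata records, for example a pre-print from a repository and the published article from a journal, are merged so the work is counted once. A minimal sketch of such a merge follows. It assumes each record is a dictionary of list-valued fields and that the duplicates have already been identified; the function name `merge_records` and the field names are hypothetical and do not reflect the OpenAIRE data model.

```python
from typing import Dict, Iterable, List

Record = Dict[str, List[str]]


def merge_records(duplicates: Iterable[Record]) -> Record:
    """Union the values of each field across records judged to be duplicates.

    Values keep their first-seen order, and repeated values are stored once,
    so information known to only one provider survives the merge.
    """
    merged: Record = {}
    for record in duplicates:
        for field, values in record.items():
            bucket = merged.setdefault(field, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return merged


# Hypothetical example: the journal record knows the project link,
# the repository record does not.
preprint = {"title": ["A study"], "source": ["repository"], "projects": []}
article = {"title": ["A study"], "source": ["journal"], "projects": ["P1"]}
merged = merge_records([preprint, article])
```

Here `merged` is a single record, so a statistic computed over merged records counts the work once while retaining the project link that only one provider supplied.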


2022 Other Open Access
OpenOrgs: a tool for the disambiguation of organizations
Artini M, La Bruzzo Sf, De Bonis M, Pavone G
Organizations appear all over the Research & Innovation ecosystem in different shapes and formats: the same organization may appear with different metadata fields and different names, e.g., full legal name, short or alternative names, acronym. The ambiguity of organizations results in a huge deficiency in the exchange of information, the findability of research products, the monitoring of activities, and, ultimately, the building of a linked open scholarly communication system. OpenOrgs combines an automated process and human curation to compensate for the lack of information available and improve organizations' discoverability.
DOI: 10.32079/isti-tr-2022/034
Project(s): OpenAIRE-Advance via OpenAIRE, OpenAIRE Nexus via OpenAIRE

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Other Open Access
Scholexplorer activity report 2022
La Bruzzo Sf, Manghi P
Scholexplorer is a service that accepts publications-data or data-data links from validated sources, builds a de-duplicated graph and provides access to it. ScholExplorer is an implementation of the Scholix initiative (an RDA and WDS). This document is a report on the Scholexplorer installations operation activity after two years of operation, including a detailed set of indicators.
DOI: 10.32079/isti-tr-2022/035
Project(s): OpenAIRE Nexus via OpenAIRE

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2022 Other Open Access
InfraScience research activity report 2022
Artini M, Assante M, Atzori C, Baglioni M, Bardi A, Bove P, Candela L, Casini G, Castelli D, Cirillo R, Coro G, De Bonis M, Debole F, Dell'Amico A, Frosini L, La Bruzzo S, Lelii L, Manghi P, Mangiacrapa F, Mangione D, Mannocci A, Ottonello E, Pagano P, Panichi G, Pavone G, Piccioli T, Sinibaldi F, Straccia U, Zoppi F
InfraScience is a research group of the National Research Council of Italy - Institute of Information Science and Technologies (CNR - ISTI) based in Pisa, Italy. This report documents the research activity performed by this group in 2022 to highlight the major results. In particular, the InfraScience group confronted research challenges characterising Data Infrastructures, e-Science, and Intelligent Systems. The group activity is pursued by closely connecting research and development and by promoting and supporting open science. In fact, the group is leading the development of two large-scale infrastructures for Open Science, i.e. D4Science and OpenAIRE. During 2022 InfraScience members contributed to the publishing of several papers, to the research and development activities of 18 research projects (15 funded by the EU), to the organization of conferences and training events, and to several working groups and task forces.
DOI: 10.32079/isti-ar-2022/004
Project(s): ARIADNEplus via OpenAIRE, Blue Cloud via OpenAIRE, EOSC-Pillar via OpenAIRE, DESIRA via OpenAIRE, EOSC Future via OpenAIRE, RISIS 2 via OpenAIRE, TAILOR via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2021 Conference article Open Access
Reflections on the misuses of ORCID iDs
Baglioni M, Mannocci A, Manghi P, Atzori C, Bardi A, La Bruzzo S
Since 2012, the "Open Researcher and Contributor Identification Initiative" (ORCID) has been successfully running a worldwide registry, with the aim of unequivocally pinpointing researchers and the body of knowledge they contributed to. In practice, ORCID clients, e.g., publishers, repositories, and CRIS systems, make sure their metadata can refer to iDs in the ORCID registry to associate authors and their work unambiguously. However, the ORCID infrastructure still suffers from several "service misuses", which put at risk its very mission and should therefore be identified and tackled. In this paper, we classify and qualitatively document such misuses, originating from both users (researchers and organisations) of the ORCID registry and the ORCID clients. We conclude by providing an outlook and a few recommendations aimed at improving the exploitation of the ORCID infrastructure.
Source: CEUR WORKSHOP PROCEEDINGS, pp. 117-125. Online conference, 18-19/02/2021
Project(s): OpenAIRE-Advance via OpenAIRE

See at: ceur-ws.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted


2021 Conference article Open Access
BIP! DB: a dataset of impact measures for scientific publications
Vergoulis T, Kanellos I, Atzori C, Mannocci A, Chatzopoulos S, La Bruzzo S, Manola N, Manghi P
The growth rate of the number of scientific publications is constantly increasing, creating important challenges in the identification of valuable research and in various scholarly data management applications, in general. In this context, measures which can effectively quantify the scientific impact could be invaluable. In this work, we present BIP! DB, an open dataset that contains a variety of impact measures calculated for a large collection of more than 100 million scientific publications from various disciplines.
DOI: 10.1145/3442442.3451369
DOI: 10.48550/arxiv.2101.12001
Project(s): OpenAIRE-Advance via OpenAIRE, OpenAIRE Nexus via OpenAIRE

See at: arXiv.org e-Print Archive Open Access | arxiv.org Open Access | dl.acm.org Open Access | CNR IRIS Open Access | ISTI Repository Open Access | doi.org Restricted | doi.org Restricted | CNR IRIS Restricted | CNR IRIS Restricted


2021 Dataset Metadata Only Access
OpenAIRE research graph: dumps for research communities and initiatives
Manghi P, Atzori C, Bardi A, Baglioni M, Schirrwagen J, Dimitropoulos H, La Bruzzo S, Foufoulas I, Lohden A, Backer A, Mannocci A, Horst M, Czerniak A, Kiatropoulou K, Kokogiannaki A, De Bonis M, Artini M, Ottonello E, Lempesis A, Ioannidis A, Summan F
This dataset contains dumps of the OpenAIRE Research Graph containing metadata records relevant for the research communities and initiatives collaborating with OpenAIRE. Each dataset is a tar file containing gzip files with one JSON record per line. Each JSON record is compliant with the schema available at DOI: 10.5281/zenodo.3974226
DOI: 10.5281/zenodo.3974604
Project(s): RISIS 2 via OpenAIRE, BE OPEN via OpenAIRE, OpenAIRE-Advance via OpenAIRE

See at: CNR IRIS Restricted


2021 Dataset Open Access
OpenAIRE Covid-19 publications, datasets, software and projects metadata
Bardi A., Kuchma I., Pavone G., Artini M., Atzori C., Backer A., Baglioni M., Czerniak A., De Bonis M., Dimitropoulos H., Foufoulas I., Horst M., Iatropoulou K., Jacewicz P., Kokogiannaki A., La Bruzzo S., Lazzeri E., Lohden A., Manghi P., Mannocci A., Manola N., Ottonello E., Schirrwagen J.
This dump provides access to the metadata records of publications, research data, software and projects that may be relevant to the Corona Virus Disease (COVID-19) fight. The dump contains records of the OpenAIRE COVID-19 Gateway (https://covid-19.openaire.eu/), identified via full-text mining and inference techniques applied to the OpenAIRE Research Graph (https://explore.openaire.eu/). The Graph is one of the largest Open Access collections of metadata records and links between publications, datasets, software, projects, funders, and organizations, aggregating 12,000+ scientific data sources world-wide, among which the Covid-19 data sources Zenodo COVID-19 Community, WHO (World Health Organization), BIP! Finder for COVID-19, Protein Data Bank, Dimensions, scienceOpen, and RSNA. The dump consists of a gzip file containing one JSON record per line. Each JSON record is compliant with the schema available at https://doi.org/10.5281/zenodo.3974226
DOI: 10.5281/zenodo.3980490
Project(s): OpenAIRE-Advance via OpenAIRE

See at: CNR IRIS Open Access | CNR IRIS Restricted
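
The record above describes the dump as a gzip file holding one JSON document per line. Assuming that layout, a dump could be streamed record by record roughly as follows. The function name `iter_dump` is a hypothetical helper, and no field names are assumed since the schema is defined elsewhere.

```python
import gzip
import json
from typing import Iterator


def iter_dump(path: str) -> Iterator[dict]:
    """Yield one parsed JSON document per non-empty line of a gzip file."""
    # Text mode ("rt") decompresses and decodes on the fly, so the whole
    # dump never has to fit in memory.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

A caller would then iterate with `for record in iter_dump("dump.json.gz"): ...`, where the file name is a placeholder for the downloaded dump.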


2021 Other Open Access
InfraScience Research Activity Report 2020
Artini M, Assante M, Atzori C, Baglioni M, Bardi A, Candela L, Casini G, Castelli D, Cirillo R, Coro G, Debole F, Dell'Amico A, Frosini L, La Bruzzo S, Lazzeri E, Lelii L, Manghi P, Mangiacrapa F, Mannocci A, Pagano P, Panichi G, Piccioli T, Sinibaldi F, Straccia U
InfraScience is a research group of the National Research Council of Italy - Institute of Information Science and Technologies (CNR - ISTI) based in Pisa, Italy. This report documents the research activity performed by this group in 2020 to highlight the major results. In particular, the InfraScience group confronted research challenges characterising Data Infrastructures, e-Science, and Intelligent Systems. The group activity is pursued by closely connecting research and development and by promoting and supporting open science. In fact, the group is leading the development of two large-scale infrastructures for Open Science, i.e. D4Science and OpenAIRE. During 2020 InfraScience members contributed to the publishing of 30 papers, to the research and development activities of 12 research projects (11 funded by the EU), to the organization of conferences and training events, and to several working groups and task forces.
DOI: 10.32079/isti-ar-2021/002
Project(s): ARIADNEplus via OpenAIRE, Blue Cloud via OpenAIRE, PerformFISH via OpenAIRE, EOSC-Pillar via OpenAIRE, DESIRA via OpenAIRE, EOSCsecretariat.eu via OpenAIRE, RISIS 2 via OpenAIRE, TAILOR via OpenAIRE, I-GENE via OpenAIRE, MOVING via OpenAIRE, OpenAIRE-Advance via OpenAIRE, SoBigData-PlusPlus via OpenAIRE

See at: CNR IRIS Open Access | ISTI Repository Open Access | CNR IRIS Restricted