24 result(s)
2020 Journal article Open Access
Entity deduplication in big data graphs for scholarly communication
Manghi P., Atzori C., De Bonis M., Bardi A.
Purpose: Several online services offer functionalities to access information from "big research graphs" (e.g. Google Scholar, OpenAIRE, Microsoft Academic Graph), which correlate scholarly/scientific communication entities such as publications, authors, datasets, organizations, projects, funders, etc. Depending on the target users, access can vary from search and browse content to the consumption of statistics for monitoring and provision of feedback. Such graphs are populated over time as aggregations of multiple sources and therefore suffer from major entity-duplication problems. Although deduplication of graphs is a known and actual problem, existing solutions are dedicated to specific scenarios, operate on flat collections, local topology-driven challenges and cannot therefore be re-used in other contexts. Design/methodology/approach: This work presents GDup, an integrated, scalable, general-purpose system that can be customized to address deduplication over arbitrary large information graphs. The paper presents its high-level architecture, its implementation as a service used within the OpenAIRE infrastructure system and reports numbers of real-case experiments. Findings: GDup provides the functionalities required to deliver a fully-fledged entity deduplication workflow over a generic input graph. The system offers out-of-the-box Ground Truth management, acquisition of feedback from data curators and algorithms for identifying and merging duplicates, to obtain an output disambiguated graph. Originality/value: To our knowledge GDup is the only system in the literature that offers an integrated and general-purpose solution for the deduplication of graphs, while targeting big data scalability issues. GDup is today one of the key modules of the OpenAIRE infrastructure production system, which monitors Open Science trends on behalf of the European Commission, National funders and institutions.
Source: Data technologies and applications 54 (2020): 409–435.
DOI: 10.1108/dta-09-2019-0163
Project(s): OpenAIRE2020 via OpenAIRE, OpenAIRE-Advance via OpenAIRE

See at: Data Technologies and Applications Open Access | ISTI Repository Open Access | www.emerald.com Open Access | CNR ExploRA


2022 Conference article Open Access
Towards unsupervised machine learning approaches for knowledge graphs
Minutella F., Falchi F., Manghi P., De Bonis M., Messina N.
Nowadays, a lot of data is in the form of Knowledge Graphs aiming at representing information as a set of nodes and relationships between them. This paper proposes an efficient framework to create informative embeddings for node classification on large knowledge graphs. Such embeddings capture how a particular node of the graph interacts with its neighborhood and indicate if it is either isolated or part of a bigger clique. Since a homogeneous graph is necessary to perform this kind of analysis, the framework exploits the metapath approach to split the heterogeneous graph into multiple homogeneous graphs. The proposed pipeline includes an unsupervised attentive neural network to merge different metapaths and produce node embeddings suitable for classification. Preliminary experiments on the IMDb dataset demonstrate the validity of the proposed approach, which can defeat current state-of-the-art unsupervised methods.
Source: IRCDL 2022 - 18th Italian Research Conference on Digital Libraries, Padua, Italy, 24-25/02/2022
Project(s): OpenAIRE Nexus via OpenAIRE

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA


2022 Conference article Open Access
A preliminary assessment of the article deduplication algorithm used for the OpenAIRE Research Graph
Vichos K., De Bonis M., Kanellos I., Chatzopoulos S., Atzori C., Manola N., Manghi P., Vergoulis T.
In recent years, a large number of Scholarly Knowledge Graphs (SKGs) have been introduced in the literature. The communities behind these graphs strive to gather, clean, and integrate scholarly metadata from various sources to produce clean and easy-to-process knowledge graphs. In this context, a very important task of the respective cleaning and integration workflows is deduplication. In this paper, we briefly describe and evaluate the accuracy of the deduplication algorithm used for the OpenAIRE Research Graph. Our experiments show that the algorithm has an adequate performance producing a small number of false positives and an even smaller number of false negatives.
Source: IRCDL 2022 - 18th Italian Research Conference on Digital Libraries, Padua, Italy, 24-25/02/2022
Project(s): OpenAIRE Nexus via OpenAIRE

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA


2022 Journal article Open Access
FDup: a framework for general-purpose and efficient entity deduplication of record collections
De Bonis M., Manghi P., Atzori C.
Deduplication is a technique aiming at identifying and resolving duplicate metadata records in a collection. This article describes FDup (Flat Collections Deduper), a general-purpose software framework supporting a complete deduplication workflow to manage big data record collections: metadata record data model definition, identification of candidate duplicates, identification of duplicates. FDup brings two main innovations: first, it delivers a full deduplication framework in a single easy-to-use software package based on Apache Spark Hadoop framework, where developers can customize the optimal and parallel workflow steps of blocking, sliding windows, and similarity matching function via an intuitive configuration file; second, it introduces a novel approach to improve performance, beyond the known techniques of "blocking" and "sliding window", by introducing a smart similarity matching function T-match. T-match is engineered as a decision tree that drives the comparisons of the fields of two records as branches of predicates and allows for successful or unsuccessful early-exit strategies. The efficacy of the approach is proved by experiments performed over big data collections of metadata records in the OpenAIRE Research Graph, a known open access knowledge base in Scholarly communication.
Source: PeerJ Computer Science 8 (2022).
DOI: 10.7717/peerj-cs.1058
Project(s): OpenAIRE Nexus via OpenAIRE

See at: OpenAIRE Open Access | ISTI Repository Open Access | peerj.com Open Access | CNR ExploRA
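
The FDup abstract above describes T-match as a decision tree whose predicates compare record fields and allow early exits. The sketch below illustrates that idea only; it is not FDup's configuration format or API. The `Node` class, the `evaluate` function, the field names (`doi`, `title`, `year`), the 0.9 threshold, and the tree shape are all assumptions made for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Union

Record = Dict[str, str]


@dataclass
class Node:
    """One predicate over two records; each branch is another Node or a final verdict."""
    name: str
    predicate: Callable[[Record, Record], bool]
    on_true: Union["Node", bool]
    on_false: Union["Node", bool]


def same(field: str) -> Callable[[Record, Record], bool]:
    """Predicate: both records carry the field and the values are equal."""
    def pred(a: Record, b: Record) -> bool:
        va, vb = a.get(field), b.get(field)
        return bool(va) and va == vb
    return pred


def similar(field: str, threshold: float) -> Callable[[Record, Record], bool]:
    """Predicate: case-insensitive string similarity reaches the threshold."""
    def pred(a: Record, b: Record) -> bool:
        va, vb = a.get(field, "").lower(), b.get(field, "").lower()
        if not va or not vb:
            return False
        return SequenceMatcher(None, va, vb).ratio() >= threshold
    return pred


def evaluate(node: Union[Node, bool], a: Record, b: Record,
             trace: Optional[List[str]] = None) -> bool:
    """Walk the tree until a boolean verdict is reached, recording visited predicates."""
    while not isinstance(node, bool):
        if trace is not None:
            trace.append(node.name)
        node = node.on_true if node.predicate(a, b) else node.on_false
    return node


# Hypothetical tree: equal DOI exits early with a match; otherwise a dissimilar
# title exits early with a non-match; only similar titles reach the year check.
year_node = Node("year", same("year"), True, False)
title_node = Node("title", similar("title", 0.9), year_node, False)
root = Node("doi", same("doi"), True, title_node)

rec_a = {"doi": "10.1/x", "title": "Entity deduplication in big data graphs", "year": "2020"}
rec_b = {"doi": "10.1/x", "title": "Unrelated title", "year": "1999"}
rec_c = {"title": "Entity Deduplication in Big Data Graphs", "year": "2020"}
rec_d = {"title": "Deep learning techniques for visual food recognition", "year": "2020"}
```

Passing a list as `trace` shows which comparisons were actually executed, which is where an early exit saves work: a pair with equal DOIs never reaches the more expensive title comparison.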


2022 Report Open Access
OpenOrgs: a tool for the disambiguation of organizations
Artini M., La Bruzzo S. F., De Bonis M., Pavone G.
Organizations appear all over the Research & Innovation ecosystem in different shapes and formats: the same organization may appear with different metadata fields, different names - e.g., full legal name, short or alternative names, acronym. The ambiguity of organizations results in a huge deficiency in the exchange of information, the findability of research products, the monitoring of activities, and ultimately building a linked open scholarly communication system. OpenOrgs combines an automated process and human curation to compensate for the lack of information available and improve the organization's discoverability.
Source: ISTI Technical Report, ISTI-2022-TR/034, 2022
DOI: 10.32079/isti-tr-2022/034
Project(s): OpenAIRE-Advance via OpenAIRE, OpenAIRE Nexus via OpenAIRE

See at: ISTI Repository Open Access | CNR ExploRA


2023 Journal article Open Access
Graph-based methods for author name disambiguation: a survey
De Bonis M., Falchi F., Manghi P.
Scholarly knowledge graphs (SKG) are knowledge graphs representing research-related information, powering discovery and statistics about research impact and trends. Author name disambiguation (AND) is required to produce high-quality SKGs, as a disambiguated set of authors is fundamental to ensure a coherent view of researchers' activity. Various issues, such as homonymy, scarcity of contextual information, and cardinality of the SKG, make simple name string matching insufficient or computationally complex. Many AND deep learning methods have been developed, and interesting surveys exist in the literature, comparing the approaches in terms of techniques, complexity, performance, etc. However, none of them specifically addresses AND methods in the context of SKGs, where the entity-relationship structure can be exploited. In this paper, we discuss recent graph-based methods for AND, define a framework through which such methods can be confronted, and catalog the most popular datasets and benchmarks used to test such methods. Finally, we outline possible directions for future work on this topic.
Source: PeerJ Computer Science 9 (2023).
DOI: 10.7717/peerj-cs.1536
Project(s): EOSC Future via OpenAIRE, OpenAIRE Nexus via OpenAIRE

See at: PeerJ Computer Science Open Access | ISTI Repository Open Access | peerj.com Open Access | CNR ExploRA


2023 Conference article Open Access
A graph neural network approach for evaluating correctness of groups of duplicates
De Bonis M., Minutella F., Falchi F., Manghi P.
Unlabeled entity deduplication is a relevant task already studied in the recent literature. Most methods can be traced back to the following workflow: entity blocking phase, in-block pairwise comparisons between entities to draw similarity relations, closure of the resulting meshes to create groups of duplicate entities, and merging group entities to remove disambiguation. Such methods are effective but still not good enough whenever a very low false positive rate is required. In this paper, we present an approach for evaluating the correctness of "groups of duplicates", which can be used to measure the group's accuracy hence its likelihood of false-positiveness. Our novel approach is based on a Graph Neural Network that exploits and combines the concept of Graph Attention and Long Short Term Memory (LSTM). The accuracy of the proposed approach is verified in the context of Author Name Disambiguation applied to a curated dataset obtained as a subset of the OpenAIRE Graph that includes PubMed publications with at least one ORCID identifier.
Source: TPDL 2023 - 27th International Conference on Theory and Practice of Digital Libraries, pp. 207–219, Zadar, Croatia, 26-29/09/2023
DOI: 10.1007/978-3-031-43849-3_18
Project(s): OpenAIRE Nexus via OpenAIRE

See at: doi.org Open Access | link.springer.com Open Access | ISTI Repository Open Access | CNR ExploRA
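
The generic workflow named in the abstract above (blocking, in-block pairwise comparison, closure into groups) can be sketched in a few lines. This is a minimal illustration and not the method evaluated in the paper: the blocking key (first three characters), the prefix-based similarity test, and the sample names are assumptions chosen for the example, and the closure is computed with a plain union-find.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List


def normalize(name: str) -> str:
    """Lowercase and drop punctuation so 'De Bonis M.' becomes 'de bonis m'."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch == " ").strip()


def block_key(name: str) -> str:
    """Hypothetical blocking key: the first three normalized characters."""
    return normalize(name)[:3]


def is_similar(a: str, b: str) -> bool:
    """Hypothetical pairwise test: one normalized name is a prefix of the other."""
    na, nb = normalize(a), normalize(b)
    return na.startswith(nb) or nb.startswith(na)


def find_groups(records: List[str],
                key: Callable[[str], str],
                similar: Callable[[str, str], bool]) -> List[List[int]]:
    """Return groups of record indices linked by transitive similarity."""
    # 1. Blocking: only records sharing a key are ever compared.
    blocks: Dict[str, List[int]] = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)

    # 2. Pairwise comparison inside each block, 3. closure via union-find.
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similar(records[i], records[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(g for g in groups.values() if len(g) > 1)


names = ["De Bonis Michele", "De Bonis M.", "De Bonis Marco", "Manghi P.", "Falchi F."]
groups = find_groups(names, block_key, is_similar)
```

With these sample names the closure puts "De Bonis Michele" and "De Bonis Marco" in the same group, although the two never match directly; both match "De Bonis M.". Groups of this kind are what a group-level correctness check is meant to catch.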


2019 Conference article Open Access
Deep learning techniques for visual food recognition on a mobile app
De Bonis M., Amato G., Falchi F., Gennaro C., Manghi P.
The paper provides an efficient solution to implement a mobile application for food recognition using Convolutional Neural Networks (CNNs). Different CNN architectures have been trained and tested on two datasets available in literature and the best one in terms of accuracy has been chosen. Since our CNN runs on a mobile phone, efficiency measurements have also been taken into account both in terms of memory and computational requirements. The mobile application has been implemented relying on RenderScript and the weights of every layer have been serialized in different files stored in the mobile phone memory. Extensive experiments have been carried out to choose the optimal configuration and tuning parameters.
Source: 11th International Conference on Multimedia and Network Information Systems, MISSI 2018, pp. 303–312, Wroclaw, Poland, 12-14 September 2018
DOI: 10.1007/978-3-319-98678-4_31

See at: link.springer.com Open Access | ISTI Repository Open Access | doi.org Restricted | CNR ExploRA


2023 Conference article Open Access
(Semi)automated disambiguation of scholarly repositories
Baglioni M., Mannocci A., Pavone G., De Bonis M., Manghi P.
The full exploitation of scholarly repositories is pivotal in modern Open Science, and scholarly repository registries are kingpins in enabling researchers and research infrastructures to list and search for suitable repositories. However, since multiple registries exist, repository managers are keen on registering multiple times the repositories they manage to maximise their traction and visibility across different research communities, disciplines, and applications. These multiple registrations ultimately lead to information fragmentation and redundancy on the one hand and, on the other, force registries' users to juggle multiple registries, profiles and identifiers describing the same repository. Such problems are known to registries, which claim equivalence between repository profiles whenever possible by cross-referencing their identifiers across different registries. However, as we will see, this "claim set" is far from complete and, therefore, many replicas slip under the radar, possibly creating problems downstream. In this work, we combine such claims to create duplicate sets and extend them with the results of an automated clustering algorithm run over repository metadata descriptions. Then we manually validate our results to produce an "as accurate as possible" de-duplicated dataset of scholarly repositories.
Source: IRCDL 2023 - 19th conference on Information and Research Science Connecting to Digital and Library Science, pp. 47–59, Bari, Italy, 23-24/02/2023
Project(s): OpenAIRE Nexus via OpenAIRE

See at: ceur-ws.org Open Access | ISTI Repository Open Access | CNR ExploRA


2018 Conference article Open Access
OpenAIRE: Advancing open science
Manghi P., Artini M., Atzori C., Baglioni M., Bardi A., La Bruzzo S., De Bonis M., Dimitropoulos H., Foufoulas I., Iatropoulou K., Manola N., Martziou S., Principe P.
OpenAIRE, the point of reference for Open Access in Europe, is now addressing the problem of enabling the Open Science paradigm. To this aim it will provide services to: (i) overcome the limits of today's scientific communication landscape, by allowing research communities and the relative e-infrastructures to fully publish, interlink, package and reuse their research artefacts (e.g. literature, data, and software) and their funding grants within the European and global ecosystem as supported/promoted by OpenAIRE, (ii) enable end-users (e.g. researchers, funder officers) to search and consult a rich and up-to-date knowledge graph of research results and (iii) enable scientific and educational information repositories and publishers to subscribe and be notified of changes in the OpenAIRE knowledge graph. These combined actions will bring long-term and immediate benefits to research communities, research organisations, repository managers, and funders by affecting the way research results are disseminated and reused. On the one hand, publishing the interlinked and packaged research literature, data and software via OpenAIRE drives research communities to an Open Science transition in a consistent and interoperable fashion. On the other hand, the resulting infrastructure concretely enables the construction of Open Science oriented services, supporting practices such as machine-assisted research reproducibility and evaluation.
Source: GL19 - Nineteenth International Conference on Grey Literature, pp. 107–112, Roma, Italy, 23-24 October 2017
DOI: 10.26069/greynet-2018-000.010-gg
Project(s): OpenAIRE-Connect via OpenAIRE

See at: greyguide.isti.cnr.it Open Access | ISTI Repository Open Access | CNR ExploRA


2019 Journal article Open Access
OpenAIRE: Advancing open science
Manghi P., Artini M., Atzori C., Baglioni M., Bardi A., La Bruzzo S., De Bonis M., Dimitropoulos H., Foufoulas I., Iatropoulou K., Manola N., Martziou S., Principe P.
OpenAIRE, the point of reference for Open Access in Europe, is now addressing the problem of enabling the Open Science paradigm. To this aim it will provide services to: (i) overcome the limits of today's scientific communication landscape, by allowing research communities and the relative e-infrastructures to fully publish, interlink, package and reuse their research artefacts (e.g. literature, data, and software) and their funding grants within the European and global ecosystem as supported/promoted by OpenAIRE, (ii) enable end-users (e.g. researchers, funder officers) to search and consult a rich and up-to-date knowledge graph of research results and (iii) enable scientific and educational information repositories and publishers to subscribe and be notified of changes in the OpenAIRE knowledge graph. These combined actions will bring long-term and immediate benefits to research communities, research organisations, repository managers, and funders by affecting the way research results are disseminated and reused. On the one hand, publishing the interlinked and packaged research literature, data and software via OpenAIRE drives research communities to an Open Science transition in a consistent and interoperable fashion. On the other hand, the resulting infrastructure concretely enables the construction of Open Science oriented services, supporting practices such as machine-assisted research reproducibility and evaluation.
Source: The Grey journal (Print) 15 (2019): 141–146.
Project(s): OpenAIRE-Connect via OpenAIRE

See at: ISTI Repository Open Access | CNR ExploRA


2017 Contribution to conference Open Access
OpenAIRE: The OpenScience European Infrastructure
Artini M., Atzori C., Baglioni M., Bardi A., Biagioni S., De Bonis M., Dell'Amico A., La Bruzzo S., Manghi P.
OpenAIRE, the point of reference for Open Access in Europe, is now addressing the problem of enabling the Open Science paradigm. To this aim it will provide services to: (i) overcome the limits of today's scientific communication landscape, by allowing research communities and the relative e-infrastructures to fully publish, interlink, package and reuse their research artefacts (e.g. literature, data, and software) and their funding grants within the European and global ecosystem as supported/promoted by OpenAIRE, (ii) enable end-users (e.g. researchers, funder officers) to search and consult a rich and up-to-date knowledge graph of research results and (iii) enable scientific and educational information repositories and publishers to subscribe and be notified of changes in the OpenAIRE knowledge graph. These combined actions will bring long-term and immediate benefits to research communities, research organisations, repository managers, and funders by affecting the way research results are disseminated and reused. On the one hand, publishing the interlinked and packaged research literature, data and software via OpenAIRE drives research communities to an Open Science transition in a consistent and interoperable fashion. On the other hand, the resulting infrastructure concretely enables the construction of Open Science oriented services, supporting practices such as machine-assisted research reproducibility and evaluation.
Source: Nineteenth International Conference on Grey Literature - Public Awareness and Access to Grey Literature, pp. 13–13, Roma, CNR, 23-24 October 2017
Project(s): OpenAIRE-Advance via OpenAIRE

See at: greyguide.isti.cnr.it Open Access | ISTI Repository Open Access | CNR ExploRA


2019 Report Open Access
The OpenAIRE research graph: third-party publishing APIs
Atzori C., Baglioni M., Bardi A., Manghi P., La Bruzzo S., De Bonis M., Dell'Amico A., Artini M., Mannocci A., Ottonello E.
This work describes the specification of the OpenAIRE publishing APIs that support third-party services at publishing metadata about interlinked and packaged research products into the OpenAIRE Research Graph, in respect of the OpenAIRE interoperability guidelines (https://guidelines.openaire.eu). Research products generated by researchers using services of research infrastructures are today manually published by researchers in a repository external to their research infrastructure. This phase is often considered an extra burden, because researchers have to fill in metadata forms with information that is already available in the scope of the services they used. By using the OpenAIRE publishing APIs, services of research infrastructures can implement an on-demand publishing workflow for any type of research products to support their researchers at improving the FAIRness of their research products and relieve them from the tedious step of finding a suitable repository and manually depositing the products in it.
Source: ISTI Technical reports, 2019

See at: ISTI Repository Open Access | CNR ExploRA


2019 Conference article Open Access
The OpenAIRE Research Community Dashboard: On Blending Scientific Workflows and Scientific Publishing
Baglioni M., Bardi A., Kokogiannaki A., Manghi P., Iatropoulou K., Principe P., Vieira A., Nielsen L. H., Dimitropoulos H., Foufoulas I., Manola N., Atzori C., La Bruzzo S., Lazzeri E., Artini M., De Bonis M., Dell'Amico A.
Despite the hype, the effective implementation of Open Science is hindered by several cultural and technical barriers. Researchers embraced digital science, use "digital laboratories" (e.g. research infrastructures, thematic services) to conduct their research and publish research data, but practices and tools are still far from achieving the expectations of transparency and reproducibility of Open Science. The places where science is performed and the places where science is published are still regarded as different realms. Publishing is still a post-experimental, tedious, manual process, too often limited to articles, in some contexts semantically linked to datasets, rarely to software, generally disregarding digital representations of experiments. In this work we present the OpenAIRE Research Community Dashboard (RCD), designed to overcome some of these barriers for a given research community, minimizing the technical efforts and without renouncing any of the community services or practices. The RCD flanks digital laboratories of research communities with scholarly communication tools for discovering and publishing interlinked scientific products such as literature, datasets, and software. The benefits of the RCD are show-cased by means of two real-case scenarios: the European Marine Science community and the European Plate Observing System (EPOS) research infrastructure.
Source: 23rd International Conference on Theory and Practice of Digital Libraries, TPDL, pp. 56–69, Oslo, Norway, September 9-12, 2019
DOI: 10.1007/978-3-030-30760-8_5
DOI: 10.5281/zenodo.3467104
DOI: 10.5281/zenodo.3467103
Project(s): OpenAIRE-Connect via OpenAIRE, OpenAIRE-Advance via OpenAIRE

See at: ZENODO Open Access | ZENODO Open Access | Universidade do Minho: RepositoriUM Open Access | ISTI Repository Open Access | repositorium.sdum.uminho.pt Open Access | doi.org Restricted | CNR ExploRA


2019 Dataset Unknown
OpenAIRE Research Graph Dump
Manghi P., Atzori C., Bardi A., Schirrwagen J., Dimitropoulos H., La Bruzzo S., Foufoulas I., Loehden A., Baecker A., Mannocci A., Horst M., Baglioni M., Czerniak A., Kiatropoulou K., Kokogiannaki A., De Bonis M., Artini M., Ottonello E., Lempesis A., Nielsen L. H., Ioannidis A., Bigarella C., Summan F.
The OpenAIRE Research Graph is one of the largest open scholarly record collections worldwide, key in fostering Open Science and establishing its practices in the daily research activities. Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back in the hands of the scientific community. Imagine a vast collection of research products all linked together, contextualised and openly available. For the past ten years OpenAIRE has been working to gather this valuable record. OpenAIRE is pleased to announce the beta release of its Research Graph, a massive collection of metadata and links between scientific products such as articles, datasets, software, and other research products, entities like organisations, funders, funding streams, projects, communities, and data sources. As of today, the OpenAIRE Research Graph aggregates around 450Mi metadata records with links collected from 10,000 data sources trusted by scientists, including repositories registered in OpenDOAR, Open Access journals registered in DOAJ, Crossref, Unpaywall, ORCID and Microsoft Academic Graph. After cleaning, deduplication, and fine-grained classification processes, they narrow down to ~100Mi publications, ~8Mi datasets, ~200K software research products, 8Mi other products linked together with semantic relations. More than 10Mi full-texts of Open Access publications are mined by algorithms to enrich metadata records with additional properties and links among research products, funders, projects, communities, and organizations. Thanks to the mining algorithm, the graph is completed with 480Mi semantic relations. The OpenAIRE Research graph is available via our BETA Explore Portal and you can download it from Zenodo.
DOI: 10.5281/zenodo.3516918
Project(s): OpenAIRE-Advance via OpenAIRE

See at: CNR ExploRA


2021 Dataset Unknown
OpenAIRE research graph: dumps for research communities and initiatives
Manghi P., Atzori C., Bardi A., Baglioni M., Schirrwagen J., Dimitropoulos H., La Bruzzo S., Foufoulas I., Lohden A., Backer A., Mannocci A., Horst M., Czerniak A., Kiatropoulou K., Kokogiannaki A., De Bonis M., Artini M., Ottonello E., Lempesis A., Ioannidis A., Summan F.
This dataset contains dumps of the OpenAIRE Research Graph containing metadata records relevant for the research communities and initiatives collaborating with OpenAIRE. Each dataset is a tar file containing gzip files with one json per line. Each json is compliant to the schema available at DOI: 10.5281/zenodo.3974226.
DOI: 10.5281/zenodo.3974604
Project(s): RISIS 2 via OpenAIRE, BE OPEN via OpenAIRE, OpenAIRE-Advance via OpenAIRE

See at: CNR ExploRA
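
The dump layout described in the record above (a tar file containing gzip files, each with one JSON object per line) can be read in a streaming fashion with the Python standard library. The sketch below is illustrative only: the member name `part-00000.json.gz` and the `id` field are hypothetical placeholders used to build a tiny demo archive, and nothing here reflects the actual schema of the dump.

```python
import gzip
import io
import json
import os
import tarfile
import tempfile
from typing import Iterator, List


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of every .gz member in the tar."""
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".gz")):
                continue
            raw = archive.extractfile(member)
            # gzip.open accepts a file object, so members are decompressed on the fly.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    if line.strip():
                        yield json.loads(line)


def write_demo_dump(tar_path: str) -> None:
    """Build a tiny archive with the same layout; the content is made up for the demo."""
    rows = [{"id": "r1"}, {"id": "r2"}]  # hypothetical records, not the real schema
    payload = "".join(json.dumps(row) + "\n" for row in rows)
    blob = gzip.compress(payload.encode("utf-8"))
    info = tarfile.TarInfo(name="part-00000.json.gz")  # hypothetical member name
    info.size = len(blob)
    with tarfile.open(tar_path, "w") as archive:
        archive.addfile(info, io.BytesIO(blob))


def demo() -> List[str]:
    """Write the demo archive to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "demo.tar")
        write_demo_dump(path)
        return [record["id"] for record in iter_records(path)]
```

Because `iter_records` is a generator, records are decoded one line at a time and a large dump never has to fit in memory.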


2021 Dataset Unknown
OpenAIRE Covid-19 publications, datasets, software and projects metadata
Bardi A., Kuchma I., Pavone G., Artini M., Atzori C., Backer A., Baglioni M., Czerniak A., De Bonis M., Dimitropoulos H., Foufoulas I., Horst M., Iatropoulou K., Jacewicz P., Kokogiannaki A., La Bruzzo S., Lazzeri E., Lohden A., Manghi P., Mannocci A., Manola N., Ottonello E., Schirrwagen J.
This dump provides access to the metadata records of publications, research data, software and projects that may be relevant to the Corona Virus Disease (COVID-19) fight. The dump contains records of the OpenAIRE COVID-19 Gateway (https://covid-19.openaire.eu/), identified via full-text mining and inference techniques applied to the OpenAIRE Research Graph (https://explore.openaire.eu/). The Graph is one of the largest Open Access collections of metadata records and links between publications, datasets, software, projects, funders, and organizations, aggregating 12,000+ scientific data sources world-wide, among which the Covid-19 data sources Zenodo COVID-19 Community, WHO (World Health Organization), BIP! Finder for COVID-19, Protein Data Bank, Dimensions, scienceOpen, and RSNA. The dump consists of a gzip file containing one json per line. Each json is compliant to the schema available at https://doi.org/10.5281/zenodo.3974226
DOI: 10.5281/zenodo.3980490
Project(s): OpenAIRE-Advance via OpenAIRE

See at: CNR ExploRA


2022 Software Unknown
dnet-dedup framework
Artini M., Atzori C., Bardi A., Baglioni M., De Bonis M., Dell'Amico A., La Bruzzo S. F., Mannocci A., Manghi P.
The GDup Software enables an integrated, scalable, general-purpose system for entity deduplication over big information graphs. GDup supports practitioners with the functionalities needed to realize a fully-fledged entity deduplication workflow over a generic input graph, including Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates to obtain an output disambiguated graph. GDup is today one of the core components of the OpenAIRE infrastructure production system, monitoring Open Science trends on behalf of the European Commission.
Project(s): OpenAIRE-Advance via OpenAIRE, OpenAIRE Nexus via OpenAIRE

See at: github.com | CNR ExploRA


2022 Report Open Access
Data model description of the OpenAIRE Research Graph
La Bruzzo S. F., Artini M., Atzori C., Bardi A., Baglioni M., De Bonis M., Mannocci A., Manghi P., Pavone G.
The OpenAIRE Graph (formerly known as the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide, key to fostering Open Science and establishing its practices in daily research activities. Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back into the hands of the scientific community. Imagine a vast collection of research products all linked together, contextualized, and openly available. For the past years, OpenAIRE has been working to gather this valuable record. It is a massive collection of metadata and links between scientific products such as articles, datasets, software, and other research products, entities like organizations, funders, funding streams, projects, communities, and data sources. This technical Report describes the public data model adopted by the OpenAIRE Graph.
Source: ISTI Technical Report, ISTI-2022-TR/031, 2022
DOI: 10.32079/isti-tr-2022/031

See at: ISTI Repository Open Access | CNR ExploRA


2022 Report Open Access
OpenAIRE Research Graph: aggregation workflow
La Bruzzo S. F., Artini M., Atzori C., Bardi A., Baglioni M., De Bonis M., Dell'Amico A., Mannocci A., Manghi P., Pavone G.
The OpenAIRE Graph (formerly the OpenAIRE Research Graph) is one of the largest open scholarly record collections worldwide. It is key in fostering Open Science and establishing its practices in daily research activities. Conceived as a public and transparent good, populated out of data sources trusted by scientists, the Graph aims at bringing discovery, monitoring, and assessment of science back into the hands of the scientific community. OpenAIRE collects metadata records from more than 70K scholarly communication sources worldwide, including Open Access institutional repositories, data archives, and journals. All the metadata records (i.e., descriptions of research products) are put together in a data lake with records from Crossref, Unpaywall, ORCID, ROR, and information about projects provided by national and international funders. This technical Report describes the main Aggregation Workflow to orchestrate the data aggregation and the implemented mapping from some of the main datasources into the OpenAIRE research graph data model.
Source: ISTI Technical Report, ISTI-2022-TR/033, 2022
DOI: 10.32079/isti-tr-2022/033
Project(s): OpenAIRE-Advance via OpenAIRE, OpenAIRE Nexus via OpenAIRE

See at: ISTI Repository Open Access | CNR ExploRA