2022
Other  Open Access

The open ecosystem of e-infrastructures for data discovery: a review

Bardi A., Kraker P., Juty N., Culina A., Colomb J., Widmann H., Goble C., Hiseni V., Flügel A.-L., Mathiak B., Heger T.

Research data discovery, data search, open science, research data, open ecosystem, e-infrastructures

Research data are among the fastest-growing openly accessible scientific outputs on the web. While we have made great strides in the accessibility of research data, discoverability remains one of the key challenges for open science: in many ways, we cannot cash the cheques written by this movement if we do not increase the visibility of research outputs. Many research data discovery services have thus emerged, often embracing the principles of openness. They aim to make data discovery more effective, address new user needs, and exploit new technologies. This paper aims to support the conception and design of such tools by providing a descriptive framework of the current open ecosystem for research data discovery. In this framework, we define the building blocks of the ecosystem (actors, roles and features of discovery services) and describe how they interact with each other and how they support the different discovery needs of researchers. We analyse current practices of research data discovery to identify gaps both in the infrastructure and in users’ research strategies. We further analyse opportunities for innovative solutions that address the crisis of research data discoverability, improve data discovery and contribute to the evolution of the open ecosystem.



BibTeX entry
@misc{oai:iris.cnr.it:20.500.14243/483321,
	title = {The open ecosystem of e-infrastructures for data discovery: a review},
	author = {Bardi, A. and Kraker, P. and Juty, N. and Culina, A. and Colomb, J. and Widmann, H. and Goble, C. and Hiseni, V. and Flügel, A.-L. and Mathiak, B. and Heger, T.},
	doi = {10.5281/zenodo.6952904 and 10.5281/zenodo.7468089 and 10.5281/zenodo.6952905},
	year = {2022}
}

FAIRCORE4EOSC
Core Components Supporting a FAIR EOSC

DICE
Data Infrastructure Capacity for EOSC

D4
Deep Drug Discovery and Deployment

Integration of research literature and data

KonsortSWD

Mechanisms and disturbances in memory consolidation: From synapses to systems

OpenAIRE Nexus
OpenAIRE-Nexus Scholarly Communication Services for EOSC users

Smart Harvesting 2

TRIPLE
Transforming Research through Innovative Practices for Linked interdisciplinary Exploration

VENI personal grant


OpenAIRE