2016
Journal article  Open Access

Are scientific data repositories coping with research data publishing?

Assante M., Candela L., Castelli D., Tani A.

Data infrastructures; Data Quality; Scientific Data Repositories; Computer Science (miscellaneous); Computer Science Applications; Online Information Services; Research Data Publishing

Research data publishing is the release of research data so that practitioners can (re)use them according to "open science" dynamics. Three main actors deal with research data publishing practices: researchers, publishers, and data repositories. This study analyses the solutions offered by generalist scientific data repositories, i.e., repositories supporting the deposition of any type of research data. These repositories cannot make any assumption about the application domain; they have to cope with the almost open-ended typologies of data used in science. The current practices promoted by such repositories are analysed with respect to eight key aspects of data publishing: dataset formatting, documentation, licensing, publication costs, validation, availability, discovery and access, and citation. The analysis shows that these repositories implement well-consolidated practices and pragmatic solutions for literature repositories. These practices and solutions cannot fully meet the needs of managing and using dataset resources, especially in a context where rapid technological changes continuously open new exploitation prospects.

Source: Data Science Journal 15 (2016). doi:10.5334/dsj-2016-006

Publisher: Committee on Data for Science and Technology of the International Council for Science, Paris



BibTeX entry
@article{oai:it.cnr:prodotti:354435,
	title = {Are scientific data repositories coping with research data publishing?},
	author = {Assante, M. and Candela, L. and Castelli, D. and Tani, A.},
	publisher = {Committee on Data for Science and Technology of the International Council for Science, Paris},
	doi = {10.5334/dsj-2016-006},
	journal = {Data Science Journal},
	volume = {15},
	year = {2016}
}

AGINFRA PLUS
Accelerating user-driven e-infrastructure innovation in Food Agriculture

BlueBRIDGE
Building Research environments for fostering Innovation, Decision making, Governance and Education to support Blue growth

IMARINE
Data e-Infrastructure Initiative for Fisheries Management and Conservation of Marine Living Resources


OpenAIRE