This section describes what public repositories are.
Research data repositories are online repositories that enable the preservation, curation and publication of research ‘products’. These repositories are mainly used to deposit research ‘data’. However, the scope of the repositories is broader as we can also deposit/publish ‘code’ or ‘protocols’ (as we saw with protocols.io).
There are general “data agnostic” repositories, for example:
Or domain specific, for example:
Research outputs should be submitted to discipline/domain-specific repositories whenever it is possible. When such a resource does not exist, data should be submitted to a ‘general’ repository. Research data repositories are a key resource to help in data FAIRification as they assure Findability and Accessibility.
Exercise: What makes it FAIR?
Have a look at the following record for a data set in the Hydroshare repository: Hydroshare. What elements make it FAIR?
The elements that make this deposit FAIR are:
Findable (persistent identifiers, easy to find data and metadata):
Accessible (the (meta)data are retrievable by their identifier using standard web protocols):
Interoperable (The format of the data should be open and interpretable for various tools):
Reusable (data should be well described so that they can be replicated and/or combined in different settings, with the terms of reuse stated in a clear licence):
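To make the "Accessible" point concrete, the following minimal sketch shows how a persistent identifier can be turned into a plain HTTPS request for machine-readable metadata. It assumes the record has a DOI and that its registration agency supports content negotiation through the `https://doi.org/` resolver (Crossref and DataCite do). The function name `build_metadata_request` and the DOI `10.1234/example` are hypothetical placeholders, not taken from the Hydroshare record.

```python
import urllib.parse
import urllib.request

DOI_RESOLVER = "https://doi.org/"
# Media type for citation metadata in CSL JSON, used in DOI content negotiation.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) an HTTPS request asking the DOI resolver for metadata.

    Hypothetical helper for illustration only.
    """
    doi = doi.strip()
    # Accept DOIs pasted as full URLs or with a "doi:" prefix.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the "/" between prefix and suffix.
    url = DOI_RESOLVER + urllib.parse.quote(doi)
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


if __name__ == "__main__":
    # Placeholder DOI: replace it with the DOI shown on the record you are inspecting.
    request = build_metadata_request("doi:10.1234/example")
    print(request.full_url)
    print(request.get_header("Accept"))
```

Passing the returned request to `urllib.request.urlopen` would perform the actual retrieval. The sketch stops short of that so it can be run without network access.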
A minimal data set consists of the data required to replicate all study findings reported in the article, as well as related metadata and methods.
(There is no need to share raw data if the standard in the field is to share processed data.)
As a general rule, your research should be deposited in a discipline- or data-specific repository. If no specific repository can be found, you can use a generalist repository. That said, there are a great many data repositories to choose from, and choosing one can be time-consuming and challenging. So how do you go about finding a repository?
Check the publisher’s / funder’s recommended list of repositories, some of which can be found below:
Check the FAIRsharing recommendations
Exercise: Public Repository
Using what you’ve learned so far:
Finding a repository first may help in deciding what metadata to collect and how!
It is also worth considering that some repositories offer extra features, such as running simulations or providing visualisation. For example, FAIRDOMhub can run model simulations and has project structures. Do not forget to take this into account when choosing your repository. Extra features might come in handy.
To make your code repositories easier to reference in academic literature, you can create persistent identifiers for them. In particular, you can use the data archiving tool in Zenodo to archive a GitHub repository and issue a DOI for it.
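When archiving a GitHub repository in Zenodo, you can describe the archived release by placing a `.zenodo.json` file in the repository root. The sketch below generates such a file. The field names (`title`, `description`, `upload_type`, `license`, `creators`) are assumptions based on Zenodo's deposit metadata and should be checked against the current Zenodo documentation. The helper names and all example values are hypothetical.

```python
import json
from pathlib import Path


def build_zenodo_metadata(title, creators, license_id, description):
    """Assemble a metadata dictionary for a software release (hypothetical helper)."""
    return {
        "title": title,
        "description": description,
        "upload_type": "software",  # assumed value for code repositories
        "license": license_id,      # assumed to be a licence identifier such as "MIT"
        "creators": [{"name": name} for name in creators],
    }


def write_zenodo_metadata(metadata, repo_dir="."):
    """Write the metadata to .zenodo.json inside repo_dir and return the file path."""
    path = Path(repo_dir) / ".zenodo.json"
    path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Example values are placeholders, not a real project.
    metadata = build_zenodo_metadata(
        title="Example analysis scripts",
        creators=["Doe, Jane"],
        license_id="MIT",
        description="Scripts used to process the example data set.",
    )
    print(json.dumps(metadata, indent=2))
```

Calling `write_zenodo_metadata(metadata, repo_dir)` saves the file so that it can be committed before the release that Zenodo archives. Keeping the metadata in the repository means each archived version carries its own description and licence.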
You can evaluate the repositories using the following criteria:
For more information
An interesting take can be found at Peter Murray-Rust’s blog post Criteria for successful repositories.
The content of this chapter was adapted or inspired by: