To wrap up another Love Data Week, we are going to talk about a topic that goes hand in hand with data sharing but is often overlooked – data discoverability. If you’re familiar with the FAIR data principles, you may remember that F stands for Findable, and that’s what we’re talking about here. It’s hard to use data if you can’t find it, after all!
One of our favorite things here at CDABS is getting to collaborate with our data colleagues at other universities, especially members of the Data Discovery Collaboration (DDC). Recently, members of the DDC (including yours truly) produced this handy guide on making your research data more discoverable.
- Decide what level of access you can provide – Data discovery and data access are related but distinct concepts. Data discovery is the ability to find out whether and where datasets exist, whereas data access refers to the processes for viewing and downloading data. It’s helpful to decide the level of access first, so you can describe your data accurately and use appropriate tools to make it discoverable. For example, even if you cannot openly share your data, you can still share metadata about your data through the UMB Data Catalog.
- Comply with ethical standards – Ethics come into play in decisions and discussions throughout the life span of your research project, and will certainly be a part of decisions about data access and storage. Be especially careful about any data you collect regarding vulnerable populations, and keep in mind that advances in data and technology may outpace changes in regulations.
- Deposit your data somewhere trusted – In some fields and for some data types there is a clear place to deposit data, but for others there are more options. Look for recommendations from journals, funding agencies, professional associations, and your friendly neighborhood data librarian to find discipline-specific or appropriate generalist repositories. Your data may be more findable if stored in the same place as other data from your field.
- Use persistent identifiers – Persistent identifiers, or PIDs, include things like ORCID iDs for people and DOIs for datasets, journal articles, and more. These identifiers provide a stable, unique link to data and the scholars who created it. PIDs are machine-readable and thus an important building block of research data computing infrastructure.
- Create thoughtful and rich metadata – Metadata is information about your data, and includes things like title, creator, subject, description, and data types. Good metadata is also structured in a way that makes it both machine- and human-readable. When you search for something in a repository or catalog, it is the metadata that helps you find it. The UMB Data Catalog uses a robust metadata schema, so listing your data there is a great way to make sure your data is richly and accurately described.
- Choose your keywords carefully – Not every repository or catalog will allow you to provide keywords, but where you can, keywords are another way to add descriptive information to your data and potentially link it with similar datasets that are described with the same terms (like how hashtags help you find tweets).
- Create links to related resources – Think of your data as part of a larger research ecosystem that includes authors, publications, grants, institutions, software, code, etc. Link all these things together whenever you can. Creating multiple pathways to your data makes it more findable.
- Make supporting information discoverable too – It is often possible to submit your data to a journal along with an article, but these systems are not really designed for finding or storing data. A better bet is to submit data to a trusted repository and link to that data in your article.
- Include an accurate Data Availability Statement with your publications – A great way to make your data findable is to tell people exactly where to find it. You may have the opportunity in a publication to explain where one can find the associated data. Take the opportunity to provide a PID for your data, which is hopefully located in a trusted repository (see? we’re pulling it all together now). Also, pro tip – PubMed Central has special filters that help you find articles with data availability statements.
- Talk to your institutional librarian – We called them “simple” rules, but we know they can actually be quite daunting. Have no fear, we at CDABS are here to help! Have questions about making your data more discoverable? Contact us for a research data consultation!
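
To make the rules on persistent identifiers, metadata, keywords, and links a little more concrete, here is a minimal Python sketch of what a machine-readable description of a dataset might look like. It uses the schema.org `Dataset` vocabulary as one possible structure; that choice is an assumption for illustration, not the schema used by the UMB Data Catalog or any particular repository. The function name `build_dataset_record`, the DOIs, the ORCID iD, and all descriptive values are hypothetical placeholders.

```python
import json


def build_dataset_record(title, description, doi, creators, keywords,
                         related_article_doi=None):
    """Return a schema.org-style Dataset description as a plain dict.

    creators is a list of (name, orcid) pairs. All identifiers passed in
    by the example below are hypothetical placeholders, not real records.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": title,
        "description": description,
        # Persistent identifier for the dataset, written as a resolvable DOI link
        "identifier": f"https://doi.org/{doi}",
        # Persistent identifiers for the people who created the data
        "creator": [
            {"@type": "Person", "name": name, "@id": f"https://orcid.org/{orcid}"}
            for name, orcid in creators
        ],
        # De-duplicated, sorted keywords so the same terms are used consistently
        "keywords": sorted(set(keywords)),
    }
    if related_article_doi:
        # Link the dataset to a related publication
        record["citation"] = f"https://doi.org/{related_article_doi}"
    return record


record = build_dataset_record(
    title="Example survey responses",                # hypothetical
    description="De-identified responses from an example survey.",
    doi="10.1234/example-dataset",                   # hypothetical DOI
    creators=[("Example Researcher", "0000-0000-0000-0000")],  # hypothetical ORCID iD
    keywords=["survey", "public health", "survey"],
    related_article_doi="10.1234/example-article",   # hypothetical DOI
)

print(json.dumps(record, indent=2))
```

Many repositories generate comparable structured metadata from their deposit forms, so in practice the job is usually to fill in those fields completely rather than to write a record like this by hand.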
For more information, be sure to check out the full article!
This blog post has been adapted, under a Creative Commons Attribution License, from: Contaxis N, Clark J, Dellureficio A, Gonzales S, Mannheimer S, Oxley PR, Ratajeski MA, Surkis A, Yarnell AM, Yee M, and Holmes K. Ten simple rules for improving research data discovery. PLoS Comput Biol 18(2): e1009768. https://doi.org/10.1371/journal.pcbi.1009768
Contact: Amy Yarnell, Data Services Librarian, and Jean-Paul Courneya, Bioinformationist, at firstname.lastname@example.org.
The Center for Data and Bioinformation Services (CDABS) is the University of Maryland Health Sciences and Human Services Library hub for data and bioinformation learning, services, resources, and communication.
To read more of our content and stay informed, please fill out the subscription form here: https://www2.hshsl.umaryland.edu/cdabs/communications