Publishing and archiving your data

➔Data Flow Model: Sharing and Preservation

Researcher question:

Are data curation and data sharing the same thing? Does data curation help me publish peer-reviewed articles, or is it only for digital publications such as podcasts, websites, and online exhibitions? What are data papers?


Data Flow Model questions that help you think through this complexity:

  • Which data requires long-term preservation?
  • Will data be published, or shared with audiences beyond the partnership?
  • Will metadata be shared?

If you have come this far in the data curation management flow, the decision about what to share or archive (preserve) should be clearer. You can use the FAIR principle of Reusability as a starting point: if you want your data to be reusable, ask what needs to be shared so that others can use and understand it, and then, in line with the consent and copyright sections, what needs to be restricted and how. Because data curation is a resource-heavy undertaking, another best practice is to determine what you can sustainably and ethically make reusable. This, again, depends on your project, its funding, and the attributes of your data set.

Digital humanities projects can produce many kinds of publications across multiple media platforms, including traditional publications (monographs, articles), shared data sets, knowledge mobilization through non-traditional digital formats, data papers, and web pages. These digital assets can also be archived in a repository alongside the data for long-term preservation, although the published and archived outputs might then exist as two different versions in two different places. Archiving data is not the same as preserving other digital research outputs. Further, the working storage solution (which lets RAs and researchers work on, save, and share their data while the project is ongoing) and the data preservation solution have different requirements for rights, formats, capacity, protection, and life span, so a solution for each has to be worked out in parallel. Finally, when working in partnerships, sharing and preservation must be planned so that contributions are recognized, licenses are followed, and publishing and sharing agreements are carried out. Some institutions and infrastructures may already have user agreements in place that treat all research partners and owners equally.

A “circulation copy” or “access copy” of a workshop tutorial video might be published on a website and hosted on a commercial platform such as YouTube to reach the widest possible audience and provide user-friendly streaming, but this approach does not guarantee long-term preservation of the video. For that, you could archive a preservation copy in a repository as “data.” Be aware, though, that in digital humanities circles the word “repository” is sometimes used loosely to describe a custom-built, privately hosted digital asset management solution that has no long-term preservation plan. You can also use dark storage for data, research work, or other project-related outputs that you do not want to share publicly but want to preserve, or share only in the short term. In research data management networks, however, a preservation-ready data repository is a specialized software platform built to archive data for as long as its technological infrastructure is maintained. Some examples of data repositories of interest to digital humanists are listed below; because many large-scale DH partnerships are international, the repository options are global:

  • Zenodo (international in scope)

These platforms let you both publish your data for sharing with the public and self-archive it on a stable, preservation-ready platform. Submit your final data files to a repository that assigns a persistent identifier (e.g., a handle or DOI). Provide good metadata for your study so others can find it, using your discipline’s metadata standard where one exists (e.g., Dublin Core or DDI). Here again, you may want to check in with your institution’s library and archives, which may offer digital preservation services for researchers.
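As a rough illustration of what "good metadata" can look like, the sketch below builds a simple Dublin Core record in Python using only the standard library. It assumes the 15-element Dublin Core set wrapped in the `oai_dc` container used by OAI-PMH; the function name `build_dc_record` and all field values are hypothetical, and no identifier is included because the repository assigns the persistent identifier on deposit. Many repositories collect metadata through their own deposit forms instead, so check what your chosen platform actually accepts.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for simple (unqualified) Dublin Core and the OAI-PMH
# oai_dc wrapper element commonly used to exchange it.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Serialize with the conventional "dc:" and "oai_dc:" prefixes.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def build_dc_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build an oai_dc record with one dc:<element> child per value.

    Every Dublin Core element is optional and repeatable, so each key
    maps to a list of values. Unknown element names raise ValueError.
    (Hypothetical helper for illustration only.)
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return root


if __name__ == "__main__":
    # Hypothetical values for a preservation copy of a tutorial video.
    # No identifier is given: the repository assigns the DOI or handle.
    record = build_dc_record({
        "title": ["Workshop tutorial video (preservation copy)"],
        "creator": ["Example Researcher", "Example Partner"],
        "type": ["MovingImage"],
        "format": ["video/mp4"],
        "language": ["en"],
        "rights": ["CC BY 4.0"],
    })
    print(ET.tostring(record, encoding="unicode"))
```

Each key maps to a list because Dublin Core elements are repeatable (a data set may have several creators, for instance). A richer disciplinary standard such as DDI follows the same idea of structured, shareable description, but with a much larger schema.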

Data papers are a growing genre of computational methodology papers that accompany a published data set. Peer-reviewed venues for this kind of data publication include the Journal of Cultural Analytics and the Journal of Open Humanities Data. A data paper not only shares your data while setting out the theoretical and critical framework you used to collect and work with it; it also provides a valuable teaching resource for future generations of digital humanities scholars learning to do computational work on carefully curated, well-documented data sets.



Data Primer: Making Digital Humanities Research Data Public by Felicity Tayler; Marjorie Mitchell; Chantal Ripp; and Pascale Dangoisse is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.