Working with Data
12 Data Management Planning for Open Science Workflows
Felicity Tayler; Mélanie Brunet; Kathleen Gregory; Lina Harper; and Stefanie Haustein
Learning Outcomes
By the end of this chapter you should be able to:
- Describe open science as a movement that includes data sharing and reuse as best practices.
- Articulate your own researcher-centred motivations for data sharing and data citation.
- Write a Data Management Plan that describes an open science approach for mixed methods in social sciences.
- Make the connection between Data Management Plans and national funding bodies in Canadian and international settings.
- Understand intellectual property as it applies to open data licensing options.
Introduction
This chapter will look at the hot topic of open science from the Research Data Management (RDM) perspective of supporting open data in the social sciences and related disciplinary contexts. We’ll discuss a mixed methods (qualitative and quantitative) Data Management Plan (DMP) exemplar to help you plan for an open science workflow. Several topics discussed here also resonate with other chapters in this textbook, because open science workflows and Research Data Management for the purpose of data sharing and reuse are closely related. At the end of this chapter, we’ll address intellectual property (IP) as it defines data ownership, copyright, licensing, and permissions, and therefore affects the options for practising open data and open science workflows.
The DMP presented as a case study in this chapter is taken from the real-world example of the Meaningful Data Counts (MDC) research project, with principal investigators (PIs) at both the University of Ottawa and Kiel University, Germany. The purpose of the MDC international research partnership is to improve the understanding of the role that datasets play in scholarly communication. The project generates empirical evidence on open data practices, including research data reuse and citation, which is essential to the development of meaningful data metrics and can help to elevate research data to first-class scholarly outputs. From the MDC project, we learn about data sharing motivations and behaviours. The mixed methods approach of the research offers a helpful case study that demonstrates, in practice, what an open science workflow looks like in a DMP. This DMP has been shared as a model and is one of the exemplars built into the Digital Research Alliance of Canada’s DMP Assistant. The DMP Assistant is an online tool, freely available to all researchers, that develops a DMP through a series of key data management questions, supported by best practice guidance and examples.
A researcher’s decision to share data or to engage in open science practices often depends on disciplinary norms. This chapter focuses on open science workflows and data sharing in the social sciences and related fields. These principles and practices are widely transferable to other fields that work with quantitative and qualitative data. However, it is important to note that open science is defined differently across disciplinary contexts. For example, this chapter does not cover practices specific to biomedical fields, such as the registration of clinical trials, systematic reviews, and other study types that require registration, or the use of study reporting guidelines.[1]
The next sections begin with a few definitions before moving into the case study example (DMP Exemplar) and applied best practices for open science workflows using an interdisciplinary mixed methods approach. The final section addresses intellectual property considerations that are key to ethical practices of working with open data.
What Is Open Science?
You may have heard the term open science used in different and sometimes contradictory contexts, as numerous practitioner approaches, policies, articles, and mandates abound. This umbrella term is understood by different people in different ways and is discussed from different standpoints, each with its own assumptions, goals, and claims. Taking the MDC research project as a case study for how Research Data Management best practices can support an open science workflow, we’ll define open science from the standpoint of a researcher, or practitioner, as “the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society” (FOSTER, n.d.). FOSTER is a European project dedicated to fostering the practical implementation of open science. Because there are so many ways to “do” open science, FOSTER uses a taxonomy approach to map the broad field of activities and outputs related to these practices. For example, open science practice includes providing open access to publications, making data openly available and reusable, using open tools, engaging in citizen science, and applying open methods to research evaluation.
The full range of possible open science activities and outputs is often reduced to discussions of open access publications, but open science aims to make the entire research process transparent and accessible, not just the final publication! Further, the importance of disciplinary norms in shaping different ways of actually “doing” open science in real life is often overlooked. This is problematic because different disciplines have different norms and avenues for making publications open and for sharing data. Researchers reuse open data for a variety of purposes. Existing datasets can serve as a basis for a new study, for classroom teaching on computational methods, to calibrate instruments, as a model, or as algorithm inputs. Because data can be reused in so many ways, RDM best practices recommend that researchers deposit data in repositories: this infrastructure is more reliable for long-term storage and for maintaining persistent identifiers (e.g., DOIs) that help other people find and cite the dataset. However, researchers also share data via personal websites, person to person, or through data availability statements in articles.
This chapter draws from an interdisciplinary mixed methods approach to sharing data that can be broadly applied across multiple disciplinary areas, but there are many other applications of open practices that can be explored in other disciplinary areas.
What Are Open Data?
FOSTER (n.d.) defines open data as “online, free of cost, accessible data that can be used, reused and distributed provided that the data source is attributed.” However, accessibility is only one part of the open data equation; data need to be prepared in a usable format (Fecher & Friesike citing Boulton et al., 2011). This is where Research Data Management best practices enable the usability of open data through the FAIR principles, as discussed in chapter 2, “The FAIR Principles and Research Data Management” (Wilkinson et al., 2016). Data should be findable, accessible, interoperable, and reusable, with an emphasis on machine interoperability. Making the data FAIR is only one part of the solution that Research Data Management best practices uphold; data sharing and reuse also require context provided via supplementary information, such as literature, data documentation, and metadata.
Not all data can be open. Data with privacy concerns, such as confidential data with personal information, have to remain restricted. Research Data Management best practices can foreground an array of open science approaches while finding a balance between data that are as open as possible, but as closed as necessary.
The sharing and reuse of (open) data is an important concept in support of open science, with a preference for open data, when ethically appropriate. Perceived benefits of sharing and reusing data mirror the potential benefits of open science: to make research more reproducible and transparent, to save time and money, and to bring previously siloed data together in new ways. The UNESCO Recommendation on Open Science highlights the transformative potential of open science and its importance when addressing some of the most challenging problems of today, such as climate change, health issues, poverty, and rising inequalities.
The next sections will outline the MDC case study and DMP Exemplar, where the application of these open science principles and documentation practices is described. Documentation practices, including a DMP, enable collaboration with other people who need to understand and make sense of data so the data can be reused appropriately.
Case Study: The Meaningful Data Counts Project
The MDC research project is a helpful case study for RDM best practices because the project both studies open data practices across disciplines and practises open science using a social sciences mixed methods (qualitative and quantitative) approach that combines bibliometric analysis, surveys, and interviews.
The MDC project is part of the larger Make Data Count initiative, which drives the adoption of the building blocks for open data metrics: standardized data usage and data citation practices at repositories and publishers. MDC reports empirical evidence on data usage and data citation behaviour to improve the understanding of the role that datasets play in scholarly communication. Data sharing and citation patterns are studied across academic disciplines and researchers’ career stages. MDC also looks at the underlying motivations researchers have to share or cite datasets, or not to do so. MDC has found that there are many motivations and ways for researchers to reuse and cite data. Although there is a great variety of data citation practices, most respondents to a survey conducted in the course of the project reported that they cite data, often for reasons motivated by “ideal” research practices, such as acknowledging intellectual debt, helping others to locate and access data, and supporting the validity of their own claims (Gregory et al., 2023). Conversely, barriers to sharing data include researchers’ fear of being scooped, fear of errors being exposed in their research, the perception that the effort of preparing and publishing datasets is not worth the potential benefits, and the belief that data sharing is not applicable to their own research (Tenopir et al., 2020).
The MDC project implements an open science workflow in order to report on challenges experienced by team members engaging in open science practices, such as sharing and citing research data. As much as possible, an open science workflow makes the research process transparent to people outside the original research team through sharing of research plans, processes, code, preliminary results, and data.
A key part of the MDC’s open science practice was the development of a detailed Data Management Plan, created in collaboration with the RDM librarian at the University of Ottawa, that has been shared as a model DMP endorsed by the Digital Research Alliance of Canada. As you learned in chapter 1, “The Basics,” a DMP is a document that describes how the data for a research project will be handled, from collection through organization and analysis to eventual disposal or deletion. DMPs are living documents that can be updated throughout the life of the project; this iterative approach pairs well with the goal of enabling ethical data sharing. Research Data Management best practices are central to academia’s embrace of open science and are increasingly required to meet the goals of open science (Tenopir et al., 2020). The Tri-Agency Research Data Management Policy, for example, supports the FAIR (Findable, Accessible, Interoperable, and Reusable) guiding principles for Research Data Management and stewardship, and the three federal research funding agencies (SSHRC, NSERC, and CIHR) currently build data sharing into the grant application process in the “Knowledge Mobilization” section. Future grant applications are expected to require a clearly articulated DMP in order to be successful.
This DMP for MDC describes how the project manages different types of data that the research team collects and analyzes. The DMP is one of the methods that the team uses to document the project workflow in order to communicate ethics protocols, file transfer and storage procedures, metadata standards, and software code between different team members working remotely. Anton Ninkov, a postdoctoral research team member tasked with data management responsibilities, observed that documenting workflow is “about thinking about the project as a bigger thing than an individual task. It’s about the movement of the whole project, which my work is just one component of” (personal communication, February 15, 2022).
MDC datasets include a bibliometric analysis of data citation patterns in a corpus of 8,643,593 datasets in DataCite (Ninkov et al., 2022), survey responses from more than 2,500 researchers reflecting on data sharing and data citation practices across disciplines, and semistructured interviews with researchers that provide further insight into their motivations for sharing and citing data, or for not doing so. The DMP discussed in the next section outlines a plan to manage all the datasets produced through bibliometric analyses, surveys, and interviews, with the intention of sharing the data under an open license throughout the lifecycle of the project, not only at the time of publication.
Best Practices for a Data Management Plan in Support of an Open Science Workflow
A DMP is a great opportunity to emphasize open science practices, such as data sharing and reuse, but it can also support other components of an open science workflow. By linking the workflows documented in your DMP to other components of the research project, you make sure that your research will be shared widely at multiple phases of the project and that the data underpinning the findings reported in a publication are transparent and replicable throughout, not only at project completion. Many researchers focus on the planning aspect of a DMP, writing a plan at the start of the project and ignoring it afterward. But research is rarely linear, and plans often need to change. Creating subsequent versions is valuable too, both for project planning and for capturing the evolution of your research process.
- Open science emphasizes data sharing and reuse throughout research projects, not only at the final stage of publication.
- Open science workflows can be used for a myriad of research methods — mixed methods, quantitative, and qualitative — and across all disciplines.
- Updated versions of your DMP capture the evolution of your research methods and workflows.
MDC’s open science practices foregrounded the development of a comprehensive Data Management Plan. Version 1 of the plan, created at the beginning of the project, describes how the team of international researchers will manage the different types of data they will collect using a mixed methods (qualitative and quantitative) approach. DMPs are living documents, and the MDC team has recently updated their DMP in keeping with an open science best practice: periodically review data documentation and confirm that it accurately reflects the research methods and data management processes followed by the research team. Version 2 of the DMP is deposited in the same repository as Version 1.
Revising the DMP contributed to efficient project management. As principal investigator, Stefanie Haustein found, “Some sections prescribed by the DMP Assistant template did not apply to our research project after all” (personal communication, February 15, 2022). The default DMP Assistant template (as of 2022) asked researchers to address long-term preservation; however, Haustein reflected, “Long term preservation isn’t as relevant to us, as we assume that the technology such as the APIs (application programming interfaces) and the relevance of the data will have changed in 20 years from now” (personal communication, February 15, 2022). Revising the DMP encouraged a review of the research team’s workflow, including the work of members who joined the team after the first version was published. This review captured changes in data collection/processing that needed to be reflected in the documentation. The documentation of these methodological workflows is important as an open science best practice because, in order for shared data and related findings to be understood or replicated by people outside a research team, there must be some context on how the data were collected, structured, and analyzed.
Both versions of the DMP were created using the Digital Research Alliance of Canada’s recommended tool, DMP Assistant, in collaboration with the RDM librarian at the University of Ottawa. The team also contributed a template for open science workflows to the DMP Assistant, which guides research teams through the best practices to include in funder-required DMPs. The MDC DMP has been peer reviewed, published, and distributed as a national example of best practice in writing a DMP for an open science workflow, a mixed methods approach, and an international research partnership. All training resources created by the Digital Research Alliance of Canada are licensed under CC BY-NC 4.0 and are free to share and adapt for your own needs.
This section outlines some of the best practices that were written into the MDC DMP in order to document processes and enable collaboration within the research team or with other people who need to understand and make sense of the data so the data can be reused appropriately. We list a few here but encourage you to consult the “Guidance” sections of the full DMP Exemplar or template for details.
Responsibility and Resources
- Allocate adequate human resources for data stewardship responsibilities in your budget and in advance of data collection. The principal investigator is usually in charge of maintaining data accessibility standards for the team. Assign people to structure data, document data, and field questions about accessing information or granting access to the data.
- Create an onboarding document to ensure that all team members adopt the same workflows. Logical file structures, informative naming conventions, and clear indications of file versions all contribute to better use of your data during and after your research project. Using a file naming convention worksheet can be very useful.
- Document your process and revise your Data Management Plan if it changes: Consult regularly with members of the research team to capture potential changes in data collection, processing, and publishing that need to be reflected in the documentation.
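The naming advice in these bullets is easiest to follow when the whole team shares a small helper. The sketch below assumes a hypothetical convention, PROJECT_YYYYMMDD_description_vNN.ext; the pattern and the function names `make_filename` and `is_valid_filename` are illustrative only and are not the MDC team’s actual convention.

```python
import re
from datetime import date

# Hypothetical team convention (illustration only): PROJECT_YYYYMMDD_description_vNN.ext
NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Z]+)_(?P<date>\d{8})_(?P<desc>[a-z0-9-]+)_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def make_filename(project: str, day: date, description: str, version: int, ext: str) -> str:
    """Build a file name that follows the hypothetical convention above."""
    # Lower-case the description and collapse spaces/punctuation into single hyphens
    desc = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # Zero-padded version numbers keep files in order when sorted alphabetically
    return f"{project.upper()}_{day:%Y%m%d}_{desc}_v{version:02d}.{ext.lower()}"


def is_valid_filename(name: str) -> bool:
    """Check whether an existing file name already follows the convention."""
    return NAME_PATTERN.match(name) is not None


if __name__ == "__main__":
    name = make_filename("proj", date(2022, 2, 15), "Survey responses clean", 2, "CSV")
    print(name)                                      # PROJ_20220215_survey-responses-clean_v02.csv
    print(is_valid_filename(name))                   # True
    print(is_valid_filename("final data NEW.xlsx"))  # False
```

Running a check like `is_valid_filename` over a shared folder is one way to spot files that have drifted from the agreed convention, for example after new team members join.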
Documentation and Metadata
- Document workflows with a README file accompanying all datasets. Good data documentation includes information about the study, data-level descriptions, and any other contextual information required to make the data usable by other researchers.
- Use open file formats or industry-standard formats (e.g., those widely used by a given community) whenever possible.
- Use a metadata schema specifically for open datasets or any of the many other general and domain-specific metadata standards. Dataset documentation should be provided in one of these standard, machine-readable, openly accessible formats to enable the effective exchange of information between users and systems. DataCite has developed a set of core metadata fields and instructions to make datasets easily identifiable and citable.
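DataCite’s mandatory metadata properties (Identifier, Creator, Title, Publisher, PublicationYear, and ResourceType) lend themselves to a quick completeness check before deposit. The following is a minimal sketch, assuming JSON keys loosely modelled on the attribute names used in DataCite’s REST API; the record values, the DOI, and the `missing_fields` helper are placeholders for illustration, so check the current DataCite Metadata Schema documentation before relying on exact field names. The optional `rightsList` entry shows one way a license can travel with the metadata, using its SPDX identifier.

```python
import json

# The six DataCite mandatory properties, mapped to JSON-style keys that are
# assumed here (loosely based on DataCite's REST API attribute names).
REQUIRED = ["doi", "creators", "titles", "publisher", "publicationYear", "types"]


def missing_fields(record: dict) -> list:
    """Return the mandatory fields that are absent or empty in a metadata record."""
    return [key for key in REQUIRED if not record.get(key)]


# A hypothetical dataset record: every value below is a placeholder.
record = {
    "doi": "10.1234/example-dataset",
    "creators": [{"name": "Doe, Jane"}],
    "titles": [{"title": "Example survey responses on data citation"}],
    "publisher": "Example Data Repository",
    "publicationYear": 2023,
    "types": {"resourceTypeGeneral": "Dataset"},
    # Optional but useful for open data: the license, recorded with its SPDX identifier
    "rightsList": [{
        "rights": "Creative Commons Attribution 4.0 International",
        "rightsIdentifier": "CC-BY-4.0",
        "rightsIdentifierScheme": "SPDX",
    }],
}

if __name__ == "__main__":
    print(missing_fields(record))        # []
    print(json.dumps(record, indent=2))  # machine-readable metadata to deposit with the data
```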
Ethics and Legal Compliance
- Open science workflows prioritize being “as open as possible and as closed as necessary.” Consider which types of data need to be shared to meet institutional or funding requirements and which data may be restricted because of confidentiality, privacy, and/or intellectual property considerations outlined in your ethics protocol.
- Request the appropriate consent from research participants so that their data may be shared. Your statement of informed consent may identify certain conditions clarifying the uses of the data. Inform your study participants if you intend to publish an anonymized and de-identified version of collected data, and make sure they understand that by participating, they agree to these terms.
- Use open licenses, such as CC BY, to promote data sharing and reuse. Licenses determine how your data can be used by others. Consider including a copy of your end-user license with your DMP (addressed further in the next section).
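As a rough illustration of preparing a de-identified version of collected data, the sketch below replaces participant names with study codes and drops direct identifiers, keeping the linkage key apart from the shareable rows. The column names (`name`, `email`) and the function `pseudonymize` are hypothetical. This is pseudonymization rather than full anonymization: indirect identifiers (for example, a rare combination of discipline, career stage, and institution) can still re-identify people, so a real workflow should follow the study’s ethics protocol.

```python
# Hypothetical direct-identifier columns; a real study would take these from its ethics protocol.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(rows: list) -> tuple:
    """Replace each participant's name with a study code and drop direct identifiers.

    Returns (shareable_rows, linkage_key). The linkage key maps names to codes and
    must be stored separately under restricted access; it is never shared.
    """
    key = {}
    shareable = []
    for row in rows:
        name = row["name"]
        # The same participant always receives the same code (P001, P002, ...)
        if name not in key:
            key[name] = f"P{len(key) + 1:03d}"
        # Build a new dict so the original rows are left untouched
        clean = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
        clean["participant_id"] = key[name]
        shareable.append(clean)
    return shareable, key


if __name__ == "__main__":
    rows = [
        {"name": "Alice", "email": "alice@example.org", "discipline": "Sociology"},
        {"name": "Bob", "email": "bob@example.org", "discipline": "Physics"},
    ]
    shared, linkage = pseudonymize(rows)
    print(shared)  # [{'discipline': 'Sociology', 'participant_id': 'P001'}, {'discipline': 'Physics', 'participant_id': 'P002'}]
```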
Knowledge Mobilization
- Help others reuse and cite your data. Did you know that a dataset is a scholarly output that you can list on your CV, just like a journal article? If you publish your data in a data repository (e.g., Zenodo, Borealis, Dryad), they can be found and reused by others. Unique Digital Object Identifiers (DOIs) make it easier to identify and cite datasets.
- Use social media, e-newsletters, bulletin boards, posters, talks, webinars, discussion boards, or discipline-specific forums to gain visibility for your published data, promote transparency, and encourage data discovery and reuse. Cite your datasets the same way you cite other types of publications.
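To show how the elements that make a dataset citable fit together, here is a minimal sketch that assembles a citation from creator, year, title, version, publisher, and DOI, displaying the DOI as a resolvable https://doi.org/ link. The element order loosely follows common data citation recommendations such as DataCite’s, but exact punctuation varies by citation style; the function name and all sample values (including the DOI prefix `10.1234`) are hypothetical.

```python
from typing import List, Optional


def format_data_citation(creators: List[str], year: int, title: str,
                         publisher: str, doi: str, version: Optional[str] = None) -> str:
    """Assemble a dataset citation from its core metadata elements."""
    authors = "; ".join(creators)
    parts = [f"{authors} ({year}). {title}."]
    # Versions matter for data: cite the exact version you used, when one exists
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    # Show the DOI as a link so readers can resolve it directly
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


if __name__ == "__main__":
    print(format_data_citation(
        ["Doe, Jane", "Roe, Richard"], 2023,
        "Example survey responses on data citation",
        "Example Data Repository", "10.1234/example-dataset", version="2",
    ))
    # Doe, Jane; Roe, Richard (2023). Example survey responses on data citation. Version 2. Example Data Repository. https://doi.org/10.1234/example-dataset
```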
What Makes Open Data? Restrictions on Sharing Data
The MDC case study makes the connection between data sharing and Data Management Plans as they work together in support of open science practices across a research project. This section addresses the legal and contractual terms that allow or restrict the sharing and reuse of data as they flow through digital infrastructures. Following an overview of the privacy considerations in the MDC project, this section focuses on intellectual property considerations when determining data ownership and sharing research data.[2] While the discussion of IP and data licensing focuses on a Canadian context, the MDC DMP clearly states how access will be restricted to data with privacy concerns in the context of an international research project. It also states how anonymized data will be shared under an open license, which will enable reuse of the dataset.
A license is a permission from the copyright owner to allow someone else to use their work (in this case, data in some form) for certain purposes and under certain conditions. The copyright remains with the copyright owner (Canadian Intellectual Property Office, 2019). Once you have determined if the data are protected by copyright and, if so, who owns them and whether it is possible to share the data openly, there are a variety of open licenses that can be applied to indicate that openness. Open licenses are used by copyright owners to indicate which rights they wish to keep while also communicating how others can use their work without having to ask for permission every time. When a copyright owner decides to apply an open license to their work, they keep their copyright but make their work free of some of the usual constraints related to sharing, remixing, and reusing the work legally so long as the conditions of the license are respected. These open licenses are a simple and legal way to communicate that permission to potential users. Many repositories make it possible to select an open license easily and incorporate that information in the metadata.
While data sharing is a cornerstone of open science, it may not always be advisable, safe, or even legal to share data. Open science best practices prioritize respecting ethical and legal restrictions on access to data as a balance to broader goals of sharing, publishing, and reusing data. To follow this best practice, you will need to consider which types of data need to be shared to meet institutional or funding requirements and which data must be restricted because of confidentiality, privacy, and/or intellectual property considerations outlined in your ethics protocol. Indeed, before making data available publicly and openly, it is essential to determine whether doing so is ethically and legally permitted. The safety and privacy of participants, Indigenous data sovereignty, and the confidential or proprietary nature of the data may limit your ability to share them. In relation to data ownership, copyright status also needs to be clarified.
In our case study, the MDC DMP declares that all final data and publications will be published using an open access model. To achieve this goal, the international, multi-institutional partnership must also comply with the RDM policies of its host institutions, which take into account relevant legislation, industry standards, and best practices. Specifically, the data workflows will reflect the University of Ottawa’s legal and ethical considerations and the Canadian Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (TCPS 2) (2022) but may also refer to the University of Kiel’s integrity and ethics in research policy if the TCPS 2 doesn’t provide enough guidance. The Co-PI is affiliated with European institutions; therefore, research methods will comply with the EU’s General Data Protection Regulation (GDPR), which is stricter than Canadian equivalents.
The research team has stored sensitive data on a secure server in Canada, with access limited to only the PI and Co-PI for the entire project. Other team members were granted temporary access while they worked on data collection and anonymization of sensitive data. Collection of qualitative and personal data followed formal ethics approval from the University of Ottawa’s research ethics board and required explicit and informed participant agreement for data sharing following the Recommended Informed Consent Language for Data Sharing (ICPSR, n.d.). Social media and other public web data were collected and managed in line with the Association of Internet Researchers’ Ethical Guidelines 3.0 document (franzke et al., 2019). Any data determined to be sensitive will be stored securely with password protection and encryption. Data will be anonymized in reporting, except where explicitly agreed otherwise. When the data have been anonymized, they can be shared as Open Data with a Creative Commons Attribution (CC BY) 4.0 International license. If CC BY is not possible, the team will use the more restrictive Creative Commons Attribution-NoDerivatives (CC BY-ND) license.
Can I Share Data? Determining Data Ownership
You may wonder why the research team must assign a license to their data to make them open. Are data even protected by copyright? Because copyright protects the original expression of ideas or facts fixed in a tangible medium, it’s easy to conclude that data are like facts, so not protected. Indeed, raw or factual data that are not interpreted generally do not enjoy copyright protection. However, a compilation of data can be protected because of the judgment, skill, or effort applied when determining which data to include and/or their arrangement (making the data an “original expression”). Also, if the data are literary, musical, dramatic, or artistic works, they can be protected by copyright. Table 1 below summarizes the types of data that could be protected by copyright.
| Not Protected by Copyright | Could Be Protected by Copyright |
| Raw data (i.e., a number or measurement) | Data representations (e.g., tables and graphs) |
| | Datasets |
| | Data compilations |
| | Databases |
| | Purchased data (with conditions of use) |
| | Literary, musical, dramatic, or artistic works (e.g., photos) |
If it is determined that the data are protected by copyright, then who owns them? If you are in possession of data generated or supplied by a third party, even if they were accessible for free, it does not mean that you own any existing copyright. Always look for a license or terms of use. Copyright ownership can vary by type of data (as summarized in Table 2).
Primary Data | Data collected for your own purposes, from an experiment or research you have conducted and which you have fixed in a tangible medium | If copyright exists, you are probably the owner, but you should check the agreements or contracts related to your research project to confirm |
Secondary Data | Data collected for other purposes from experiment(s) or research conducted by others | If copyright exists, it is likely owned by others |
Tertiary Data | Synthesis of data from experiment(s) or research conducted by others | Articles, reports, etc. written by others for which you do not own the copyright |
There may be factors external to your research team or project that could determine whether data are protected by copyright and who owns them, including the following:
- policies or contractual arrangements between researchers and affiliated institutions (e.g., employment contracts, collective agreements)
- disciplinary conventions or practices in authorship attribution
- policies of the agency or organization that is funding the research in whole or in part
- license conditions or terms of use of purchased data — acquiring data from a third party does not mean that copyright has been transferred to you or that you are authorized to share the data
All parties involved in a research project should clarify data and copyright ownership issues early on. The various and sometimes overlapping statuses of data collectors or researchers, even within one institution or organization, are significant factors in determining who owns the copyright on research data. It is crucial to clarify copyright ownership because protected data cannot be made more open without the permission of the owner.
Three main types of open licenses are used for data:
- Creative Commons licenses
- Open Data Commons licenses
- Software licenses
Two Creative Commons designations are often used for data and are offered as options in data repositories:
- CC BY 4.0 (Creative Commons Attribution 4.0 International License): This license requires users to credit the author.
- CC0 (Public Domain Dedication): This designation is used to indicate that the copyright owner is waiving, to the extent the law allows, their rights to the content. When data are in the public domain, there are no restrictions on their use and attribution is not required. In some data repositories, such as Borealis, CC0 is the default license.
Creative Commons licenses apply to both the contents of a database and the database itself. Creative Commons does not recommend using licenses with the NonCommercial (NC) or NoDerivatives (ND) conditions for data because they severely restrict scholarly and scientific use.[3] Although we don’t recommend limiting the reuse of data to noncommercial purposes, you could apply a Creative Commons Attribution-NonCommercial license. However, it is important to note that this condition generally applies to the use as opposed to the user. It would likely not prevent a commercial entity from using the data if it does not resell them or use them as the basis for a product or service that will be sold for profit.
While not available in all data repositories, the Open Knowledge Foundation offers three open licenses used specifically for databases:
- ODbL 1.0 (Open Data Commons Open Database License)
- ODC-BY 1.0 (Open Data Commons Attribution License)
- PDDL 1.0 (Open Data Commons Public Domain Dedication and License)
Note that Open Data Commons licenses apply to databases only and not to the individual contents within a database.
Software licenses are some of the earliest open licenses and are also used in data repositories. They can be applied to the software or code, as well as to the associated documentation files:
- MIT License
- GNU General Public License (GNU GPL)
- Apache License
Table 3 below offers a comparison of these open licenses based on what they allow and the need for attribution, from the perspective of a user of licensed data (not the creator).
License* | Distribution | Modification | Sublicensing€ | Attribution |
© All rights reserved | Permission needed | Permission needed | Permission needed | Required |
CC BY | Allowed | Allowed | Allowed | Required |
CC0 | Allowed | Allowed | Not allowed | Not required |
ODbL | Allowed | Allowed | Not allowed | Required |
ODC-BY | Allowed | Allowed | Not allowed | Required |
PDDL | Allowed | Allowed | Allowed | Not required |
MIT | Allowed | Allowed | Allowed | Required |
GNU GPL | Allowed | Allowed | Allowed | Required |
Apache | Allowed | Allowed | Allowed | Required |
Comparison table licensed CC BY-SA 4.0, based on “Comparison of Free and Open-Source Software licenses,” Wikipedia, CC BY-SA 3.0.
* All eight open licenses in the table allow commercial use.
€ Sublicensing means that derivatives can be shared under a different license
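Readers who want to query Table 3 programmatically, for example to list every license that does not require attribution, can encode it as a small lookup structure. The Python sketch below is a hypothetical helper: the names `LicenseTerms`, `TABLE_3`, and `licenses_where` are illustrative, and the cell values are copied verbatim from Table 3, which itself simplifies the legal texts of each license.

```python
from typing import NamedTuple


class LicenseTerms(NamedTuple):
    """One row of Table 3, from the perspective of a data user."""
    distribution: str
    modification: str
    sublicensing: str
    attribution: str


# Hypothetical encoding of Table 3; values are the table's own wording.
TABLE_3 = {
    "© All rights reserved": LicenseTerms("Permission needed", "Permission needed", "Permission needed", "Required"),
    "CC BY": LicenseTerms("Allowed", "Allowed", "Allowed", "Required"),
    "CC0": LicenseTerms("Allowed", "Allowed", "Not allowed", "Not required"),
    "ODbL": LicenseTerms("Allowed", "Allowed", "Not allowed", "Required"),
    "ODC-BY": LicenseTerms("Allowed", "Allowed", "Not allowed", "Required"),
    "PDDL": LicenseTerms("Allowed", "Allowed", "Allowed", "Not required"),
    "MIT": LicenseTerms("Allowed", "Allowed", "Allowed", "Required"),
    "GNU GPL": LicenseTerms("Allowed", "Allowed", "Allowed", "Required"),
    "Apache": LicenseTerms("Allowed", "Allowed", "Allowed", "Required"),
}


def licenses_where(**conditions: str) -> list[str]:
    """Return the names of rows whose columns match every given value, in table order."""
    return [
        name
        for name, terms in TABLE_3.items()
        if all(getattr(terms, column) == value for column, value in conditions.items())
    ]
```

For example, `licenses_where(attribution="Not required")` returns `["CC0", "PDDL"]`, and `licenses_where(modification="Allowed", sublicensing="Allowed")` returns `["CC BY", "PDDL", "MIT", "GNU GPL", "Apache"]`. A named tuple keeps the column names identical to the table headers, so a misspelled column raises an `AttributeError` instead of silently matching nothing.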
Conclusion
This chapter discussed data management planning as an RDM best practice that can support open data and data sharing as integral parts of an open science workflow in the social sciences and related disciplinary contexts. Individual researchers choose to make their data openly available for many different reasons, including increased citation of their work, but the collective goals of the open science movement are to make research more reproducible and transparent, to save time and money, and to bring previously siloed data together in new ways. Through the Data Management Plan in the Meaningful Data Counts case study, you have learned the value of a DMP in overall project planning with open science goals in mind. The DMP ensures consistent and ethical management of all datasets produced by multiple research team members through bibliometric analyses, surveys, and interviews; it also ensures that the data will be shared throughout the lifecycle of the project, not only at the time of publication. Key components of data sharing outlined in the DMP include depositing datasets in a recognized repository under an open license. The open license grants other researchers permission to reuse the MDC team's work, and the data repository ensures that researchers can find the datasets and cite them appropriately. In the final section of this chapter, you learned that, in addition to privacy considerations, before making data open, you must ascertain whether the data are protected by copyright and, if so, who owns them. Once it is determined that the data can be shared openly, choosing an open license that allows for modifications encourages reuse for scholarly and scientific purposes. Not all data can be open data, but if you wish to adopt the principles of the open science movement through data sharing and deposit in repositories, a DMP can help you standardize and communicate the steps to follow across the research team and to the wider disciplinary community.
Key Takeaways
- Open science is a movement to make scientific research, data, and dissemination accessible. Beyond open access to publications, it supports making data openly available and reusable, using open tools, engaging in citizen science, and using open methods to evaluate research.
- Researcher motivations for data sharing and data citation often depend on disciplinary norms, but all researchers who publish and cite data take part in elevating research data to a first-class research output, with status equivalent to that of other research outputs.
- Crafting a Data Management Plan (DMP) with an open science workflow is a good way to meet funder requirements for the effective management of research data during a project, with a goal of enabling ethical data sharing.
- By linking the workflows documented in your DMP to other components of the research project, you help ensure that your research will be shared widely at multiple phases of the project, and that the data underpinning the findings reported in a publication are transparent and replicable throughout the project (not only at completion).
- DMPs are living documents, and it can be helpful to revisit and update your DMP throughout the research project. Creating subsequent versions is a useful way to capture the evolution of your research process.
- In addition to ethical considerations, before making data open, clarify whether the data are protected by copyright and who owns that copyright; if necessary, obtain permission from the rights holder before depositing data in an open repository.
- Once it is determined that the data can be shared openly, choose an open license that allows for modifications as much as possible: a “no derivatives” condition will severely restrict use for scholarly and scientific purposes and limit the benefits of making the data open.
Reference List
Boulton, G., Rawlins, M., Vallance, P. & Walport, M. (2011). Science as a public enterprise: The case for open data. The Lancet, 377(9778), 1633–1635. https://doi.org/10.1016/S0140-6736(11)60647-8
Brunet, M., Hatherill, J., & Ripp, C. (2021). Open access to knowledge part 2: Sharing your research data. University of Ottawa Library. http://hdl.handle.net/10393/43309
Brunet, M., & Rouleau, T. (2021). Copyright and research data at uOttawa – FAQ. University of Ottawa Library. https://copyright.uottawa.ca/sites/copyright.uottawa.ca/files/copyright_and_research_data_faq.pdf
Canadian Intellectual Property Office (CIPO). (2019). A guide to copyright (assignments and licences). Government of Canada. https://www.ic.gc.ca/eic/site/cipointernet-internetopic.nsf/eng/h_wr02281.html#assignmentsLicences
Cobey, K. D., Haustein, S., Brehaut, J., Dirnagl, U., Franzen, D. L., Hemkens, L. G., Presseau, J., Riedel, N., Strech, D., Alperin, J. P., Costas, R., Sena, E. S., van Leeuwen, T., Ardern, C. L., Bacellar, I. O. L., Camack, N., Correa, M. B., Buccione, R., Cenci, M. S., … Moher, D. (2022). Establishing a core set of open science practices in biomedicine: A modified Delphi study [pre-print]. medRxiv 2022.06.27.22276964. https://doi.org/10.1101/2022.06.27.22276964
Fecher, B., & Friesike, S. (2014). Open science: One term, five schools of thought. In S. Bartling & S. Friesike (Eds.), Opening Science: The evolving guide on how the internet is changing research, collaboration and scholarly publishing (pp. 17–47). Springer International Publishing. https://doi.org/10.1007/978-3-319-00026-8_2
FOSTER. (n.d.-a). Open data. https://www.fosteropenscience.eu/taxonomy/term/6
FOSTER. (n.d.-b). Open science. https://www.fosteropenscience.eu/taxonomy/term/7
franzke, a. s., Bechmann, A., Zimmer, M., Ess, C., & the Association of Internet Researchers. (2020). Internet research: Ethical guidelines 3.0. https://aoir.org/reports/ethics3.pdf
Gregory, K., Ninkov, A. B., Ripp, C., Roblin, E., Peters, I., & Haustein, S. (2023). Tracing data: A survey investigating disciplinary differences in data citation [pre-print]. Zenodo. https://doi.org/10.5281/zenodo.7555266
ICPSR. (n.d.). Recommended informed consent language for data sharing. https://www.icpsr.umich.edu/web/pages/datamanagement/confidentiality/conf-language.html
Ninkov, A., Gregory, K., Ripp, C., Morissette, E., Harper, L., Peters, I., Tayler, F., & Haustein, S. (2022). Research data management plan for the meaningful data counts project (v.2). Zenodo. https://doi.org/10.5281/zenodo.6473351
Tenopir, C., Rice, N. M., Allard, S., Baird, L., Borycz, J., Christian, L., Grant, B., Olendorf, R., & Sandusky, R. J. (2020). Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide. PLOS ONE, 15(3), e0229003. https://doi.org/10.1371/journal.pone.0229003
Wilkinson, M. D., Dumontier, M., Aalbersberg, Ij. J., Appleton, G., Axton, M., Baak, A., Blomberg, N., Boiten, J.-W., da Silva Santos, L. B., Bourne, P. E., Bouwman, J., Brookes, A. J., Clark, T., Crosas, M., Dillo, I., Dumon, O., Edmunds, S., Evelo, C. T., Finkers, R., … Mons, B. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3(1), 160018. https://doi.org/10.1038/sdata.2016.18
- Nineteen open science practices in biomedical fields were identified in a recent Delphi study. The authors would like to thank David Moher and the Centre for Journalology at the Ottawa Hospital Research Institute for ongoing conversations about open science practices across disciplines. Cobey, K. D., Haustein, S., Brehaut, J., Dirnagl, U., Franzen, D. L., Hemkens, L. G., Presseau, J., Riedel, N., Strech, D., Alperin, J. P., Costas, R., Sena, E. S., van Leeuwen, T., Ardern, C. L., Bacellar, I. O. L., Camack, N., Correa, M. B., Buccione, R., Cenci, M. S., … Moher, D. (2022). Establishing a core set of open science practices in biomedicine: A modified Delphi study [pre-print]. medRxiv 2022.06.27.22276964. https://doi.org/10.1101/2022.06.27.22276964 ↵
- Parts of the section on intellectual property are an adaptation of M. Brunet, J. Hatherill & C. Ripp. 2021. Open Access to Knowledge Part 2: Sharing Your Research Data, University of Ottawa Library, CC BY 4.0, http://hdl.handle.net/10393/43309 and M. Brunet & T. Rouleau. 2021. Copyright and Research Data at uOttawa – FAQ, University of Ottawa Library, CC BY 4.0, https://copyright.uottawa.ca/sites/copyright.uottawa.ca/files/copyright_and_research_data_faq.pdf. ↵
- See Creative Commons Frequently Asked Questions about data and CC licences, https://wiki.creativecommons.org/wiki/Data#Frequently_asked_questions_about_data_and_CC_licenses. ↵
the movement to make scientific research, data, and dissemination transparent and widely accessible without barriers, financial or otherwise.
online, free of cost, accessible data that can be used, reused, and distributed provided that the data source is attributed.
a formal description of what a researcher plans to do with their data from collection to eventual disposal or deletion.
sources of information or evidence that have been compiled to serve as input to research.
a term that describes all the activities that researchers perform to structure, organize, and maintain research data before, during, and after the research process.
the free, immediate, online availability of information coupled with the rights to use this information fully in the digital environment.
a long-lasting reference to a digital object that gives information about that object regardless of what happens to it. Developed to address "link rot," a persistent identifier can be resolved to provide an appropriate representation of an object whether that object changes its online location or goes offline (CODATA, CC BY 4.0).
guiding principles to ensure that machines and humans can easily discover, access, interoperate, and properly reuse information. They ensure that information is findable, accessible, interoperable, and reusable.
the ability of data or tools from non-cooperating resources to work with or communicate with each other with minimal effort using a common language.
data about data; data that define and describe the characteristics of other data.
a set of functions and procedures provided by one software library or web service through which another application can communicate with it.
a plain text file that includes detailed information about datasets or code files. These files help users understand what is required to use and interpret the files, which means they are unique to each individual project. Cornell University has a detailed guide to writing README files that includes downloadable templates (Research Data Management Service Group, n.d.).
the format's technical specifications are public; the information that helps users understand its operation and its structure is accessible.
a grouping of elements intended to describe a resource. For each element, the name and the semantics (the meaning of the element) are specified. Content rules (how content should be phrased), representation rules (e.g., capitalization rules), and allowed element values (e.g., from a controlled vocabulary) may be optionally specified, but this is not always the case.
a name (not a location) for an entity on digital networks. A DOI provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks. A DOI is a type of Persistent Identifier (PID) issued by the International DOI Foundation. This permanent identifier is associated with a digital object that permits it to be referenced reliably even if its location and metadata undergo change over time (CODATA Research Data Management Terminology, CC BY 4.0).
Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans. The primary harmonized framework that accounts for Canada-wide laws and broader ethical paradigms applicable to the rights of human participants in research.
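The persistent identifier and DOI definitions describe a DOI as a name that a resolver turns into the object's current location. A minimal Python sketch of that idea, assuming the public https://doi.org/ resolver used throughout this chapter's reference list, converts a DOI written in common forms (bare, with a `doi:` prefix, or as a full URL) into a resolvable link. The function name `normalize_doi` is hypothetical, and its pattern is a simplified check that accepts the DOIs cited in this chapter but not every syntactically valid DOI. The code only builds the URL; it does not contact the resolver.

```python
import re

# Public DOI resolver, as used in this chapter's reference list.
DOI_RESOLVER = "https://doi.org/"

# Simplified pattern (an assumption, not the full DOI syntax):
# "10." + a 4-9 digit registrant code, a slash, then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes commonly written in front of a bare DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(text: str) -> str:
    """Return a resolvable https://doi.org/ URL for a DOI given in a common written form."""
    doi = text.strip()
    for prefix in KNOWN_PREFIXES:
        # Compare case-insensitively so "DOI:" or "HTTPS://DOI.ORG/" also match.
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {text!r}")
    return DOI_RESOLVER + doi
```

For example, `normalize_doi("doi:10.5281/zenodo.6473351")` returns `"https://doi.org/10.5281/zenodo.6473351"`, while a Handle URL such as `http://hdl.handle.net/10393/43309` raises `ValueError` because it is a Handle rather than a DOI.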