Data management in action

Isaac Pratt; Anneliese Eber; Cathy Paton; Kaelan Caspary

doi:https://doi.org/10.71548/10nj-yb65

Section 3 – Data Management Plans

This chapter is a companion to the Template for building a Data Management Plan through conversation. Where the template is a tool for working together and creating structure, agreements, and documentation, this resource includes examples of how data is managed and guidance on answering the questions in the template.

Introduction

What does it mean for something to be “data”?

Person collaging — Rachel Glaves, “Collage party at Million Fishes,” CC-BY 2.0: https://www.flickr.com/photos/pepperlime/3227196507/

There are many ways to conceptualize “data.” Consider a collaborative project between researchers and community participants that includes a focus group in which participants create collage art. Depending on who decides what counts as data, and on what the project is intended to achieve, “data” in this scenario might include:

transcripts of conversations
the pieces of collage art themselves
the observations that are made of participants
photo documentation of the process of creating the collages, and/or
hand-written reflections from a group discussion

This example showcases the many different ways that data might be understood and interpreted, and points to the need to define what the “data” will be before embarking on the project together.

Data collection

Collecting people’s data requires their consent

Collecting data is a crucial part of any research project; however, it is important to consider what your data is and whose consent is required to collect data for a project. This consent may look different depending on the type of data you’re collecting or if you’re reusing data that has previously been collected. For example, if you are collecting new data from a community, you may need individual consent for data collection in addition to broader community consent to do research about the community.

The Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans (TCPS 2) sets out the principles of ethical conduct that university- or college-affiliated researchers in Canada must follow when they collect and use human data. Each university or college has a Research Ethics Board (REB) that oversees the implementation of TCPS 2 and approves or denies permission to conduct studies. However, REB approval is not the beginning and end of an ethical approach to data collection. Some Indigenous communities may have their own ethics offices or committees, such as the Six Nations Ethics Committee. REBs review only research by institutionally affiliated researchers, but community organizations can apply for research ethics support and review at the Community Research Ethics Office, a non-profit organization in southwestern Ontario. Chapter 3 of TCPS 2 covers consent and has specific guidance on when and how researchers should seek consent for the re-use and sharing of data. Example language for seeking consent for data sharing can be found in the Portage Sensitive Data Toolkit for Researchers Part 3: Research Data Management Language for Informed Consent.

Additionally, if you are reusing data collected by others, you will need permission from the original data collectors, and you should make sure that the data was collected with appropriate permission and consent from individuals and the community. If you are concerned about whether something was collected appropriately, there are a few indicators you can look at. If the study involved a university-affiliated researcher, look for or ask for a copy of the consent form used by the researcher, or ask for their ethics protocol number. The consent form should specifically ask participants to consent to data being re-used.

Documentation

Why is it important to create documentation?

Documentation is a critical part of a research project because it records how the project was carried out from beginning to end. This helps everyone involved, and anyone who later works with the data, understand the project. For example, if you are reusing data that was collected by researchers from another institution or organization, the documentation should tell you what you need to know about the data and provide contact details for the authors in case you need more information.

Robust documentation can help:

Other people find, reuse (where reuse is permitted), and cite the data.
You integrate your data with other collections of data (sometimes referred to as datasets), including larger datasets collected by people at different institutions and organizations. This allows for broader analysis across larger groups.
Team members collect data for the same project consistently, which makes analysis easier and produces the higher-quality data needed to answer your research questions.

One example is research on education and student outcomes in schools. This kind of research often combines data collected at different schools and across multiple school boards. In those cases, ensuring data is collected in the same way (standardized) at different schools makes sure that data from the various places is directly comparable. The research team can then integrate data from multiple schools and school boards into a larger dataset to study student outcomes across a larger area like a whole province, multiple provinces, an entire country, or multiple countries.

Examples of documentation for community research

Documentation for a community-based research project might include:

Basic information about your project, such as the title of the project and the funder
An organizational chart or RACI (Responsible, Accountable, Consulted, and Informed) chart
Your research project design, such as the background information, research questions, and theoretical framework
Descriptions of the data variables you have collected (e.g., if you collected data about weight, state which unit of measurement you used; if column headings use shortened variable names, write each name out in full and describe it)
The decisions made during data collection and analysis (and why you made them)
The project methodology and a description of the processes and materials
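Variable descriptions like those in the list above are often kept in a "data dictionary" that travels with the dataset. The following is a minimal sketch in Python of what one could look like and how a team might check that every column has been described. The variable names (pid, wt_kg, age_grp, visit_dt), the field layout, and the helper functions are all hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column in the dataset.
# All variable names and descriptions here are illustrative assumptions.
DATA_DICTIONARY = [
    {"variable": "pid", "label": "Participant ID", "type": "text",
     "units": "", "notes": "Code assigned by the team, not the participant's name"},
    {"variable": "wt_kg", "label": "Weight", "type": "number",
     "units": "kilograms", "notes": "Self-reported"},
    {"variable": "age_grp", "label": "Age group", "type": "category",
     "units": "", "notes": "One of: 18-29, 30-44, 45-64, 65+"},
]


def undocumented_columns(column_names, dictionary):
    """Return the column names that have no entry in the data dictionary."""
    documented = {entry["variable"] for entry in dictionary}
    return [name for name in column_names if name not in documented]


def dictionary_as_csv(dictionary):
    """Render the data dictionary as CSV text so it can be saved next to the data."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["variable", "label", "type", "units", "notes"]
    )
    writer.writeheader()
    writer.writerows(dictionary)
    return buffer.getvalue()


# A dataset with one column (visit_dt) that nobody has described yet.
missing = undocumented_columns(["pid", "wt_kg", "visit_dt"], DATA_DICTIONARY)
print(missing)
```

A spreadsheet with the same columns would serve the same purpose. What matters is that shortened names and units are written out somewhere the whole team can find them.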

Storage and security

Use educational institutional storage where appropriate

The decision on where to store data may be influenced by some team members’ access to existing data management platforms. Educational institutions, for example, often give their members free access to cloud storage platforms, and it may be easy for the research team to use those platforms for research data. However, keep in mind that institutional platforms are set up for institutional users and are not always easy to access for team members outside the university. Large community organizations may also have IT departments and data storage platforms that they run or license. Make sure that everyone involved in the research agrees on where data is being stored.

Alternatively, you might need to apply for additional funding to pay for a reliable data management system.

Human data is sensitive data

Data about people often contains sensitive information that people would like to keep private and confidential. Human data, commonly referred to as personal information or personally identifiable information, may be defined differently depending on where and with whom you are working. It includes things like someone’s age and place of residence as well as things like IP addresses or browser search histories. As the holder of that information, you and the research team will have a responsibility to protect that information and keep it secure. Keeping data secure means a few things here:

Sensitive data should be accessible only to team members who need access to it to perform their work.
The raw, identifiable dataset should not be used for data processing and analysis. Rather, a de-identified dataset should be used.

When determining whether data is identifiable, consider both direct and indirect identifiers. Direct identifiers are information, like a name or email address, that can be used to identify someone directly. Indirect identifiers are information about a person, like age or gender, that could be used in combination to identify a particular person.
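One way to see how indirect identifiers work in combination is to count how many records share each combination of values. The sketch below, in Python, flags combinations that appear fewer than a chosen number of times. The field names, the sample records, and the minimum group size of 2 are assumptions made for illustration, not thresholds taken from any policy.

```python
from collections import Counter


def rare_combinations(records, indirect_identifiers, minimum_group_size=2):
    """Count each combination of indirect identifiers and return the rare ones.

    A combination shared by very few people is more likely to point to
    one particular person. The default threshold is an illustrative assumption.
    """
    combos = Counter(
        tuple(record[name] for name in indirect_identifiers) for record in records
    )
    return {
        combo: count for combo, count in combos.items() if count < minimum_group_size
    }


# Hypothetical survey records that contain no direct identifiers.
records = [
    {"age_range": "30-44", "gender": "woman"},
    {"age_range": "30-44", "gender": "woman"},
    {"age_range": "65+", "gender": "man"},
]

flagged = rare_combinations(records, ["age_range", "gender"])
print(flagged)
```

In this made-up example only one person is a man aged 65 or over, so that record could identify him even though his name was never recorded. The team might respond by using broader categories or by reporting that group only in aggregate.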

Let’s consider an example. A community shelter was interested in knowing more about how people engaged with their services so they could prioritize essential needs. They decided to collaborate with a researcher to collect and analyze data. They sat down with people and either had them fill out a paper survey or completed the survey with them through conversation. The data they collected included personal information like age and gender, as well as answers to very personal questions.

When managing this kind of data for analysis, we can lower the risk level of the data by removing the identifiers before analysis. This process is called de-identification. By separating the information from the person, it lowers the sensitivity of the data and the risk of exposing that person to additional harm. Typically, only a basic de-identification is needed at this stage: replacing names with codes or participant IDs, and replacing direct identifiers that appear in answers to other questions with non-identifiable information. Participant names and their connection to participant IDs can be stored in a ‘linking file’, which should be encrypted. McMaster University has a resource on encrypting files that may be helpful. A more comprehensive de-identification process is often used prior to sharing data with others after the project is complete. Make sure when you de-identify your data that you are not losing vital information that could affect your analysis.
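The basic de-identification step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a recommended tool: the function name, the field names, the ID format (P001, P002, ...), and the sample responses are all hypothetical, and encrypting the linking file is a separate step that is not shown here.

```python
def assign_participant_ids(responses, name_field="name", prefix="P"):
    """Replace names with participant IDs and build the linking table.

    Returns two things that should be stored separately:
    - the de-identified responses, for use in analysis
    - the linking table (name -> participant ID), to be kept in an
      encrypted file with restricted access
    """
    linking = {}
    deidentified = []
    for response in responses:
        name = response[name_field]
        if name not in linking:
            # IDs are assigned in order of first appearance (an assumption
            # for this sketch); a team may prefer randomly generated codes.
            linking[name] = f"{prefix}{len(linking) + 1:03d}"
        cleaned = {key: value for key, value in response.items() if key != name_field}
        cleaned["participant_id"] = linking[name]
        deidentified.append(cleaned)
    return deidentified, linking


# Made-up survey responses for illustration only.
responses = [
    {"name": "Alex Example", "age": 34, "visits_per_week": 3},
    {"name": "Sam Sample", "age": 61, "visits_per_week": 1},
]

deidentified, linking = assign_participant_ids(responses)
print(deidentified)
```

Only the de-identified records would go to the people doing the analysis. The linking table stays with the small number of team members who need to reconnect a participant ID to a person, for example to honour a request to withdraw.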

Let’s return to the collage art example from the introduction and discuss what sensitive information is contained within the data and materials that were collected and how they might be de-identified.

Material	Identifiable aspects	De-identification approach
Participant lists	Participant names; contact information	Use pseudonyms; store contact information separately
Survey responses	Personal information including age, gender, income, area of residence	Numbers like age can be generalized into age ranges; some data may not be able to be de-identified but can be aggregated for release
Focus group audio/video recordings	Voices of participants; video images of participants	Blur faces in video images; use a voice changer on audio (audio/video recordings are difficult to de-identify)
Focus group transcripts	References to participant names or personal details (job title, family life, home) made during conversation	Remove or generalize personal details (e.g., an employer could be changed from ‘McMaster’ to ‘University’)
Collage art pieces	Signatures	It may not be possible to de-identify signatures while respecting participant authorship and intent
Observations and notes on how participants made art	References to participants by their names	Replace names with pseudonyms
Photographs of the event	Photos of people’s faces	Choose angles that do not show faces or distinguishing features; blur faces
Reflections handwritten by participants	Handwriting; personal details	Use digitized copies where possible; remove or generalize personal details
Digitized copies of reflections	Personal details	Remove or generalize personal details
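Two of the approaches in the table, generalizing ages into ranges and generalizing personal details in transcripts, can be sketched in Python as follows. The age bands and function names are assumptions made for illustration; the ‘McMaster’ to ‘University’ replacement comes from the table. A real transcript would still need to be read by a person, since a fixed list of terms will not catch every identifying detail.

```python
import re


def age_range(age):
    """Generalize an exact age into a band (the bands are illustrative assumptions)."""
    if age < 18:
        return "under 18"
    if age <= 29:
        return "18-29"
    if age <= 44:
        return "30-44"
    if age <= 64:
        return "45-64"
    return "65+"


def generalize_terms(text, replacements):
    """Replace each identifying term with a more general one, matching whole words only."""
    for term, general in replacements.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        # A function is used as the replacement so that the general term
        # is inserted literally, without any escape processing.
        text = re.sub(pattern, lambda _match, general=general: general, text)
    return text


print(age_range(34))
print(generalize_terms("I have worked at McMaster for years.", {"McMaster": "University"}))
```

Broader bands lower the risk of identification but also remove detail, so it is worth checking that the chosen ranges still allow the analysis the team has planned.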

There is more about making data anonymous on the McMaster RDM Website.

Working with ‘physical’ data: hard copy records

When the data collected during a project is physical (e.g., artifacts, documents, or artwork created by participants), decisions about storage may depend on available resources and practical considerations. For instance, it’s important to determine who has the space to safely store the data for an extended period and how the research team can access it if the person storing it is unavailable.

Additionally, the long-term care of physical data requires thoughtful planning. Consider factors such as the storage environment (e.g., humidity, temperature, light exposure) and any necessary upkeep (e.g., cleaning, rotating materials) to ensure the data remains intact and usable over time.

Maintenance, care, and archival

Storage systems

If you do use storage systems affiliated with an educational institution, remember that access to these platforms and services often depends on affiliation with an institutional user. These systems may be cost-effective and more secure than commercial options, but obtaining support can be more difficult, as you may need to work through an institutional contact on the research team. Processes like adding and removing users may also take more time, because you may need to obtain institutional credentials for each person who needs access.

Managing responsibilities

In a research project involving community participants, the participants may create materials or share reflections as part of the process. For instance, participants might create physical artifacts that they keep, while the project team is responsible for digitizing these artifacts and storing them securely. Additionally, the team may collect written or verbal reflections from participants about their experience.

This example underscores the importance of shared responsibility within the research team for organizing and safeguarding project materials and data. Clear roles and processes ensure that data is managed responsibly and ethically throughout the project lifecycle.

Sharing, ownership, and reuse

What does your long-term plan look like?

Storing datasets after a project is completed can be difficult, especially when managing digital assets. In some cases, when there is a collaboration between researchers at an institution and a community, the research group may look to store data on institutional servers and resources. This may be fine for some research projects, but for others, there may be additional logistical considerations to think about. For example, if data is stored on institutional servers, how will community members maintain access to that data into the future, and how will new community members be granted access? This may look like having dedicated server space for community data that can only be accessed by community members and approved researchers, or community members may need institutional logins to gain access to data. Both options come with logistical challenges that will need to be addressed to ensure community members maintain access to their data.

Who owns data from people?

Even with the best intentions, data can sometimes be used in ways that are unhelpful or even harmful to the communities it represents.

Consider this scenario: A community group collects and publishes statistics about their own community, including data related to race. The data is not considered inherently sensitive, so it is made publicly available. Later, a different organization or institution uses this data in ways that the original community finds harmful, such as justifying increased surveillance of the community.

This example highlights the importance of thoughtfully considering how and where data is shared and whether it should be licensed in a way that limits misuse. It also underscores the broader ethical responsibility involved in managing data, especially when it pertains to communities that experience marginalization and systemic oppression.

We’ll get into this later! There is more on data sharing, ownership, and reuse in the Data Deposit Chapter of this book.
