Transforming data management into scholarly and creative work
➔Data Flow Model: Critical Analysis
Researcher questions:
Over the many years of my project, how do I make sure I can find all the files, digitized images, and other digital objects I need to write papers, make podcasts, prepare conference presentations, and so on? How can I share files with my community and research partners without making them public? My research assistants have left the project; how do I access the files they worked on?
Data Flow model questions that help you think through this complexity:
- What is the anticipated use of the data within the research team? Do collaborators in a multi-institutional partnership need access to each other's files?
- Does file storage balance security with file sharing across collaborators?
Once you have collected and processed the data, it’s time to ensure everything can be easily accessed by the right organizations and people so that you can do your scholarly and creative work. Different members of your research team or partnerships need access to different files, and not all of them will use the files for the same purpose or from the same location. For many DH researchers working in teams, the “critical analysis” phase depends on well-organized, accessible files: the data needs to be curated, described, and made accessible before theories, histories, narratives, discourses, or dialectics can be gleaned from it.
Before considering a specific technical solution for sharing files, you need a system-agnostic file management strategy that includes:
1) File and directory naming conventions, so that individual files can be recognized even when removed from their original context
2) A way to ensure that references to filenames in data/metadata files remain valid
3) Documentation of these conventions in a README file (yes, the README file again!), so future project collaborators can learn the file management plan and update it as the project changes scope.
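As a minimal sketch of how the first two points can be checked automatically, the Python below assumes a hypothetical naming convention of the form institution_collection_item_date.extension (for example concordia_sgwu_0042_19661021.wav). The pattern, field names, and helper functions are illustrative assumptions, not a SpokenWeb standard. One function parses a filename into the fields your README documents; the other lists filenames cited in metadata that are missing from storage.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical convention: institution_collection_item_date.extension,
# e.g. "concordia_sgwu_0042_19661021.wav". Replace with the one in your README.
FILENAME_PATTERN = re.compile(
    r"^(?P<institution>[a-z]+)_(?P<collection>[a-z0-9]+)_"
    r"(?P<item>\d{4})_(?P<date>\d{8})\.(?P<ext>[a-z0-9]+)$"
)


def parse_filename(name: str) -> Optional[dict]:
    """Return the convention's fields, or None if the name does not conform."""
    match = FILENAME_PATTERN.match(PurePosixPath(name).name)
    return match.groupdict() if match else None


def dangling_references(referenced: list, existing: set) -> list:
    """List filenames cited in metadata that are not present in storage."""
    return [name for name in referenced if name not in existing]


if __name__ == "__main__":
    files_on_disk = {"concordia_sgwu_0042_19661021.wav"}
    metadata_refs = ["concordia_sgwu_0042_19661021.wav", "concordia_sgwu_0044_19661118.wav"]
    print(parse_filename("concordia_sgwu_0042_19661021.wav"))
    print(parse_filename("Reading Final (2).wav"))  # does not follow the convention
    print(dangling_references(metadata_refs, files_on_disk))
```

In practice, `existing` might come from listing a storage folder and `referenced` from a filename column in a metadata spreadsheet; keeping the pattern itself in the README lets collaborators read and update it.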
Also consider how the research data your team shares and processes during the project may differ from the data you ultimately share as part of publication and preservation. Differences may include license and re-use restrictions, but also practical matters such as file size, format, and bandwidth requirements. You need to store, version, and back up files securely, while still restricting access appropriately as research assistants (RAs) and researchers work on them. Confidentiality and copyright may also restrict access to files in different ways. You must also look at sharing practices across partnerships and institutions: if data is located at one institution in a multi-institutional project, will team members based elsewhere be able to access it? You may need to collaborate with IT support teams in different locations – this takes time and planning!
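One common way to confirm that a backup or transferred copy still matches the working files is a checksum manifest. The sketch below assumes Python 3.9+ and folders reachable as local paths; the function names are hypothetical, and it complements rather than replaces version control or your institution’s backup service. It records a SHA-256 hash for every file under a folder and compares two such manifests.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large audio or video files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path, relative to root, to its SHA-256 checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(original: dict[str, str], copy: dict[str, str]) -> dict[str, list[str]]:
    """Report files missing from the copy, extra in it, or with different contents."""
    shared = set(original) & set(copy)
    return {
        "missing": sorted(set(original) - set(copy)),
        "extra": sorted(set(copy) - set(original)),
        "changed": sorted(name for name in shared if original[name] != copy[name]),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        working, backup = Path(tmp, "working"), Path(tmp, "backup")
        for folder in (working, backup):
            folder.mkdir()
        (working / "interview.txt").write_text("transcript v2")
        (backup / "interview.txt").write_text("transcript v1")  # stale backup copy
        print(compare_manifests(build_manifest(working), build_manifest(backup)))
```

Storing the manifest alongside the data also gives a record that can travel with files shared for publication and preservation.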
In your larger research plan, pause periodically to assess whether the data management and sharing solutions you worked out at the start of the project still fit the scope, questions, and sources your project has evolved to encompass. You may now have many more (or much larger) files to manage; your original file-naming convention may have assumed all your objects came from one institution, but you are now incorporating items from other institutions and need to track those sources explicitly; or you originally worked only with collaborators within your institution and now want to bring in external collaborators.
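When a naming convention outgrows its original assumptions, a dry-run rename plan lets the team review changes before any file is touched. This sketch assumes the hypothetical case of legacy names without an institution code (such as sgwu_0042.wav) that should gain a prefix (concordia_sgwu_0042.wav); the function name and prefix scheme are illustrative. It only computes the mapping, skips names already converted, and refuses a plan that would overwrite an existing file.

```python
from pathlib import PurePosixPath


def plan_renames(filenames: list[str], institution: str) -> dict[str, str]:
    """Map each legacy filename to a new name prefixed with an institution code.

    Nothing on disk is changed; the returned mapping is meant to be reviewed first.
    """
    prefix = f"{institution.lower()}_"
    plan: dict[str, str] = {}
    for name in filenames:
        path = PurePosixPath(name)
        if path.name.startswith(prefix):
            continue  # already follows the new convention
        # Keep the file in its folder; only the filename itself gains the prefix.
        plan[name] = str(path.with_name(prefix + path.name))
    existing = set(filenames)
    collisions = sorted(new for new in plan.values() if new in existing)
    if collisions:
        raise ValueError(f"rename would overwrite existing files: {collisions}")
    return plan


if __name__ == "__main__":
    legacy = ["sgwu_0042.wav", "audio/sgwu_0043.wav", "concordia_sgwu_0001.wav"]
    for old, new in plan_renames(legacy, "Concordia").items():
        print(f"{old} -> {new}")
```

Once reviewed, each pair could be applied with `pathlib.Path.rename`, and the same mapping used to update filename references in metadata files so that they stay valid.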
How will you onboard new RAs and everyone else who joins the research project? Take time to orient your research group, and agree on a common set of “guidelines” or “rules” about what is and isn’t acceptable use of the research materials. Address common questions and “worst-case scenarios”. For example, what if an RA thinks it is fine to copy research files to their personal laptop? Is that an acceptable practice? With whom may they share those files, and with whom may they not? Stating expectations explicitly lets everyone participate with a shared understanding. A project binder (digital or print), created by the project lead and tailored to each workflow, in which all project protocols and best practices are recorded, can be the most practical way to do this.
Note: Research assistants have unique opportunities on digital humanities projects to build technical skills and knowledge of metadata entry. These are excellent skill sets for them to graduate with, and RAs are also likely to carry a large share of the labour a digital humanities project requires. It is important to recognize this labour as a significant contribution to the scholarly work in future publications, including after their contracts have expired. We recommend using a protocol such as CRediT (Contributor Roles Taxonomy) or the Student Collaborators’ Bill of Rights to acknowledge student labour through co-authorship or other appropriate roles; anonymous skill-building does not go far enough in helping them build their scholarly careers.
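Tracking contributions in a structured way as the project goes makes it easier to credit RAs after their contracts end. A minimal sketch, assuming roles are recorded with the 14 CRediT role names; the Contributor class, its fields, and the credit_statement helper are hypothetical.

```python
from dataclasses import dataclass, field

# The 14 roles defined by CRediT (Contributor Roles Taxonomy).
CREDIT_ROLES = frozenset({
    "Conceptualization", "Data curation", "Formal analysis", "Funding acquisition",
    "Investigation", "Methodology", "Project administration", "Resources",
    "Software", "Supervision", "Validation", "Visualization",
    "Writing – original draft", "Writing – review & editing",
})


@dataclass
class Contributor:
    """Hypothetical record of one person's contributions to the project."""
    name: str
    roles: set[str] = field(default_factory=set)
    active: bool = True  # set to False when a contract ends; roles are kept

    def add_role(self, role: str) -> None:
        """Record a role, rejecting anything outside the CRediT vocabulary."""
        if role not in CREDIT_ROLES:
            raise ValueError(f"{role!r} is not a CRediT role")
        self.roles.add(role)


def credit_statement(contributors: list[Contributor]) -> str:
    """One line per contributor with recorded roles, active or not."""
    return "\n".join(
        f"{person.name}: {', '.join(sorted(person.roles))}"
        for person in sorted(contributors, key=lambda c: c.name)
        if person.roles
    )


if __name__ == "__main__":
    ra = Contributor("Research Assistant A")
    ra.add_role("Data curation")
    ra.add_role("Investigation")
    ra.active = False  # the contract has ended, but the credit remains
    print(credit_statement([ra]))
```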
Concordia has solicited Compute Canada to join SpokenWeb as an institutional partner; as a partner, this national data infrastructure service provides cloud storage space and basic services to all SpokenWeb partners who choose not to use local infrastructure. Other SpokenWeb partners, such as UBCO, have chosen to use their own institutionally provided cloud storage but contribute metadata to the shared catalogue, SWALLOW.
The SpokenWeb “Repository” currently holds files from Concordia, SFU, and uAlberta, and access to each level of the file structure is controlled through user passwords and permissions. Members who want to search other institutions’ holdings search the metadata describing those files in SWALLOW.
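Folder-level permissions are commonly inherited: access to a folder covers everything beneath it. The sketch below illustrates only that rule, using a hypothetical in-memory table of grants with hypothetical user and folder names; it is not how the SpokenWeb Repository is implemented, and in practice access control is enforced by the storage platform itself.

```python
from pathlib import PurePosixPath

# Hypothetical grants: each user may read the listed folders and everything below them.
GRANTS: dict[str, set[str]] = {
    "concordia_ra": {"/repository/concordia"},
    "sfu_lead": {"/repository/sfu", "/repository/concordia/public"},
}


def can_read(user: str, path: str) -> bool:
    """Return True if `path` is a granted folder or lies beneath one.

    Assumes absolute, already-normalized paths (no "..").
    Comparing path components rather than raw strings means a grant on
    /repository/concordia does not also match /repository/concordia-private.
    """
    target = PurePosixPath(path)
    for granted in GRANTS.get(user, set()):
        folder = PurePosixPath(granted)
        if target == folder or folder in target.parents:
            return True
    return False


if __name__ == "__main__":
    print(can_read("sfu_lead", "/repository/concordia/public/poster.pdf"))
    print(can_read("sfu_lead", "/repository/concordia/sgwu/0042.wav"))
```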