Section 2 – Responsibilities and Resources in Research Partnerships
File hacks: naming and organizing simplified
Billie Hu
Choosing the right file format is vital for preserving research data long-term. Non-proprietary formats like CSV for spreadsheets, TXT for text documents, and TIFF for images help ensure that data remains accessible and usable in the future. Proprietary formats (file formats owned by a company), such as an .xlsx file from Excel or a .docx file from Word, may become obsolete, making it difficult to retrieve and use data after several years. For example, a CSV file is preferable to an Excel file for tabular data because CSV is a plain text format that can be read by almost any software. The table below shows some recommended formats for common file types.[1]
| File Type | Recommended Formats | Formats to Avoid |
| --- | --- | --- |
| Text | XML, ASCII, TXT, PDF, LaTeX | .doc, .docx, .wpd |
| Images | TIFF, JPEG2000, PNG, JPEG/JFIF | RAW, Adobe Photoshop, PDF |
| Video | MOV, MPEG-2 | .wmv |
| Audio | PCM, WAVE, DSD | CD, DVD, .m4p, .mp3, xmi, .mod |
| Dataset | CSV, TSV, .db, .sqlite, Shapefile | .xls, .xlsx |
| Web Data | JSON, XML, HTML | |
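As a rough illustration of why CSV travels well, the sketch below writes a small table to CSV text using only Python's standard library. The column names and values are hypothetical and stand in for whatever a project actually collects; the function name `rows_to_csv` is also just an illustrative choice.

```python
import csv
import io

# Hypothetical example rows; the column names are illustrative only.
rows = [
    {"participant_id": "P001", "site": "Guelph", "score": 4},
    {"participant_id": "P002", "site": "Guelph", "score": 5},
]


def rows_to_csv(records):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = rows_to_csv(rows)
print(csv_text)
```

The result is plain text with one record per line, so it can be opened in a text editor, a spreadsheet program, or a statistics package without special software.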
Proper file naming conventions are essential for organizing and locating files efficiently. A good file name should include meaningful components like the project name, date, and version number.
For instance, a file name like “2025-06-08_YouthOutreach_GuelphStaff_TM_V2.csv” clearly indicates the date, project, contents, author, and version:
- Date: 2025-06-08 (collection date)
- Project Name: YouthOutreach
- Short Description: GuelphStaff
- Name: TM (Tracy MacDern)
- Version: V2
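One way to keep names consistent across a team is to assemble them from their components rather than typing them by hand. The sketch below is a minimal example of that idea; the function `build_file_name` and its parameters are hypothetical, and it assumes an ISO-style date (YYYY-MM-DD) with underscores between components.

```python
from datetime import date


def build_file_name(collected, project, description, initials, version, extension):
    """Join the naming components with underscores and append the extension."""
    parts = [
        collected.isoformat(),  # ISO date, e.g. 2025-06-08
        project,
        description,
        initials,
        f"V{version}",
    ]
    return "_".join(parts) + "." + extension


name = build_file_name(date(2025, 6, 8), "YouthOutreach", "GuelphStaff", "TM", 2, "csv")
print(name)  # 2025-06-08_YouthOutreach_GuelphStaff_TM_V2.csv
```

Putting the date first in year-month-day order means that files sort chronologically when listed alphabetically.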
Avoid using special characters and spaces; instead, use underscores or hyphens. Consistent and descriptive file names help prevent confusion and ensure that files can be easily identified and retrieved.
Only use:
- Alphanumeric characters (alphabetic characters and Arabic numerals)
- Underscores (_) to separate words or numbers (snake case), e.g. this_is_snake_case
- Capitalization to separate words or numbers (camel case), e.g. ThisIsCamelCase

Do not use spaces or other special characters, such as: ~ ! @ # $ % ^ & * ( ) ` ; : < > ? . , [ ] { } ' " |
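These rules can be checked automatically before files are shared. The following is a small sketch, not a definitive tool: the function names are hypothetical, and it assumes that hyphens are allowed alongside underscores and that a single period is permitted only to separate the name from its extension.

```python
import re

# Assumed rule: letters, digits, underscores, and hyphens in the name,
# followed by exactly one period and an alphanumeric extension.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def is_valid_file_name(name):
    """Return True if the whole name matches the assumed naming rule."""
    return _ALLOWED_NAME.fullmatch(name) is not None


def sanitize_stem(stem):
    """Replace each run of disallowed characters in a name (without its
    extension) with one underscore, then trim underscores from the ends."""
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", stem)
    return cleaned.strip("_")


print(is_valid_file_name("2025-06-08_YouthOutreach_GuelphStaff_TM_V2.csv"))  # True
print(is_valid_file_name("Youth Outreach (final).csv"))  # False
print(sanitize_stem("Youth Outreach (final)"))  # Youth_Outreach_final
```

A check like this is easiest to apply at the point where files are saved or uploaded, before inconsistent names spread through a shared folder.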
README files are critical for documenting research data. They provide necessary context, such as the purpose of the data, methods used to collect it, and explanations of any codes or abbreviations. A well-written README file ensures that you and others can understand and use the data accurately. For example, a README file might include sections like “Project Description,” “Data Collection Methods,” “File Naming Conventions,” and “Data Dictionary.” This documentation is especially important when sharing data with collaborators or for future reuse.
Core elements of any README include:
- Contact information for the researcher(s)
- The use license for your data (unless that is included in a separate file)
- Your data collection methods (protocols, sampling, instruments, coverage)
- The structure of files
- Naming conventions for files, if applicable
- The sources you used
- Your quality assurance work (data validation)
- Any data manipulations or modifications
- Data confidentiality and permissions
- The names of labels and variables and explanations of codes and classifications (i.e., a data dictionary or codebook)
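To make sure none of these elements is forgotten, a project can start each README from a generated skeleton. The sketch below is one possible approach under stated assumptions: the section headings are shortened paraphrases of the list above, and the plain-text layout, the `TODO` placeholders, and the function `build_readme` are all hypothetical choices.

```python
# Section headings paraphrased from the core elements listed above.
README_SECTIONS = [
    "Contact information",
    "Use license",
    "Data collection methods",
    "File structure",
    "File naming conventions",
    "Sources",
    "Quality assurance",
    "Data manipulations or modifications",
    "Confidentiality and permissions",
    "Data dictionary",
]


def build_readme(project_title, sections=README_SECTIONS):
    """Return a plain-text README skeleton with one placeholder per section."""
    lines = [project_title, "=" * len(project_title), ""]
    for heading in sections:
        lines += [heading, "-" * len(heading), "TODO", ""]
    return "\n".join(lines)


readme_text = build_readme("YouthOutreach")
print(readme_text)
```

Saving the result as a TXT file alongside the data keeps the documentation in the same non-proprietary form as the data it describes.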
Incorporating file formats, naming conventions, and README files significantly enhances data organization, accessibility, and reuse. For instance, specifying the use of CSV files, a naming convention like “ProjectName_Date_Version,” and detailed README files helps maintain the integrity and usability of the data. Proper documentation and structured data management practices are key to ensuring that research data remains valuable and usable over time.
[1] Eugene Barsky, Billie Hu, and Andrew Li, “File Formats For Data Curation,” Research Data Management, https://ubc-library-rc.github.io/rdm/content/02_file_formats.html