File hacks: naming and organizing simplified

Billie Hu

doi:https://doi.org/10.71548/zeh9-t585

Section 2 – Responsibilities and Resources in Research Partnerships

Choosing the right file format is vital for preserving research data over the long term. Non-proprietary formats such as CSV for spreadsheets, TXT for text documents, and TIFF for images help ensure that data remains accessible and usable in the future. Proprietary formats (file formats owned by a company), such as .xlsx from Excel or .docx from Word, may become obsolete, making it difficult to retrieve and use data after several years. For example, a CSV file is preferable to an Excel file for tabular data because CSV is a plain text format that can be read by almost any software. The table below shows some recommended formats for common file types.^[1]

File Type	Recommended Formats	Avoided Formats
Text	XML, ASCII, TXT, PDF, LaTeX, .docx	.doc, .docx, .wpd
Images	TIFF, JPEG2000, PNG, JPEG/JFIF	RAW, Adobe Photoshop, PDF
Video	MOV, MPEG-2	.wmv
Audio	PCM, WAVE, DSD	CD, DVD, .m4p, .mp3, .xmi, .mod
Dataset	CSV, TSV, .db, .sqlite, Shapefile, .xlsx	.xls, .xlsx
Web Data	JSON, XML, HTML
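
To illustrate why CSV is considered portable, here is a minimal Python sketch that writes a small table as CSV using only the standard library. The sample rows and column names (participant_id, site, score) are hypothetical and stand in for your own data.

```python
import csv
import io

# Hypothetical sample rows; the column names are illustrative only.
rows = [
    {"participant_id": "P001", "site": "Guelph", "score": 4},
    {"participant_id": "P002", "site": "Guelph", "score": 5},
]


def to_csv_text(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)

# Reading the text back needs no special software, only a CSV parser.
restored = list(csv.DictReader(io.StringIO(csv_text)))
```

The result is ordinary text with one record per line, so it can be opened in a text editor, a spreadsheet program, or a statistics package. Note that CSV stores every value as text: the score column comes back as the string "4", not the number 4, which is one reason a data dictionary is useful.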

Proper file naming conventions are essential for organizing and locating files efficiently. A good file name should include meaningful components like the project name, date, and version number.

For instance, a file name like “2025-06-08-YouthOutreach_GuelphStaff_TM_V2.csv” clearly indicates the date, project, and version.

Date: 2025-06-08 (collection date)
Project Name: YouthOutreach
Short Description: GuelphStaff
Name: TM (Tracy MacDern)
Version: V2
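
If you create many files, assembling names from their components in code can help keep them consistent. The following is a minimal Python sketch, not a prescribed tool: the function name build_file_name and its parameters are hypothetical, and the separator pattern simply mirrors the example file name above.

```python
from datetime import date


def build_file_name(collected, project, description, initials, version, extension):
    """Assemble a name as Date-Project_Description_Initials_Version.ext."""
    # isoformat() gives YYYY-MM-DD, which sorts chronologically by name.
    stamp = collected.isoformat()
    return f"{stamp}-{project}_{description}_{initials}_V{version}.{extension}"


name = build_file_name(
    date(2025, 6, 8), "YouthOutreach", "GuelphStaff", "TM", 2, "csv"
)
print(name)  # 2025-06-08-YouthOutreach_GuelphStaff_TM_V2.csv
```

Putting the date first in year-month-day order means that an alphabetical file listing is also a chronological one.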

Avoid using special characters and spaces; instead, use underscores or hyphens. Consistent and descriptive file names help prevent confusion and ensure that files can be easily identified and retrieved.

Only use:

Alphanumeric characters (letters and Arabic numerals)
Underscores (_) to separate words or numbers (snake case), e.g. this_is_snake_case
Capitalization to separate words or numbers (camel case), e.g. ThisIsCamelCase

Do not use spaces or other special characters, such as: ~ ! @ # $ % ^ & * ( ) ` ; : < > ? . , [ ] { } ‘ “ |
The one exception is the single period that separates the file name from its extension.
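
These character rules can be checked automatically. Below is a minimal Python sketch under a few assumptions: it accepts only unaccented ASCII letters, digits, underscores, and hyphens, followed by one period and an extension, and the helper names is_valid_file_name and to_snake_case are hypothetical.

```python
import re

# Letters, digits, underscores, and hyphens, then one period and an extension.
# Assumption: only unaccented ASCII letters count as alphabetic characters.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def is_valid_file_name(name):
    """Return True if the whole name uses only the permitted characters."""
    return ALLOWED_NAME.fullmatch(name) is not None


def to_snake_case(text):
    """Replace each run of disallowed characters with a single underscore."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", text)
    return cleaned.strip("_").lower()


print(is_valid_file_name("2025-06-08-YouthOutreach_GuelphStaff_TM_V2.csv"))  # True
print(is_valid_file_name("draft report (final).csv"))  # False
print(to_snake_case("This is: snake case!"))  # this_is_snake_case
```

A check like this could be run over a project folder before data is shared or deposited, to catch names that slipped past the convention.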

README files are critical for documenting research data. They provide necessary context, such as the purpose of the data, methods used to collect it, and explanations of any codes or abbreviations. A well-written README file ensures that you and others can understand and use the data accurately. For example, a README file might include sections like “Project Description,” “Data Collection Methods,” “File Naming Conventions,” and “Data Dictionary.” This documentation is especially important when sharing data with collaborators or for future reuse.

Core elements of any README include:

Contact information for the researcher(s)
The use license for your data (unless that is included in a separate file)
Your data collection methods (protocols, sampling, instruments, coverage)
The structure of files
Naming conventions for files, if applicable
The sources you used
Your quality assurance work (data validation)
Any data manipulations or modifications
Data confidentiality and permissions
The names of labels and variables, with explanations of codes and classifications (i.e., a data dictionary or codebook)
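
One way to make sure no core element is forgotten is to start every project from the same README skeleton. The following Python sketch generates a plain-text template with one heading per element listed above. The heading wording, the "TODO" placeholder, and the function name build_readme_skeleton are assumptions chosen for illustration, not a required layout.

```python
# Headings paraphrase the core elements listed above (wording is illustrative).
README_SECTIONS = [
    "Contact Information",
    "License",
    "Data Collection Methods",
    "File Structure",
    "File Naming Conventions",
    "Sources",
    "Quality Assurance",
    "Data Manipulations and Modifications",
    "Confidentiality and Permissions",
    "Data Dictionary",
]


def build_readme_skeleton(project_title, sections=README_SECTIONS):
    """Return plain-text README content with an empty section per heading."""
    lines = [project_title, "=" * len(project_title), ""]
    for heading in sections:
        # Underline each heading and leave a placeholder to fill in later.
        lines += [heading, "-" * len(heading), "TODO", ""]
    return "\n".join(lines)


skeleton = build_readme_skeleton("YouthOutreach")
print(skeleton)
```

Saving the output as README.txt keeps the documentation itself in a non-proprietary format, alongside the data it describes.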

Combining appropriate file formats, consistent naming conventions, and README files significantly improves data organization, accessibility, and reuse. For instance, storing tabular data as CSV files, following a naming convention like “ProjectName_Date_Version,” and maintaining detailed README files helps preserve the integrity and usability of the data. Proper documentation and structured data management practices are key to ensuring that research data remains valuable and usable over time.

[1] Eugene Barsky, Billie Hu, and Andrew Li, “File Formats For Data Curation,” Research Data Management, https://ubc-library-rc.github.io/rdm/content/02_file_formats.html
