Digital Forensics
Consideration of the processes necessary to ensure transparency from a user’s viewpoint is one of the bases of digital forensics. Although it is often employed as a legal term, digital forensics, in the context of the digital humanities, refers to the analysis of data stored on digital media, and of the physical media on which those data reside. From a historical perspective, digital forensics provides a crucial source of information about digital resources, including those stored in digital archives or archives of digital assets.
Digital forensics tools improve access to legacy media, such as discontinued disk drive hardware and other storage media. Using the metadata recovered with these tools, digital forensics also enables the reconstruction of events on these devices. File modifications, timestamps, and creation and access data are all valuable sources of information that provide new insights, complementing the content acquired from the files themselves (Bartliff et al., 2020).
Digital forensics therefore has the potential to open new lines of inquiry directly from digital resources. Furthermore, game media provide historical perspective and promote new lines of inquiry. For instance, Matthew Kirschenbaum, a leading digital humanities scholar and Professor of English and Digital Studies at the University of Maryland, used digital forensics tools to investigate the code of a game found on a floppy disk – a predecessor to CD-ROM storage – and discovered that the disk contained code that had been “deleted”. (The game, “Mystery House”, was released in 1980.)
On digital media, data are not technically deleted or physically removed; instead, the parts of the physical medium on which the old data reside (blocks or addresses on the disk) are marked by the operating system as available to be overwritten. In other words, as long as the “deleted” data have not been overwritten by new data, they remain readable with specialized digital tools. Applying digital forensics techniques, Kirschenbaum discovered that the game on the disk was a modification of another game originally stored on that disk. That is, the new game was created from a copy of an older game, a realization that spurred new questions and focal areas for new research (Owens & Padilla, 2021).
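The idea that “deleted” data persist until overwritten can be illustrated with a minimal file-carving sketch: scanning the raw bytes of a disk image for a known file signature, even though no directory entry points to the data. The image bytes and signature below are invented for illustration and have no connection to the Mystery House disk.

```python
# Minimal sketch of file carving: recovering "deleted" content from a raw
# disk image by scanning for a known file signature. The image and the
# offsets here are simulated, purely for illustration.

JPEG_SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker

def carve_offsets(image: bytes, signature: bytes) -> list[int]:
    """Return every offset where the signature occurs in the raw image."""
    offsets, start = [], 0
    while (pos := image.find(signature, start)) != -1:
        offsets.append(pos)
        start = pos + 1
    return offsets

# Simulated raw image: a "deleted" JPEG header survives amid zeroed blocks.
raw_image = b"\x00" * 512 + JPEG_SOI + b"surviving jpeg data" + b"\x00" * 512
print(carve_offsets(raw_image, JPEG_SOI))  # -> [512]
```

Real forensics suites (e.g., The Sleuth Kit) apply the same principle at scale, combining signature scanning with filesystem-structure analysis.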
In another recent example relevant to the digital humanities, digital forensics and data exploration were applied to the archive of digitized materials of American experimental filmmaker Stephen Dwoskin (1939-2012) (Bartliff et al., 2020). The archive contains correspondence, working materials, unpublished writings, and photographs, as well as twenty legacy hard drives holding additional writings, emails, and works in progress. The purpose of the research was to: (1) reconstruct aspects of Dwoskin’s biography and professional history from the available datasets, using the tools of digital forensics; (2) delineate the stages in Dwoskin’s creative development; and (3) reconstruct the technological environment in which Dwoskin worked. These questions were explored by extracting a “content-driven timeline” from the digital assets available in the archive. To answer them, the researchers developed a complex workflow consisting of many imaging, preprocessing, extraction, reporting, transformation, exporting, error-checking, and analysis steps, which can be summarized in the following major groupings of tasks. First, the hard disks were imaged: a bit-by-bit, sector-by-sector copy of each disk was made to ensure that data were not changed during the analysis, and to capture data inaccessible to the operating system, such as deleted files. Timeline generation was then performed by careful analysis of the timestamps on the disks; specialized forensics tools were employed to obtain metadata about activities on the files (creation, modification, deletion), and such metadata are relatively easy to analyze. Finally, content analysis consisted of correlating the acquired timestamps with information about the content, including file names, file extensions, folder/directory names, and the actual textual, graphical, and video material residing on the hard drives.
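The timeline-generation step can be sketched in miniature with standard filesystem metadata: collecting timestamped events per file and sorting them chronologically. This is only a toy illustration of the idea, not the researchers’ actual tooling, which operated on forensic disk images rather than a live filesystem.

```python
# Sketch of timeline generation from filesystem timestamps, in the spirit
# of the workflow described above (not the authors' actual pipeline).
import os
import tempfile
from datetime import datetime, timezone

def file_events(path: str) -> list[tuple[str, str, float]]:
    """Return (event, path, timestamp) triples from a file's stat data."""
    st = os.stat(path)
    # Note: st_ctime is metadata-change time on Unix, creation time on Windows.
    return [("modified", path, st.st_mtime),
            ("accessed", path, st.st_atime),
            ("changed", path, st.st_ctime)]

def build_timeline(paths: list[str]) -> list[tuple[str, str, float]]:
    events = [e for p in paths for e in file_events(p)]
    return sorted(events, key=lambda e: e[2])  # chronological order

# Demonstrate on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"draft")
timeline = build_timeline([f.name])
for event, path, ts in timeline:
    print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(), event)
os.unlink(f.name)
```

In a forensic setting, the same sorting-and-correlating logic would run over timestamps extracted from the disk image, so that deleted files also contribute events.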
File extensions indicate file formats, whereas file and folder names suggest how material was conceptually organized. Because folders are generally used to group related files together, the folder structure of the archive provides clues as to how Dwoskin organized his topics and themes.
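A small sketch of this kind of inference: grouping file extensions by top-level folder, so that each folder’s mix of formats hints at its purpose. The paths below are invented examples, not taken from the Dwoskin archive.

```python
# Hypothetical sketch: inferring conceptual organization from folder names
# and file formats from extensions. Paths are invented for illustration.
from collections import defaultdict
from pathlib import PurePosixPath

paths = [
    "projects/film_a/edit_v1.mov",
    "projects/film_a/notes.doc",
    "correspondence/letter_1998.doc",
    "photographs/contact_sheet.tif",
]

by_folder = defaultdict(list)
for p in map(PurePosixPath, paths):
    # The top-level folder suggests the conceptual grouping; the suffix
    # (extension) suggests the file format.
    by_folder[p.parts[0]].append(p.suffix)

print(dict(by_folder))
# -> {'projects': ['.mov', '.doc'], 'correspondence': ['.doc'], 'photographs': ['.tif']}
```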
The researchers continued their investigations through file type clustering, assisted by tag cloud visualizations. Complementing the content analysis already described, file and folder names, combined with information extracted about file types, can potentially reveal information about the technology used in Dwoskin’s projects. Data acquisition using the tools and techniques of digital forensics, combined with data exploration techniques such as content analysis and metadata analysis, constitutes an essential component of the infrastructure for digital legacies (in this specific example, the digital legacy of Stephen Dwoskin), allowing a fuller and more enriching view of the archive. The researchers also advocate applying statistical techniques, such as Bayesian reasoning, “to identify dependencies and relationships between different types of content, timelines, and conditions” (Bartliff et al., 2020).
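The file-type-clustering step can be sketched as counting extensions and turning the counts into relative weights, the kind of values a tag cloud would render at different sizes. The file names are invented for illustration and do not reproduce the researchers’ method.

```python
# Sketch of file-type clustering for a tag-cloud view: extension frequencies
# become relative weights. File names are invented for illustration.
from collections import Counter
from pathlib import PurePosixPath

file_names = ["cut1.mov", "cut2.mov", "script.doc", "still.tif",
              "mix.wav", "cut3.mov", "notes.doc"]

counts = Counter(PurePosixPath(n).suffix.lstrip(".") for n in file_names)
total = sum(counts.values())
# Tag-cloud weight: each file type's share of the files in the sample.
weights = {ext: round(count / total, 2) for ext, count in counts.most_common()}
print(weights)  # -> {'mov': 0.43, 'doc': 0.29, 'tif': 0.14, 'wav': 0.14}
```

Ranked weights like these make dominant formats visible at a glance, which is what the tag cloud visualization exploits.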