Deep Methods
Archives are also affected by advanced computational approaches, including artificial intelligence and machine learning. With the digitization of holdings and the growth of born-digital material, archives are being transformed into data to which digital techniques can be applied, and these methods are being used for recordkeeping activities on an increasing amount of material. New research in this area focuses on the automation of recordkeeping activities and processes, archive organization and access, new forms of digital archives, and the theoretical foundations of applying these techniques to archives, as well as professional considerations in critically integrating these approaches into archival systems and practice (Colavizza et al., 2021). For example, convolutional neural networks (CNNs), a machine learning technique especially suited to image data, can be used to enable distant viewing by categorizing and analyzing digitized historical visual resources. Analogous to distant reading, distant viewing is the exploration and analysis of large collections of digitized historical images. It allows researchers to perform large-scale analysis of parts of digital archives that have not been sufficiently explored and to gain a deeper understanding of visual trends in an archive. Consequently, millions of images in digitized media and documents can be explored using non-textual search methods (Wevers & Smits, 2020).
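As a rough illustration of the mechanism underlying CNNs, the convolution operation slides a small filter over an image to detect local visual patterns. In a trained network these filters are learned from labeled examples; the hand-written vertical-edge filter below is an assumption for illustration only, showing the building block rather than a full distant-viewing pipeline:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation), the core CNN operation."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A hand-written vertical-edge filter: it responds strongly where pixel
# intensity changes from left to right, e.g. at the border of a dark region.
edge_kernel = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])

# Toy "image": dark left half (0.0), bright right half (1.0).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

response = conv2d(image, edge_kernel)
# The response magnitude peaks (3.0) exactly at the dark/bright boundary
# and is zero in the uniform regions.
```

A real CNN stacks many such learned filters with nonlinearities and pooling; the point here is only that local filtering is what lets these networks categorize visual material at scale.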
Some digital humanities researchers have raised questions about certain applications of advanced computational techniques, because these methods may produce misleading or incomplete results and conclusions. These scholars caution against uncritical adoption of such techniques in humanities work. For instance, a recent publication questions the application of a specific statistical/machine learning method, principal components analysis (PCA), to literary studies, where it was used to support the attribution of parts of Arden of Faversham to William Shakespeare and parts of the Henry VI trilogy to Christopher Marlowe. PCA is a widely used data analysis and dimensionality reduction technique that transforms complex, high-dimensional data into a simpler, lower-dimensional form. In some cases, however, the transformed data has no clear, intuitive relationship to the original data. In the attribution example, the critique targets how the PCA results were interpreted, arguing that this interpretation leads to unreliable conclusions (Rizvi, 2021).
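A minimal sketch of PCA itself may clarify why interpretation is delicate. The data here is invented: rows stand for text samples and columns for hypothetical function-word frequencies (an assumption for illustration, not drawn from the studies cited):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "stylometric" data (hypothetical): 20 text samples described by the
# relative frequencies of 5 function words. Column 1 is made strongly
# correlated with column 0 so that PCA has structure to find.
X = rng.normal(size=(20, 5))
X[:, 1] = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=20)

# PCA via singular value decomposition of the mean-centered data.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = Xc @ Vt.T               # samples re-expressed along principal axes
explained = s**2 / np.sum(s**2)  # fraction of total variance per component

# Each principal component (a row of Vt) is a weighted mix of ALL original
# features, which is why a sample's position along a component rarely has a
# simple one-feature interpretation.
```

The variance shares show how much of the data each derived axis captures, but the axes themselves are linear combinations of every original feature, which is precisely where over-interpretation of plotted PCA scores can go wrong.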
Computational techniques enhance the ability of digital humanities researchers to access, search, and analyze the digital resources they use in their work, including digitized texts, images, video, audio, and other artifacts. Text analysis and natural language processing (NLP) in particular have benefitted from these techniques, which include state-of-the-art deep neural networks (DNNs). DNNs are a type of artificial neural network (ANN), a machine learning model that can be “trained” to classify input and answer questions by “learning” from examples. ANNs (and therefore DNNs) are mathematical models that iteratively build a mapping between labeled inputs and desired outputs through mathematical optimization. One of the best-known applications of ANNs is handwriting recognition: small images of characters are labeled with the character they depict, so an image of a handwritten “A” is presented to the network together with its label, “A”. Networks require a large amount of training data to learn such a mapping adequately, however, and both obtaining suitable training data and adapting the technique to humanities research have proven challenging. Research into DNNs is active and ongoing because of the considerable potential benefits they offer for spell checking, language detection, entity extraction, and author detection, all important NLP tasks in digital humanities research (Suissa et al., 2021).
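The training process described above, iteratively adjusting a mapping from labeled inputs to desired outputs, can be sketched with a tiny network on a toy labeled dataset (the XOR function stands in for “image of a character” paired with its label; the layer sizes, learning rate, and iteration count are illustrative assumptions, not a recipe):

```python
import numpy as np

rng = np.random.default_rng(42)

# Labeled training examples: inputs paired with desired outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A tiny network: 2 inputs -> 8 hidden units -> 1 output.
# Weights start random and are refined by gradient descent.
W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)

lr = 0.5
for _ in range(10000):
    # Forward pass: map the inputs to a prediction.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the error and nudge every weight
    # in the direction that reduces it (mathematical optimization).
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0)

loss = np.mean((out - y) ** 2)  # mean squared error after training
```

A handwriting recognizer works the same way at far larger scale: many thousands of labeled character images in place of four rows, and far more weights, which is why the availability of training data is the binding constraint the paragraph describes.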