2004 – Present
From the mid-2000s to the present day, digital humanities scholarship continued to increase, with the creative application of existing methods and the development of new tools and techniques, many of which were designed and implemented by humanities scholars. During this time,
new areas became allied with the digital humanities, including classical studies, archaeology, and music, to name only a few (Theron & Wandl-Vogt, 2016). Not only did the Internet and the Web become integral to humanities scholarship in general, but the Internet of Things (IoT) became the paradigm in which all manner of devices are connected to the Internet and communicate with each other. The democratization of science and citizen science, in which non-practitioners create and utilize resources, also emerged as areas of interest. In a very important development, given the quantity of resources that are generated, digitized, stored, and analyzed, as well as the links among the various data sources, digital humanities has entered the era of Big Data, which will be discussed in a subsequent section. Consequently, new user interfaces and interactive visualizations are required to assist in navigating these vast quantities of data. In this sense, digital humanities is a driver for further research into human-computer interaction (HCI) and visual analytics, the science of augmenting human knowledge and insight with intuitive, interactive visualizations. HCI enables digital humanities professionals to adopt leading-edge developments and is a necessary consideration in designing these tools (Theron & Wandl-Vogt, 2016). Visual analytics is a large field of study with important applications in digital humanities and will be discussed in detail in a subsequent section.
As an indication of emerging trends in the digital humanities, some distinctive topics are emphasized below. These topics were discussed in the “New trends in Digital Humanities” track of the Fourth International Conference on Technological Ecosystems for Enhancing Multiculturality, leading to some important focal points for discussion.
- The digital humanities discipline is beginning to align with open science (disseminating the results of scientific research to interested parties and stakeholders outside the scientific community, thereby increasing transparency) and citizen science (research facilitated by and conducted in full or in part by people outside of a scientific specialization or profession). However, challenges unique to the digital humanities, such as data management and HCI, must be addressed for open/citizen science in the digital humanities to be successful.
- New improvements in human-computer interaction and visualizations (specifically, visual analytics) will be adapted and employed to facilitate exploration of the exponentially growing amount of data being used in the digital humanities, partially due to ubiquitous computing and Internet-enabled devices.
- New areas are being incorporated into the digital humanities, most notably cultural analytics, which will be discussed in detail in a subsequent section.
- Curriculum design for post-secondary education in the digital humanities, and specifically the question of which topics are most relevant, is still being debated. As in the early years of humanities computing and the digital humanities, the integration and effective use of computational methods also remain a major concern.
- More collaborative and distributed work in the digital humanities is needed, especially in light of the complexity of navigating and effectively utilizing digital collections. Consequently, computational infrastructure requirements must be well-understood.
Leading-edge innovations, including data science, visual analytics, machine learning and artificial intelligence, “Big Data” in the digital humanities, and the “quantum digital humanities”, will be discussed in subsequent sections.
Another trend in the digital humanities is the exponentially growing volume of data, as well as the variety and heterogeneity of this data (Hall, 2020). In fact, issues concerning data distinguish the digital humanities from non-computerized humanities research. The large-scale study of texts has become known as “distant reading” and is facilitated by automated techniques, including word counting, as illustrated in the sketch below.
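As a simple illustration, word counting for distant reading requires only the Python standard library. The sketch below assumes a hypothetical directory named corpus containing plain-text files; the directory and file names are placeholders, not references to any actual data set.

```python
# A minimal distant-reading sketch: word frequencies across a corpus,
# using only the Python standard library.
from collections import Counter
from pathlib import Path

counts = Counter()
for path in Path("corpus").glob("*.txt"):  # hypothetical directory of plain-text files
    words = path.read_text(encoding="utf-8").lower().split()
    counts.update(words)

# The most frequent words give a coarse, large-scale view of the corpus.
print(counts.most_common(10))
```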
Advanced approaches to large-scale data include topic modelling, vector-space models, and machine learning (Hall, 2020). Topic modelling is an algorithmic computational approach based on the idea that topics, as generally understood, are composed of words. Each word in a text is therefore assigned a likelihood value that indicates the importance of that word to a specific topic. The various topics identified in a document are then assigned a likelihood indicating the importance of each topic to that document, thereby allowing the analysis of topics in a textual data set. Complex mathematical techniques, such as Latent Dirichlet Allocation (LDA), are used for topic modelling. Specialized libraries in the Python programming language for natural language processing (NLP) implement LDA and other topic modelling algorithms (Hall, 2020).
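As a rough illustration of topic modelling in practice, the following sketch uses gensim, one of the specialized Python NLP libraries alluded to above. The three short “documents” are invented placeholders; a real analysis would use a much larger, carefully preprocessed corpus.

```python
# A minimal topic-modelling sketch with gensim's LDA implementation.
from gensim import corpora
from gensim.models import LdaModel

documents = [
    "the manuscript describes trade routes and merchant guilds",
    "letters between scholars discuss philosophy and theology",
    "guild records list merchants, goods, and trade taxes",
]

# Tokenize naively; a real pipeline would also remove stop words, punctuation, etc.
texts = [doc.lower().split() for doc in documents]

dictionary = corpora.Dictionary(texts)                 # word <-> id mapping
corpus = [dictionary.doc2bow(text) for text in texts]  # bag-of-words counts

# Fit an LDA model with two topics: each topic is a distribution over words,
# and each document is a distribution over topics.
lda = LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)

for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)
```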
In the digital humanities, vector-space models are employed to represent word semantics – that is, the meanings of words. Such meanings are generally identified by assessing the words that occur in the proximity of the word being analyzed – that is, the co-occurrence of words, or words that form a pattern of clustering together (Hall, 2020). Word2vec is a popular model for word semantics in the digital humanities. It can be implemented with artificial neural networks, an artificial intelligence and machine learning technique discussed below, and in detail in a later section. Python implementations also exist in specialized libraries.
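A minimal word2vec sketch, again using gensim, appears below. The toy sentences are invented placeholders; meaningful word vectors require corpora with many thousands of sentences.

```python
# A minimal word2vec sketch with gensim: words that co-occur in similar
# contexts are mapped to similar dense vectors.
from gensim.models import Word2Vec

sentences = [
    ["king", "ruled", "the", "kingdom"],
    ["queen", "ruled", "the", "kingdom"],
    ["merchant", "sold", "goods", "in", "the", "market"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

# Words used in similar contexts should receive similar vectors.
print(model.wv.similarity("king", "queen"))
print(model.wv.most_similar("king", topn=3))
```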
Machine learning (ML) is a vast field in computational science and data science and is becoming increasingly important in many disciplines. It is especially critical for “big data” analysis. The basic idea behind ML is that algorithms can self-adapt, depending on the input that is being processed. In simple terms, algorithms can metaphorically “learn” from data, identifying patterns and anomalies with minimal or no human guidance or manual adaptation. Machine learning is often considered a subfield of artificial intelligence, although it also relies heavily on advanced statistics and computational techniques. ML has a firm mathematical foundation, and the correctness of various ML algorithms can be proven analytically. ML algorithms are implemented in a variety of programming languages, but Python is the preferred choice, given its gentle learning curve, its power and efficiency, and the vast amount of expertise, software, and libraries available for it. Most of these libraries are freely available and downloadable. Advanced libraries such as TensorFlow facilitate the development of ML systems.
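As a concrete illustration of “learning from data”, the following sketch trains a simple supervised classifier with scikit-learn, another freely available Python library. The labelled texts are invented placeholders standing in for a corpus of documents labelled by genre.

```python
# A minimal supervised-learning sketch: classify short texts by genre.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = [
    "sonnet of love and longing",
    "ledger of debts and payments",
    "verses on spring and sorrow",
    "account of goods bought and sold",
]
train_labels = ["poetry", "commerce", "poetry", "commerce"]

# Turn raw text into word-count feature vectors.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_texts)

# The classifier "learns" a mapping from word patterns to labels.
classifier = MultinomialNB()
classifier.fit(X, train_labels)

# Predict the label of an unseen text.
test = vectorizer.transform(["a poem about autumn sorrow"])
print(classifier.predict(test))  # expected: ['poetry']
```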
As indicated above, artificial neural networks (ANNs) are widely used ML models. ANNs are self-adapting systems consisting of an input layer that receives the data to be “learned”, one or more “hidden” computation layers that connect the input to the output, and an output layer that produces results in the form of “labels”. The goal is to transform input data into labels; consequently, classification of data is an important application of ANNs. Recent research into advanced machine learning has focused on deep learning, which utilizes deep neural networks, or networks having multiple hidden layers. Most ANNs are “trained”, or adapted, in a “supervised” fashion, in which input/label pairs are presented to the network and the ANN “learns” a mapping from each input to its corresponding label. For simplicity, the following discussion focuses on ANNs with one hidden layer. In basic terms, an ANN adjusts “weights” between the various layers to minimize the error between its output and the labels presented to it during training. The adaptation process can employ a variety of mathematical optimization approaches, but the most common is backpropagation, in which the weights are repeatedly adjusted to reduce the error between the network’s output and the expected label for a given input, until an accepted error tolerance is reached or the network stabilizes (that is, the adaptations in each iteration become minimal). Although ANNs have proven beneficial in a vast number of applications, a large amount of data, in the form of input/label pairs, must be available for training if the network is to successfully “learn” an input/label mapping with minimal error.
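The following sketch illustrates a one-hidden-layer ANN trained by backpropagation, using the Keras API of the TensorFlow library mentioned above. The XOR function supplies the input/label pairs; it is a standard toy example, not drawn from the digital humanities literature.

```python
# A minimal one-hidden-layer ANN trained by backpropagation (via gradient
# descent) to learn the XOR input/label mapping.
import numpy as np
import tensorflow as tf

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
labels = np.array([[0], [1], [1], [0]], dtype=np.float32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # output layer
])

# Training repeatedly adjusts the weights to minimize the error between the
# network's outputs and the expected labels.
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(inputs, labels, epochs=1000, verbose=0)

print(model.predict(inputs).round())  # should approximate [[0], [1], [1], [0]]
```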
Artificial neural networks are used in Word2vec word semantic modelling, as discussed above. They are also used to process vast amounts of data, such as occur in typical problems in the digital humanities. They are therefore useful tools for digital humanities researchers and practitioners who deal with large quantities of heterogeneous data (Hall, 2020). ANNs and other machine learning techniques will be addressed in detail in subsequent sections.
Finally, other “digital” fields of study, such as digital classics and digital archaeology, are being accepted as subfields of the digital humanities. Digital classics employs computational methods and techniques used in the digital humanities to study the ancient world and its artifacts, and applies visualization approaches to represent maps, digital photographs, and other data related to geographic information systems (GIS). Digital archaeology, like digital classics, employs tools and techniques developed for the digital humanities, including GIS, and also makes use of advanced 3D modelling and visualization techniques. Both GIS and 3D visualization will be discussed in a subsequent section.