Techniques in Close Reading and Distant Reading
In contrast to close reading, with its rigorous and careful scrutiny of textual content, distant reading aims to form an abstract view of the global features of one or more texts (Jänicke et al., 2015). Close reading is sometimes considered problematic because it necessarily concentrates on a vanishingly small part of the literary output of a culture, time period, or geographic region, so that thousands of texts remain unread (Oberhelman, 2015). Part of the goal of distant reading was to recover this “great unread” (Underwood, 2017). In this sense, close reading is “microscopic”, whereas distant reading is “macroscopic”. As originally conceived by Moretti, distant reading was not meant to replace close reading; rather, it was to be used to analyze secondary literature as a supplement to direct, careful analysis of primary literature.
Distant reading incorporates a quantitative dimension facilitated by computational and data analysis techniques. It is distinguished from its precursors by its use of methods from the computational sciences rather than from the social sciences (Underwood, 2017). It draws on techniques from corpus linguistics, which studies language through corpora (sing. corpus), that is, large, structured collections of texts; from information retrieval, in which digital resources are searched for and obtained from a large collection of such resources; and from machine learning (Underwood, 2017). Machine learning is connected to artificial intelligence and constitutes a very large area of study that will be covered in subsequent sections and courses. For present purposes, it suffices to note that machine learning algorithms are adaptive algorithms that improve their performance by means of the data on which they act and specialized optimization (“learning”) approaches. They take sample data, called training data, as input and generate a mathematical model for prediction, classification, or decision making. The resulting model is not explicitly programmed to make these predictions, classifications, or decisions; these capabilities emerge from the model based only on the training data.
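The idea that a model's capabilities come from training data rather than from hand-written rules can be illustrated with a minimal sketch. The example below assumes a word-count ("bag-of-words") representation and a naive Bayes classifier, which is one common technique chosen here for illustration and not one named in the sources cited above. The four training sentences, the labels "sea" and "farm", and the function names are all hypothetical.

```python
from collections import Counter
import math


def tokenize(text):
    """Lowercase a text and split it into word tokens on whitespace."""
    return text.lower().split()


def train(labeled_texts):
    """Build a simple naive Bayes model from (text, label) pairs.

    The "model" is nothing more than counts derived from the training data.
    """
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training texts
    vocabulary = set()      # every distinct word seen during training
    for text, label in labeled_texts:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocabulary.update(tokens)
    return {
        "word_counts": word_counts,
        "doc_counts": doc_counts,
        "vocabulary": vocabulary,
    }


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, counts in model["word_counts"].items():
        # Prior: how common this label was in the training data.
        score = math.log(model["doc_counts"][label] / total_docs)
        total_words = sum(counts.values())
        for token in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data: (text, label) pairs.
training_data = [
    ("the ship sailed across the stormy sea", "sea"),
    ("waves broke over the deck of the ship", "sea"),
    ("the harvest filled the barn with wheat", "farm"),
    ("cattle grazed in the field beside the barn", "farm"),
]

model = train(training_data)
print(classify(model, "a storm drove the ship toward the rocks"))
print(classify(model, "wheat grew tall in the field"))
```

No rule such as "texts that mention ships are about the sea" appears anywhere in the code. The classifier's behavior depends entirely on the word counts gathered from the training data, so supplying different training texts would yield a different model without any change to the program. Real distant-reading studies would use far larger corpora and, typically, established libraries rather than a hand-rolled classifier like this one.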