Quantitative Analysis
In addition, in both pedagogy and research, there has been an emphasis on programming skills. In digital humanities courses, for instance, the theory and practice of Python and R are taught, developed, and nurtured. Very frequently, scholars and students need to “clean”, or preprocess, data or text obtained from any number of sources, including web crawlers, in order to convert it to a usable, analyzable form. In fact, such cleaning tasks may constitute a large part of the analysis effort. As Andrew Goldstone reports of literary studies students: “Here, their efforts at mastering R really bore fruit, allowing them to wrestle big bodies of data into sense” (Goldstone, 2019). Additionally, in many cases humanists cannot rely on available algorithms and implementations to conduct their work, because many of these tools carry underlying biases and assumptions that are often unstated, so that their output may not be readily interpretable. As a result, humanists may need to develop and deploy their own tools, and therefore need to know the important structures and mechanisms of a programming language. However, they also need to privilege tool-building, modeling, and accurately representing the objects studied in humanities scholarship (Goldstone, 2019).
Programming skills are therefore required, and they enable sophisticated, innovative, and creative thinking about the use of data in humanities scholarship. Without programming knowledge, scholars are limited in their analysis to available software and visualizations, which substantially restricts them to the variations and dimensions that have been explicitly coded into the pre-programmed tool. Goldstone uses the example of Voyant Tools, a powerful and widely used web-based software system for text analysis. The choices about how to convert text into an analyzable format are built into the system, which, apart from the choice of stop words, offers the user no alternatives (Goldstone, 2019). Another problem with relying on pre-programmed tools is that, if a particular project is no longer updated or maintained, users who have depended on the tool will be at a disadvantage. Programming knowledge empowers students and scholars to work with any method of processing, analysis, tabulation, or visualization, or to create their own for specific purposes. Such knowledge also enables subsequent analysis, as data and texts can be transformed to fit the applications. Furthermore, scholars who write their own code do not depend on the continued maintenance of a pre-programmed tool.
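To make the point about built-in preprocessing choices concrete, the following is a minimal Python sketch of a text-cleaning step in which each decision is an explicit, adjustable parameter. It is an illustration only and does not reproduce how Voyant Tools or any other system works; the function name `preprocess`, its parameters, and the sample sentence are assumptions introduced here.

```python
import re
from collections import Counter


def preprocess(text, lowercase=True, stop_words=frozenset(), min_length=1):
    """Convert raw text to a list of word tokens.

    Every choice (case folding, stop words, minimum token length) is a
    parameter, so the scholar decides rather than the tool.
    """
    if lowercase:
        text = text.lower()
    # Keep runs of letters, allowing internal apostrophes ("whale's");
    # digits, underscores, and punctuation are discarded.
    tokens = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
    # Stop words are compared after case folding, so they should be
    # supplied in lowercase when lowercase=True.
    return [t for t in tokens if len(t) >= min_length and t not in stop_words]


# Hypothetical sample text, invented for illustration.
sample = "The whale, the WHALE! It's the whale's world."

tokens = preprocess(sample, stop_words={"the", "it's"})
counts = Counter(tokens)
print(counts.most_common())
```

Changing a single argument, for example `lowercase=False` or a different stop-word set, yields a different tokenization of the same text, which is precisely the kind of alternative that a fixed pipeline does not expose.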
However, although programming knowledge is crucially important, skill in programming does not substitute for knowledge of quantitative methods. Scholars must first understand what a particular tool does and whether it is relevant to the task at hand. Many advanced methods for text analysis exist, including those that perform complex transformations of data, such as frequency correlation matrices and probabilistic topic models. These methods can be implemented in a programming language relatively easily. However, researchers must determine whether those methods are appropriate, or even applicable, to the data being analyzed. Digital humanists should therefore master exploratory data analysis, which often employs interactive visualization, as well as descriptive and inferential statistics, in which inferences are drawn from data and statistically tested. Computational methods, whether from pre-programmed systems or from tools built by the scholar, provide only quantitative outputs, and such outputs are of limited use without knowledge of how to interpret them (Goldstone, 2019).
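As an illustration of how easily such a method can be coded, and of how little the code alone says about interpretation, the sketch below computes a small term-frequency correlation matrix in plain Python. The three toy documents, the helper `pearson`, and all variable names are assumptions made for this example; they are not drawn from Goldstone or from any particular library.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # undefined when a term's frequency never varies
    return cov / (spread_x * spread_y)


# Toy corpus, invented for illustration.
docs = {
    "doc_a": "sea ship sea whale",
    "doc_b": "ship sea ship captain",
    "doc_c": "love letter love heart",
}

counts = {name: Counter(text.split()) for name, text in docs.items()}
vocab = sorted(set().union(*counts.values()))

# One frequency vector per term, with one entry per document.
freq = {term: [counts[name][term] for name in docs] for term in vocab}

# Term-by-term correlation matrix stored as a nested dictionary.
matrix = {s: {t: pearson(freq[s], freq[t]) for t in vocab} for s in vocab}

print(round(matrix["sea"]["ship"], 3))
print(round(matrix["sea"]["love"], 3))
```

The code runs and returns coefficients, yet with only three documents those coefficients are not evidence of anything. Deciding whether correlation is an appropriate measure for the corpus, and what a given value means, remains a question of quantitative method rather than of programming.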
As the Columbia University computational culture scholar Dennis Tenen argues, “Tools are great when they save time, but not when they shield us from the complexity of thought” (Tenen, 2016, p. 89).