Interpretability of Machine Learning and Computational Techniques
Machine learning techniques are advancing at a rapid pace and are becoming more powerful and robust. However, with this improvement comes a corresponding increase in complexity, and therefore the question of interpretability arises. Issues of interpretability are becoming increasingly important in the humanities as the acceptance and deployment of advanced computational tools increase. The notion of interpretability is not fully defined, but, put simply, interpretability is a measure of how well a machine learning model and its outputs can be understood, and how well the results or predictions of the algorithm answer researchers’ questions. In other words, interpretability reflects human users’ ability to understand the reasoning behind a machine learning model and the process by which the model makes decisions or predictions. A high degree of interpretability inspires trust in the model. Where interpretability is lacking, users have limited knowledge of the processes and assumptions employed to produce the output. Some researchers equate interpretability with “understandability” or “intelligibility”, the degree to which human users can understand the models. Understandable models are described as transparent, in contrast to incomprehensible models, called black boxes (Lipton, 2018). For the humanities scholar, interpretation of data must be correct and rigorous. In addition, however, a humanistic approach mandates that computational techniques produce interpretable qualitative and quantitative outputs. Interpretability is particularly important because quantitative features are used to represent and encode qualitative results, which are subsequently interpreted. For humanities scholarship, such quantitative features encoding the qualitative include edge cases, anomalies, outliers, values that do not fit easily into a model, and noise characteristics, among others.
Consequently, what may be treated as “noise” or “outliers” in quantitative modeling may carry substantial meaning for the nuance and ambiguity that are important to humanities scholarship. In machine learning classification, for example, this involves moving beyond standard accuracy statistics, such as precision (the ratio of true positives to all positives, both true and false), recall (the ratio of true positives to the sum of true positives and false negatives), and F-scores (the harmonic mean of precision and recall), and even confusion matrices, which indicate exactly which inputs have been misclassified and thus show researchers where classification failed. In many scientific fields, the success of computational approaches is judged directly by these accuracy measures on testing data. In humanities scholarship, however, algorithms are chosen based on their interpretability as well as on standard accuracy metrics, and more interpretable algorithms are often selected over more accurate but less interpretable ones. In other words, although accuracy metrics allow researchers to assess an algorithm’s performance, models also need to provide interpretable output features, such as the coefficients used in the model, probabilities, and the weights learned by the algorithm during training. These features facilitate humanistic interpretation, allowing the scale of analysis to shift, for example, from individual words to larger, meaningful units such as sentences, paragraphs, chapters, and volumes (Dobson, 2021).
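The accuracy statistics named above all derive from the four cells of a binary confusion matrix. The following minimal sketch, using only the Python standard library and invented counts chosen purely for illustration, shows the computation:

```python
# Precision, recall, and F-score from a binary confusion matrix.
# The four counts below are hypothetical, for illustration only.
tp, fp = 40, 10   # true positives, false positives
fn, tn = 5, 45    # false negatives, true negatives

precision = tp / (tp + fp)   # true positives / all predicted positives
recall = tp / (tp + fn)      # true positives / all actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# → precision=0.80 recall=0.89 f1=0.84
```

A confusion matrix presents these same four counts in tabular form, which is what lets a researcher see not only that a classifier erred, but on which inputs it did so.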
For increased interpretability, the data model, or the manner in which the data are represented, must be considered. For instance, the data structure containing the data should allow an exploratory data analysis that maps the output of an algorithm back to the original input data. As another example, a classification algorithm that transforms text into features that can be manipulated numerically should allow a researcher to use those features to recover the original text from which they were derived. Dobson provides the example of the Scikit-learn machine learning library within the Python programming ecosystem, which contains a variety of tools for text processing and analysis and is therefore well suited to humanities research. Scikit-learn includes many preprocessing methods, highly efficient machine learning algorithms, and data models that support these algorithms, allowing researchers to construct workflows that other researchers can reuse for reproducibility. Scikit-learn provides multiple functions that perform essentially the same task, and the choice among them depends on the application and the questions being asked. For instance, in text analysis there are several preprocessing methods that convert text into the numerical features used in subsequent analyses. Although all of these methods allow the user to access the features used in the machine learning model, some do not provide a means for the researcher to retrieve the original text from those features. In techniques that use hashing (applying a one-way mathematical function to text to transform it into a numerical representation), the original textual data cannot be recovered from the hash values, although methods utilizing hashing are very efficient and robust.
Other functions in Scikit-learn that perform the same task use a different data model that allows the untransformed text to be examined, and they are therefore more useful for humanities work where interpretability is important (Dobson, 2021).
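This contrast can be seen in Scikit-learn’s own text vectorizers. The sketch below (assuming scikit-learn is installed, and using an invented two-sentence corpus) compares CountVectorizer, whose stored vocabulary lets inverse_transform map feature vectors back to the words they encode, with HashingVectorizer, which provides no such inverse because its hashing of tokens is one-way:

```python
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

corpus = ["call me ishmael", "the whale swam on"]  # hypothetical toy corpus

# CountVectorizer keeps a vocabulary, so each feature column maps
# back to a word, and inverse_transform recovers the terms per document.
cv = CountVectorizer()
X = cv.fit_transform(corpus)
print(cv.inverse_transform(X))  # the words behind each row of features

# HashingVectorizer applies a one-way hash to each token: memory-efficient
# and fast, but it stores no vocabulary, so there is no inverse_transform
# with which to retrieve the original text from the feature values.
hv = HashingVectorizer(n_features=256)
Xh = hv.transform(corpus)
print(hasattr(hv, "inverse_transform"))
```

The design trade-off is the one described above: the hashing-based data model gains efficiency by discarding exactly the mapping that would let a humanities researcher move from features back to text.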
To improve the interpretability, transparency, and reproducibility of results, scholars applying complex computational techniques increasingly make their data, models, source code, and analysis methods available to peers. What matters for interpretability is not only knowledge of the tool and the algorithms it implements, but also insight into working with those aspects of the technique that remain unknown.
To summarize, accuracy and precision values that measure the success of a particular computational or machine learning algorithm, although crucially important, are not sufficient for humanities scholarship. Interpretability is also a key concern. The choice of an algorithm or computational tool is motivated by the degree to which it is interpretable. Black-box approaches that produce highly accurate results on testing data do not, precisely because they are black boxes, provide this interpretability, and are therefore not useful to humanities researchers. Interpretability requires that all phases of the workflow be transparent, that all assumptions and biases of the tool be delineated, and that the features, weights, and parameters used in a specific model be accessible to the user. Consequently, part of the effort in applying machine learning methods to humanistic scholarship is tool criticism, the critical inquiry into the technologies and tools used in research. In tool criticism, technologies are reviewed and considered with respect to their working mechanisms, interfaces, and the embedded assumptions that affect the user (Dobson, 2021).