Database Management Systems
Database management systems (DBMSs) are software systems for creating, updating, accessing, and querying collections of data organized into a manipulable and searchable structure known as a database. DBMSs are frequently used in humanities research to organize texts, information about authors, relationships between textual entities, and information about cultural artifacts. In the most popular type of database, known as a relational database, data are organized into two-dimensional tables, analogous to spreadsheets, in which each row represents a specific entity and the columns contain attributes, or pieces of information, about that entity. Usually, a database comprises multiple tables that are related through some of the attributes in the individual tables. The attributes needed to establish relationships between tables are known as keys, and there are different categories of keys depending on the type of relationship being specified. DBMSs allow users to perform queries, in which the user retrieves specific information from a database according to some criteria or constraints. For instance, in a database of 19th-century British authors, a scholar may wish to identify those authors whose primary genres were both the short story and journalism. The scholar then constructs a query on the database (sometimes called a query against the database) in which the requested attributes, such as authors’ last names, dates of birth, influences, and cities of residence, are returned for those authors working primarily in both the short-story genre and journalism. In relational databases, queries are performed with a special language called the Structured Query Language, or SQL, versions of which are supported by all major relational database management systems.
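The author example can be made concrete with a minimal sketch, here using Python's built-in sqlite3 module as the DBMS. The table names (authors, author_genres), the column names, and the rows are hypothetical placeholders chosen for illustration; they do not come from any real dataset. The sketch shows a primary key, a foreign key relating the two tables, and one possible SQL query for authors who work in both genres.

```python
import sqlite3

# In-memory database; the schema and rows below are illustrative placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (
    author_id  INTEGER PRIMARY KEY,  -- primary key: one row per author
    last_name  TEXT NOT NULL,
    birth_year INTEGER,
    city       TEXT
);
CREATE TABLE author_genres (
    author_id INTEGER NOT NULL REFERENCES authors(author_id),  -- foreign key
    genre     TEXT NOT NULL,
    PRIMARY KEY (author_id, genre)
);
""")

conn.executemany("INSERT INTO authors VALUES (?, ?, ?, ?)", [
    (1, "Author A", 1810, "London"),
    (2, "Author B", 1825, "Edinburgh"),
    (3, "Author C", 1830, "Manchester"),
])
conn.executemany("INSERT INTO author_genres VALUES (?, ?)", [
    (1, "short story"), (1, "journalism"),
    (2, "short story"),
    (3, "journalism"), (3, "short story"), (3, "poetry"),
])

# Keep only authors linked to BOTH genres: filter to the two genres,
# group by author, and require two distinct matches per group.
QUERY = """
SELECT a.last_name, a.birth_year, a.city
FROM authors AS a
JOIN author_genres AS g ON g.author_id = a.author_id
WHERE g.genre IN ('short story', 'journalism')
GROUP BY a.author_id
HAVING COUNT(DISTINCT g.genre) = 2
ORDER BY a.last_name
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With these placeholder rows, the query returns Author A and Author C; Author B is excluded because only one of the two genres is recorded for that author.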
However, users usually do not need to write SQL queries themselves, as user interfaces, including web interfaces, allow them to indicate what information is needed. The interface then runs a program that formulates the SQL query and sends it to the DBMS, which returns the results for the interface to present to the user.
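As a rough sketch of what such an interface program might do, the function below turns a user's selections (which attributes to show, which genres to require) into a parameterized SQL query and sends it to the DBMS. The function name build_author_query, the ALLOWED_FIELDS whitelist, and the two-table schema are assumptions made for illustration, with SQLite standing in for the DBMS.

```python
import sqlite3

# Hypothetical whitelist of columns the interface lets users request.
ALLOWED_FIELDS = {"last_name", "birth_year", "city"}

def build_author_query(fields, genres):
    """Translate interface selections into a SQL string plus parameters."""
    if not fields or not genres:
        raise ValueError("select at least one field and one genre")
    unknown = [f for f in fields if f not in ALLOWED_FIELDS]
    if unknown:
        raise ValueError(f"unsupported fields: {unknown}")
    # Column names cannot be bound as parameters, hence the whitelist check.
    columns = ", ".join(f"a.{f}" for f in fields)
    placeholders = ", ".join("?" for _ in genres)
    sql = (
        f"SELECT {columns} FROM authors AS a "
        "JOIN author_genres AS g ON g.author_id = a.author_id "
        f"WHERE g.genre IN ({placeholders}) "
        "GROUP BY a.author_id "
        "HAVING COUNT(DISTINCT g.genre) = ? "
        "ORDER BY a.author_id"
    )
    # Genre values are passed separately so the DBMS treats them as data.
    params = list(genres) + [len(set(genres))]
    return sql, params

# Placeholder database standing in for the scholar's DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (author_id INTEGER PRIMARY KEY, last_name TEXT,
                      birth_year INTEGER, city TEXT);
CREATE TABLE author_genres (author_id INTEGER, genre TEXT);
INSERT INTO authors VALUES (1, 'Author A', 1810, 'London'),
                           (2, 'Author B', 1825, 'Edinburgh');
INSERT INTO author_genres VALUES (1, 'short story'), (1, 'journalism'),
                                 (2, 'short story');
""")

sql, params = build_author_query(["last_name", "city"],
                                 ["short story", "journalism"])
results = conn.execute(sql, params).fetchall()
print(results)
```

Passing the genre values as parameters, rather than pasting them into the SQL text, is a common precaution when the values originate from user input.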
Because of the sheer volume and heterogeneity of data in many fields, including the digital humanities, relational databases are being supplemented with or replaced by newer, non-relational database technologies. NoSQL database systems represent data without relying on tables. “Big Data” is one of the primary application areas of NoSQL databases, owing to the scalability of NoSQL systems.
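One common non-relational approach is the document model, in which each record is a self-describing, nested structure rather than a row in a table. The sketch below imitates that idea with plain Python dictionaries and JSON; it is not tied to any particular NoSQL product, and the field names and records are hypothetical placeholders.

```python
import json

# Placeholder documents: each record carries its own structure, and
# different records may have different fields (heterogeneous data).
documents = [
    {
        "last_name": "Author A",
        "birth_year": 1810,
        "genres": ["short story", "journalism"],
        "residences": [{"city": "London"}],
    },
    {
        "last_name": "Author B",
        "genres": ["short story"],  # no birth year or residence recorded
    },
]

def authors_with_genres(docs, wanted):
    """Return last names of documents whose genres include all of `wanted`."""
    wanted = set(wanted)
    return [d["last_name"] for d in docs if wanted <= set(d.get("genres", []))]

matches = authors_with_genres(documents, ["short story", "journalism"])
print(matches)
print(json.dumps(documents[0], indent=2))
```

Here the genres live inside each author's document instead of in a separate table linked by keys, which is one way heterogeneous records can be stored without a fixed table layout.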
However important computational algorithms, visualization, machine learning, and programming knowledge are to digital humanities scholars, in themselves, they are insufficient. To fully realize the potential of digital tools, quantitative insight is required. Computational techniques produce results. However, these results require interpretation, and exploratory analysis needs to be conducted. Intuition may be employed to interpret these results, and, in the case of experienced researchers, this intuition can be very good. However, in general, intuition does not provide sufficient warrant for determining whether apparent trends, groupings, or variations are supported by the data. Consequently, an in-depth knowledge of quantitative analyses and statistical methodologies is needed to complement and to make sense of computational results. As explained by Rutgers University digital humanities scholar Andrew Goldstone, programming expertise alone and curricular training in programming, without knowledge of possibilities opened up by quantitative analysis, “can bring students—but really, I should say, it has brought digital humanities as a field—only up to the threshold of method and not over it” (Goldstone, 2019).