Linguistic Atlas Project
The Linguistic Atlas Project (LAP) is an example of a somewhat older digital humanities initiative, from a time when the field was still called humanities computing. It shows how a large-scale project evolves, along with some of the challenges that such undertakings faced and, to some degree, still face. The project was developed largely by scholars at the University of Georgia, led by William A. Kretzschmar, Professor of Humanities and Editor Emeritus of the Linguistic Atlas Project, and its website is hosted at that institution. It is notable for its interactive visualizations incorporating geographic information systems (GIS) and for making its data accessible to scholars and to the general public. The suite of tools was developed under the name LAMSAS (Linguistic Atlas of the Middle and South Atlantic States). The Linguistic Atlas Project comprises multiple investigations into the usage and pronunciation of American English words. It integrates data from interviews with American English speakers, dating from 1929 and still ongoing, primarily in the Western United States. Most of the interviews covered over 800 topics, designed to elicit common word usages. Many of the older interviews predate tape recorders, so trained field workers recorded the responses in detailed phonetic transcription to capture variations in pronunciation. This complex dataset was the subject of the LAP project.
The project was conceived to contribute to the linguistic and cultural dimensions of humanities scholarship. Viewed in aggregate, the interviews provide a detailed panorama of day-to-day life in the twentieth-century United States, as spoken (literally) by actual people. Because the interviews discuss historical topics, further cultural and historical insights can be gained by comparing them across time periods, across regions, or both. The computing effort, which began in 1990, had the task of extracting words and phrases from the interviews for subsequent computational analysis. Its interface, quite advanced for its time, consisted of three panels, or columns, arranged horizontally. The first (leftmost) column contained user controls, such as selecting a database or choosing an item to search on. The middle panel displayed the results of the statistical analysis, including the frequency of occurrence of the word selected by the user and additional information about its usage derived from the interviews. The last panel contained the GIS visualization: a base map layer with state boundaries and, plotted atop it, a layer of geographic locations indicating the prevalence of the selected word (Kretzschmar Jr, 2009). The system originally ran on the Apple Macintosh platform, and the GIS used the Foxbase relational database management system. The database associated geospatial coordinates, mapped to the graphical display, with the linguistic data.
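To make the idea of associating coordinates with linguistic data more concrete, the following Python sketch uses the standard-library `sqlite3` module as a stand-in for the original Foxbase database. The table layout (`informants`, `responses`), the item and variant labels, and the coordinates are hypothetical toy values, not LAP data or the project's actual schema. A join between the two tables yields the point layer that would be plotted over the base map, and two counts give the relative frequency of the selected variant.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical informant and response tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE informants (
            informant_id TEXT PRIMARY KEY,
            state        TEXT,
            lat          REAL,  -- latitude used to place the informant on the map
            lon          REAL   -- longitude
        );
        CREATE TABLE responses (
            informant_id TEXT REFERENCES informants(informant_id),
            item         TEXT,  -- the interview question
            response     TEXT   -- the word the informant used
        );
        """
    )
    # Toy values for illustration only; not taken from any LAP survey.
    conn.executemany(
        "INSERT INTO informants VALUES (?, ?, ?, ?)",
        [("INF-A", "GA", 33.95, -83.38),
         ("INF-B", "SC", 34.00, -81.03),
         ("INF-C", "VA", 37.54, -77.44)],
    )
    conn.executemany(
        "INSERT INTO responses VALUES (?, ?, ?)",
        [("INF-A", "item_001", "variant_a"),
         ("INF-B", "item_001", "variant_a"),
         ("INF-C", "item_001", "variant_b")],
    )
    return conn


def map_layer(conn, item, variant):
    """Return (informant_id, lat, lon) for every informant who gave `variant` for `item`."""
    return conn.execute(
        """
        SELECT i.informant_id, i.lat, i.lon
        FROM informants AS i
        JOIN responses AS r ON r.informant_id = i.informant_id
        WHERE r.item = ? AND r.response = ?
        ORDER BY i.informant_id
        """,
        (item, variant),
    ).fetchall()


def relative_frequency(conn, item, variant):
    """Share of all responses to `item` that used `variant` (0.0 if there are none)."""
    total = conn.execute(
        "SELECT COUNT(*) FROM responses WHERE item = ?", (item,)
    ).fetchone()[0]
    hits = conn.execute(
        "SELECT COUNT(*) FROM responses WHERE item = ? AND response = ?",
        (item, variant),
    ).fetchone()[0]
    return hits / total if total else 0.0
```

For example, `map_layer(build_demo_db(), "item_001", "variant_a")` returns the two toy informants who used that variant, each with the coordinates a map layer would need. Keeping coordinates in the informant table, rather than repeating them on every response, is one plausible design; the original system's layout may have differed.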
The system was subsequently ported to the World Wide Web in 1996. The then-new technology enabled the research team to make the data available to the general public. Interactive GIS features made it possible to locate individual speakers and to retrieve information on them and their survey data by clicking on a speaker’s location on a map. The site also offered searching, browsing, and static (non-interactive) pages (Kretzschmar Jr, 2009). Data could be stored as human-readable CSV (comma-separated values) text files instead of in a binary format readable only by software. The new website, named Atlas Web, integrated the various components of the Atlas project, providing enhanced support for scholarship and teaching (Kretzschmar Jr, 2009).
The website underwent a major revision in 2003. The GIS subsystem was made more flexible and was extended to support sociolinguistics, which relates linguistic features to sociological variables. Separate browse screens were generated and presented in real time, and pronunciation-based searches were now supported. From a technical standpoint, pronunciation data were converted from a custom phonetic format to Unicode, and the MySQL relational database management system was adopted (SQL, the Structured Query Language, is used extensively in relational database systems). Much of the new site’s functionality was implemented in scripts written in Python. However, the main innovations of the revision concerned linguistic theory and analysis (Kretzschmar Jr, 2009).
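The conversion to Unicode and the pronunciation-based search can be illustrated with a small Python sketch. The legacy codes below (such as `@` for schwa) are a hypothetical stand-in, since the project's custom phonetic format is not described here; the sketch only shows the general technique of longest-match-first code substitution followed by Unicode NFC normalization, plus a simple substring search over the converted transcriptions.

```python
import unicodedata

# Hypothetical legacy-code table; NOT the LAP project's actual custom format.
LEGACY_TO_IPA = {
    "ae": "\u00e6",  # æ
    "@": "\u0259",   # ə
    "E": "\u025b",   # ɛ
    "S": "\u0283",   # ʃ
    "N": "\u014b",   # ŋ
}

# Try longer codes first so that "ae" is not read as "a" followed by "e".
_CODES = sorted(LEGACY_TO_IPA, key=len, reverse=True)


def legacy_to_unicode(transcription: str) -> str:
    """Replace legacy codes with IPA characters and return NFC-normalized text."""
    out = []
    i = 0
    while i < len(transcription):
        for code in _CODES:
            if transcription.startswith(code, i):
                out.append(LEGACY_TO_IPA[code])
                i += len(code)
                break
        else:
            # Characters with no legacy code pass through unchanged.
            out.append(transcription[i])
            i += 1
    return unicodedata.normalize("NFC", "".join(out))


def pronunciation_search(records, pattern_legacy):
    """Return IDs of (id, unicode_transcription) records containing the converted pattern."""
    pattern = legacy_to_unicode(pattern_legacy)
    return [rid for rid, ipa in records if pattern in ipa]
```

Normalizing both the stored transcriptions and the query to the same Unicode form is what lets a plain substring comparison behave consistently; a production search would likely need more flexible matching than this.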
As of 2011, archival audio files had also been integrated into the project. The interview data set can therefore be reconceived as “conversational corpora” that link text transcriptions to these audio files (Kretzschmar Jr, 2009).
On the main site, users may view the 17 atlas projects that make up LAP, such as the Linguistic Atlas of the Gulf States (LAGS), the Linguistic Atlas of the Middle and South Atlantic States (LAMSAS), Gullah, and Southern England, among many others, and follow links to these projects for detailed information. A Data Download Center is also available.
On the current website, records from the different LAP projects are searched and browsed through the LICHEN graphical user interface. The site supports browsing by Informant (the interviewee) and by Item (the interview question), with the answers given. There are two main views: Browse and Search. The Browse view has three tabs. The List tab presents information about the data (Project name, ID, State, Sex, Ethnicity, Age Level, and so on) in a browsable table. The Map tab shows the geographic locations associated with the rows of the List table on a clickable map; as of October 2021, the map uses Google Maps. Selecting a location on the map displays the record associated with that location. The CSV tab lets the user download the data as a comma-separated values text file. The Search view lets users search the data by any of the fields in the List view: Project (a dropdown menu selects specific projects and IDs), Location (State, County, Town, Land region), and Speaker (Sex, Ethnicity, Age Level, Education, Social Status). In the Informants view, informants are anonymous but identified by a unique Informant ID; basic demographic and socioeconomic information and the informant’s location are shown. The Items view provides detailed information on the interview questions.
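The Search view’s field-based filtering and the CSV tab’s export can be approximated in a few lines of Python. This is a hypothetical sketch, not LICHEN’s implementation: the records, informant IDs, and field values are invented, as are the `search` and `to_csv` helpers; only the field names follow the List tab.

```python
import csv
import io

# Field names follow the List tab; the records themselves are hypothetical.
FIELDS = ["Project", "ID", "State", "Sex", "Ethnicity", "Age Level"]

RECORDS = [
    {"Project": "LAMSAS", "ID": "INF-001", "State": "GA", "Sex": "F",
     "Ethnicity": "E1", "Age Level": "A"},
    {"Project": "LAGS", "ID": "INF-002", "State": "AL", "Sex": "M",
     "Ethnicity": "E2", "Age Level": "B"},
]


def search(records, criteria):
    """Keep records whose fields exactly match every (field, value) pair in `criteria`."""
    return [r for r in records if all(r.get(f) == v for f, v in criteria.items())]


def to_csv(records):
    """Serialize records as comma-separated values, header row first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

For instance, `print(to_csv(search(RECORDS, {"State": "GA"})))` prints a header row followed by the single matching toy record. Criteria are passed as a dictionary rather than keyword arguments because field names such as `Age Level` contain spaces.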
The reader is encouraged to visit the LAP site to examine its interface and to explore its features.
Tracing the evolution of the Linguistic Atlas Project illustrates how a team of scholars and technical professionals responded to technological innovations: the emergence of widespread personal computer use in the early 1990s, the growing influence of the World Wide Web, and, later, multimedia and new text-encoding paradigms (Kretzschmar Jr, 2009). These responses in turn drove developments in language variation scholarship, which progressed from dialect maps to advanced visualizations rendered in real time, along with statistical processing and analysis of the large volume of textual and audio data used in the project (Kretzschmar Jr, 2009).
Despite all the improvements and enhancements to the site, the older versions of the site are to remain accessible. New visualizations will always be needed in response to computational innovations as well as to new research questions, theories, and styles of analysis (Kretzschmar Jr, 2009).