Chapter 10: Language Variation and Change

10.5 Variationist methods and concepts


Variationist methods are closely linked to earlier dialectology – the study of the variation found across dialects and, in particular, regional varieties. A major distinction between dialectology and variationist sociolinguistics has to do with the nature of the study participants. Because early dialectologists were coming from a historical linguistic tradition that was more interested in ‘older’ language varieties, the ‘optimal’ participant for them was a N.O.R.M. – a Non-mobile, Older, Rural, Male (Chambers and Trudgill 1998: 30). This was based on four assumptions:

  1. People who had not moved around much during their life were assumed to be more likely to maintain a way of speaking that represented the speech of the place they were from (hence non-mobile);
  2. The older a speaker was, the older their speech was assumed to be (hence older);
  3. There is a lot of movement in and through urban areas that might influence the way people in those areas speak, whereas rural areas were assumed to preserve tradition (hence rural); and
  4. Women’s speech was assumed to be more self-conscious and status-conscious than men’s speech (hence male).

This bias toward N.O.R.M. participants in early dialectology meant that, by the middle of the twentieth century, there was a gap in research on language variation in urban areas (not to mention the language use of women and innovative language practice). This changed with the advent of variationist sociolinguistics, which brought not only a focus on urban dialects (which is why variationist sociolinguistics is sometimes referred to as urban dialectology) but also a much wider array of social factors, beyond place, into consideration. In other words, a highly mobile, younger, urban woman is just as likely to be of interest as a N.O.R.M.!

Dialectology: Then and now. One of the most influential European dialectological projects was the Atlas linguistique de la France (‘The Linguistic Atlas of France’). For four years, between 1898 and 1901, a linguistically trained field worker named Edmond Edmont travelled all over France (by bicycle!) and spoke to 735 people in 638 places about their language. He conducted interviews and transcribed people’s speech in fine phonetic detail. The result was a wonderful and huge book of maps of France, each plotting the specific realizations of a linguistic variable that varied across the country. The atlas has been digitized and anyone can browse high-resolution scans of all the maps on the project’s website!

Figure 10.2. Portrait of Yang Xiong (Public domain image)

But well before Edmond Edmont cycled across France, Han dynasty philosopher Yang Xiong (53 BCE–18 CE) spent 27 years compiling a dictionary of regional words, called Fāngyán 方言 (‘regional speech, variety, dialect’), in which he catalogued the many different terms and pronunciations of thousands of Chinese characters across China. From this work, twentieth-century linguists have inferred six main dialect groups spoken during that time.

Dialectology has continued into the 21st century in its own right, making use of and developing new digital tools (e.g., GIS, Google Maps, social media APIs, rapid online and mobile app data collection, and automated acoustic analysis). For example, check out the Algonquian Linguistic Atlas. It includes not just transcripts of regional variation but hundreds of audio recordings of speakers of 18 Algonquian varieties and 9 dialect maps based on the regional patterning of variation.

Sociolinguistic interviews and corpora

Just about any collection of language in use (audio/video recorded, transcribed, or written) can be analyzed with a variationist sociolinguistic lens: collections of hand-written letters, text messages, television shows, or even every single one of Queen Elizabeth II of the United Kingdom’s annual Christmas messages over several decades! However, the most common method of data collection in variationist sociolinguistics is the sociolinguistic interview. The sociolinguistic interview is not like what we normally think of as an interview (i.e., a set of questions asked by an interviewer of an interviewee with the intention of gathering information or understanding a topic from the interviewee’s personal experience and perspective). In its original formulation, the sociolinguistic interview was composed of different tasks including a Minimal Pairs task, a Reading Passage, and Casual Speech. These tasks were specifically designed to correlate with an interviewee’s degree of self-monitoring.

How might you modify the tasks of the sociolinguistic interview if you were studying variation in a signed language?


During a Minimal Pair task, the interviewee is asked to read aloud from a list of words that have been carefully organized into pairs. Chapter 3 introduced the concept of a minimal pair: two words that differ in just one specific way. So for example, in Southwestern Ojibwe (spoken in Minnesota), giiwe [ɡiːweː] (‘he goes home’) and giiwenh [ɡiːwẽː] (‘so the story goes’) form a minimal pair, differing only with respect to whether the final vowel is nasalized or not (Nicolls 1980). In the context of the sociolinguistic interview, minimal pairs also differ in only one specific way. However, one of the words contains a linguistic variable, such that the articulation of one variant of the word will result in the two words in the pair becoming identical. That’s a mouthful but an example will help! In New York City English, ‘r-dropping’ (the variable deletion or vocalization of non-prevocalic r) is a prevalent linguistic variable. So for example the word sore can be pronounced as [sɔɹ] or as [sɔə], where the <r> is realized as a schwa. The first variant here forms a minimal pair with the word saw (pronounced [sɔə] in NYC English), differing only with respect to [ɹ] and [ə]. However, the second variant of sore, the one without [ɹ], is phonetically identical to the word saw. When participants are asked to read pairs of words like saw~sore, udder~other, bag~beg, and bruin~brewing, they pay a lot of attention to their language to make sure each member of the pair is pronounced distinctly, especially when the variant that would make the words sound identical is stigmatized.

In a Reading Passage task, participants read a paragraph out loud. This context also elicits a high degree of self-monitored language, but the requirement of reading coherently deflects some of the focus from the choice between variants to the content of the passage. Reading passages are written so that they contain a sufficient number of examples of the linguistic variables that the researcher is interested in.

In more casual conversation, participants put much more focus on the content of what they are communicating and pay much less attention to how they are speaking. At the same time, because being interviewed (often by a stranger) is not an everyday occurrence for most people, this context is still less casual than people’s everyday way of speaking, what we call their vernacular. The vernacular represents someone’s unmonitored language. In other words, the vernacular is the way we use language when we aren’t being recorded by linguists! In a sociolinguistic interview, a participant’s vernacular can come out during moments of Casual Speech. Moments of casual speech appear when people momentarily ‘forget’ that they are being monitored, like when a third party interrupts the sociolinguistic interview. Certain questions, especially those that prompt an emotional reaction, can elicit casual speech. For example, “was there ever a time you got blamed for something you didn’t do?” or “do you remember what you were doing when [a major community event, like when the Toronto Raptors won the NBA championship] took place?”. These kinds of questions can elicit high-emotion responses and also often result in the participant telling a story. When we are engaged in story-telling, we pay much less attention to linguistic variables in our language use! But still, there is no foolproof way of eliciting someone’s vernacular, and this is such a fundamental methodological problem for the field that it has a name: the observer’s paradox. Labov (1972: 209) describes it this way: “The aim of linguistic research in the community must be to find out how people talk when they are not being systematically observed; yet we can only obtain this data by systematic observation”. Today, the main method of overcoming the observer’s paradox is to interact with participants in a genuine, interested, and organic way by asking good questions, following up with curiosity, and building rapport and trust.

The nature of a sociolinguistic interview. The sociolinguistic interview has been modified and tailored to the specific needs of different linguists over the years. For example, if your research question is not about contextual style, there’s no need to record people across different tasks. Likewise, if you are examining syntactic variation, a minimal pair task just isn’t going to provide you with relevant data. In fact, many sociolinguists approach the sociolinguistic interview with just one goal – record a natural conversation. The interviewer may have certain questions in mind going into the conversation but unlike the oral history methodology or qualitative/ethnographic interviews, the goal isn’t to uncover the answer to specific questions but rather to simply chat for a bit!

One sociolinguistic interview might reveal certain things about how one person style-shifts or perhaps may tell us something about the linguistic constraints on variation in their grammar. But how one person uses language doesn’t tell us how linguistic variables pattern within a community. What we need is multiple sociolinguistic interviews with multiple people. Ideally, we would record sociolinguistic interviews from across a representative, socially stratified sample of the communities we are investigating. Depending on the questions we are asking, a representative sample would include an equal number of participants from the cross-sections of all relevant social groups. For example, if we were asking questions about social class, we’d include roughly equal numbers of speakers from across a range of social classes. This collection of sociolinguistic interviews forms a sociolinguistic corpus. Sociolinguistic corpora (note the irregular plural) can be used to study a multitude of linguistic variables and sociolinguistic phenomena. Some of the most important sociolinguistic corpora, in terms of how much the analysis of their data has advanced our understanding of language variation and change, are Canadian! For example, the Sankoff-Cedergren corpus of Montreal French was one of the very first large-scale corpora of sociolinguistic interviews and the first of its scale to represent a language other than English. Analysis of this corpus led to important methodological, analytical, and theoretical developments in the field. Sali Tagliamonte’s Toronto English Archive is a more recent project that has already produced over 70 publications on a huge variety of topics.
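To make the idea of a socially stratified sample concrete, here is a minimal sketch of how you might check whether a set of recorded speakers is balanced across the cross-sections of two social factors. Everything in it is an assumption made for illustration: the speaker IDs, the two factors (social class and age group), their labels, and the helper names `cell_counts` and `overfilled_cells` are all hypothetical and do not come from any real corpus.

```python
from collections import Counter
from itertools import product

# Hypothetical speaker metadata: (speaker_id, social_class, age_group).
speakers = [
    ("S01", "working", "younger"),
    ("S02", "working", "older"),
    ("S03", "middle", "younger"),
    ("S04", "middle", "older"),
    ("S05", "middle", "older"),
]

def cell_counts(speakers):
    """Count speakers in every (social_class, age_group) cell, including empty cells."""
    counts = Counter((cls, age) for _, cls, age in speakers)
    classes = sorted({cls for _, cls, _ in speakers})
    ages = sorted({age for _, _, age in speakers})
    return {cell: counts.get(cell, 0) for cell in product(classes, ages)}

def overfilled_cells(counts):
    """Return the cells that hold more speakers than the smallest cell."""
    smallest = min(counts.values())
    return {cell: n for cell, n in counts.items() if n > smallest}

counts = cell_counts(speakers)
for cell, n in counts.items():
    print(cell, n)
print("Cells with more speakers than the smallest cell:", overfilled_cells(counts))
```

With this toy sample, the (middle, older) cell has two speakers while every other cell has one, so a researcher aiming for equal cells would know where to recruit next (or which cell is already full).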

Diversity. The vast majority of variationist sociolinguistic studies have considered only three languages: English, French, and Spanish (see Stanford 2016). There are many reasons why this is the case, including a historical lack of diverse representation in academia. However, over the last few years, a much wider diversity of languages has been considered. In this chapter we try to highlight some of the research on this wider diversity of languages.

Quantitative analysis

Okay, so now we have a corpus of sociolinguistic interviews at our disposal. What’s next? How do we actually analyze linguistic variables, the main object of study? Remember that the choice between variants of a linguistic variable is subject to probability. That means an analysis of linguistic variables must be quantitative in nature. The quantitative approach of variationist sociolinguistics rests on the Principle of Accountability. The idea is pretty simple. We don’t just want to look at the variant that is interesting to us (whether it’s new or non-standard or whatever). We also have to consider all of the other variants that make up the linguistic variable. For example, if we are interested in the [ɪn] variant of -ing, just like Fischer was in 1958, we can’t just count up how many times our participants said [ɪn]. Instead, we need to know how many times they said [ɪn] out of the total number of times they could have possibly said [ɪn], and that means we also have to count up the times they said [ɪŋ] and not [ɪn]. With that information, we can calculate the percentage of tokens – each individual instance of a variant in our data – of variable -ing that were realized as [ɪn]. This is the Principle of Accountability in action.
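The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six coded tokens are made up, and the function name `variant_rate` is hypothetical rather than part of any standard sociolinguistic toolkit.

```python
def variant_rate(tokens, variant):
    """Share of all tokens of a variable that were realized as `variant`.

    `tokens` lists every instance of the variable, one label per token,
    so the denominator includes the variants we are NOT focusing on.
    """
    if not tokens:
        raise ValueError("no tokens of the variable were found")
    return sum(1 for token in tokens if token == variant) / len(tokens)

# Hypothetical coding of six tokens of variable -ing from one speaker.
tokens = ["ɪn", "ɪŋ", "ɪŋ", "ɪn", "ɪŋ", "ɪn"]

rate = variant_rate(tokens, "ɪn")
print(f"{rate:.0%} of {len(tokens)} tokens were [ɪn]")
```

The key design choice is that the input is the full list of tokens of the variable, not just the [ɪn] tokens. That is what forces the count of [ɪŋ] into the denominator.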

Sociophonetics. In some cases, particularly with phonetic variation, the variable being examined doesn’t fit into discrete variant categories. For example, the vowel in nyuz ‘news’ and shuts ‘shoots’ in Hawai`ian Creole varies between [uː] and [ʉː] and anything in between (Grama 2015). We could listen to each token of this vowel and classify it as belonging to one of two categories (back or central), but a more accurate approach is to use acoustic phonetic tools to measure the second formant of each vowel, which corresponds directly to how front or back the tongue is in the mouth. Treating the variable as continuous rather than discrete requires slightly different quantitative techniques, but the approach is essentially the same!
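As a rough sketch of what treating a variable as continuous can look like, the example below summarizes second-formant (F2) measurements by averaging them per group instead of counting variants. The numbers are invented purely for illustration and are not taken from Grama (2015); the group labels and the function name `mean_f2` are likewise hypothetical. Real sociophonetic analyses typically also normalize formant values across speakers and use statistical models, which this sketch leaves out.

```python
from statistics import mean

# Invented F2 values in Hz, for illustration only (not real measurements).
# A higher F2 corresponds to a fronter tongue position.
f2_by_group = {
    "group_a": [1000.0, 1100.0, 1200.0],
    "group_b": [1500.0, 1600.0, 1700.0],
}

def mean_f2(measurements):
    """Average the F2 measurements for each group of tokens."""
    return {group: mean(values) for group, values in measurements.items()}

summary = mean_f2(f2_by_group)
for group, value in summary.items():
    print(group, value)
```

Every measured token still contributes to the summary, so the logic of accountability carries over: we describe the whole distribution of the variable, not just the tokens that caught our ear.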

The Principle of Accountability applies at every step of the analysis too. Imagine you want to compare the frequency of [ɪn] in a reading passage to the frequency of [ɪn] in casual speech during the interview. You would need to count the number of [ɪn] tokens in the reading passage and the number of [ɪŋ] tokens in the reading passage to calculate the proportion of [ɪn] in the reading passage, and likewise count both [ɪn] and [ɪŋ] in casual speech to determine the proportion of [ɪn] in that context. Just counting the number of [ɪn] tokens in each context doesn’t give us enough information. Table 10.1 demonstrates why the principle of accountability is so important. If we don’t follow the principle of accountability, it seems like [ɪn] is more frequent in the reading passage than in casual speech (10 vs. 8), but of course this doesn’t take into account how frequently the variant could have occurred but didn’t. If we add a denominator that indicates how many tokens of [ɪn] and [ɪŋ] occurred in each context to our table, we get a more accurate picture of the effect of contextual style (25% in the reading passage vs. 40% in casual speech). The principle of accountability applies to every linguistic and social factor we might consider.

Table 10.1. Why we must follow the principle of accountability

Context          | Not following the Principle of Accountability | Following the Principle of Accountability
Reading Passage  | 10 tokens of [ɪn]                             | 10/40 tokens of -ing = 25%
Casual Speech    | 8 tokens of [ɪn]                              | 8/20 tokens of -ing = 40%
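The reversal shown in Table 10.1 can be reproduced with a short sketch. The counts are the ones in the table; the dictionary layout and the names `rates`, `raw_winner`, and `rate_winner` are assumptions made for this illustration.

```python
# Counts from Table 10.1: (tokens of [ɪn], total tokens of -ing) per context.
table = {
    "Reading Passage": (10, 40),
    "Casual Speech": (8, 20),
}

def rates(table):
    """Proportion of -ing tokens realized as [ɪn] in each context."""
    return {context: hits / total for context, (hits, total) in table.items()}

by_rate = rates(table)

# Ignoring the denominator: which context has more [ɪn] tokens?
raw_winner = max(table, key=lambda context: table[context][0])

# Following the Principle of Accountability: which context has the higher rate?
rate_winner = max(by_rate, key=by_rate.get)

for context, (hits, total) in table.items():
    print(f"{context}: {hits}/{total} = {by_rate[context]:.0%}")
print("Most [ɪn] tokens:", raw_winner)
print("Highest [ɪn] rate:", rate_winner)
```

The raw count picks out the reading passage, while the rate picks out casual speech. The only difference between the two comparisons is whether the total number of -ing tokens is taken into account.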

In this section, we’ve learned about the methods, data, and analyses used in variationist sociolinguistics to study language variation and change. The hallmarks of the variationist method are the sociolinguistic interview (for collecting data) and the principle of accountability (for analyzing data).


