[in progress] Chapter 14: Historical Linguistics

14.8 Reconstructing the past

Direct evidence

The changes discussed in previous sections have all been attested changes, because we have direct evidence of how they progressed: how the meanings of words shifted, how some words in poetry used to rhyme but no longer rhyme today, etc. Some of these changes are attested in audio or visual recordings, but that technology has only existed since the late 1800s. Writing goes back much further, to around 3400 BC in ancient Sumer, in what is now southern Iraq. This gives us access to more than five thousand years of attested information that we can use to investigate the linguistic past, at least for some languages.

However, there are thousands of languages that have not yet been written, even in their modern form. This is especially true for many Indigenous languages, signed languages, stigmatized dialects, and other minoritized language varieties. These languages and their users typically have not held social power, so they were historically not considered worth documenting. While this attitude is changing, there are still large gaps in the written historical record for the vast majority of languages.

Furthermore, even for the small set of languages that do have extensive written records, those records are not perfect transcriptions. There are many ways that the written form of a language can differ from how the language is used in ordinary conversation. For example, as discussed in Section 3.6, spelling may not match pronunciation. In addition, even specific words or syntactic structures may differ between the two ways of using language, such as how English gonna is much more common in speech than in writing.

Thus, we need additional methods that allow us to analyze a greater diversity of languages and linguistic patterns. With the right methods, we can even go further back into the linguistic past, before the invention of writing.

Comparing cognates

A crucial method for historical linguistics is the comparative method, in which multiple related languages or language varieties are compared to each other, and we extrapolate backwards in time to a single plausible hypothetical ancestral form for all of the compared languages. This extrapolation is called comparative reconstruction or simply reconstruction for short.

Note that reconstruction can only be properly used to analyze data from languages that are related, that is, languages that are descended through ordinary language change from a single common language at some point in the past. By analogy with biological evolution, related languages are sometimes said to be genetically related. In some cases, it is obvious that languages are related because of extensive patterns of similarity, but sometimes, we may only suspect that the languages are related, so we could use the comparative method to help confirm or refute that suspicion.

The main objects of study in the comparative method are cognates. These are words or morphemes in related languages that are directly descended from the same single ancestor form, which is called their etymon. We can often recognize cognates due to similarity in both pronunciation and meaning. For example, it is easy to see that the English word [haʊs] house and the German word [haʊs] Haus ‘house’ are cognates, because their pronunciations and meanings are essentially the same.

Furthermore, we can make a reasonable guess that their etymon had a similar pronunciation and meaning. That is, in some older language that eventually evolved into both English and German, there was likely a word pronounced something like [haʊs] with a meaning something like ‘house’; this word would be the etymon of modern house and Haus. Because of the Great Vowel Shift in English and the similar independent shift that happened in the history of German (discussed in Section 14.3), we could further refine our hypothesis and propose that the etymon was actually pronounced more like [huːs].
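The reasoning in this paragraph can be sketched as a small check: propose an etymon, apply a known sound change forward in time, and see whether the result matches the attested modern forms. The sketch below is only an illustration under strong simplifying assumptions. It treats the vowel shifts in English and German as a single replacement of [uː] by [aʊ], and the names apply_change and is_consistent are hypothetical helpers invented for this example, not part of any standard tool.

```python
# Simplified illustration: one sound change, written as (old, new) in IPA.
# Assumption: the long vowel [uː] became the diphthong [aʊ] in both languages.
DIPHTHONGIZATION = ("uː", "aʊ")


def apply_change(form, change):
    """Apply a sound change to every matching stretch of a transcribed form."""
    old, new = change
    return form.replace(old, new)


def is_consistent(etymon, attested, change):
    """Check whether a proposed etymon yields every attested modern form."""
    return all(apply_change(etymon, change) == form for form in attested.values())


# Attested modern forms from the text, and the proposed etymon.
attested = {"English": "haʊs", "German": "haʊs"}
candidate = "huːs"

print(apply_change(candidate, DIPHTHONGIZATION))
print(is_consistent(candidate, attested, DIPHTHONGIZATION))
```

A real reconstruction would involve many ordered changes that differ from language to language, and a check like this can only show that a proposed etymon is compatible with the data, not that it is the only possibility.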

This is the basic idea behind the comparative method. However, before we can use it, we first have to find cognates, and there are some issues that have to be considered in that search. It can take decades of study of multiple languages to properly find cognates. Even then, we can still get things wrong for a variety of reasons.

Pitfalls in the search for cognates

As we have already seen in previous sections, words can change in both form and meaning. This means that cognates could end up being very different from each other, making it hard to even identify them as cognates at all. For example, the English word [t͡ʃɔr] chore ‘task’ and the German word [keːʁə] Kehre ‘U-turn’ are cognates, but their historical relationship is obscured by their very different pronunciations and meanings.

In addition, borrowings between languages often look like cognates, but we normally exclude them when using the comparative method (see Section 14.7 for discussion of borrowings). Borrowings do not undergo the full history of changes in the recipient language, so they do not accurately reflect that history. In addition, they undergo some portion of the donor language’s recent history that is not shared with the recipient language. If we looked at too many borrowings instead of true cognates, we could be fooled into thinking that the languages were more closely related than they actually are.

However, we sometimes cannot tell whether a word is a borrowing or not, especially if it was borrowed far enough back in the histories of the comparison languages. In that case, the borrowing could be mistaken for a genuine cognate, since it would have evolved within the recipient language for a longer period of time and within the donor language for a shorter period of time, making it look more like a proper original word of the recipient language. Any remaining discrepancies might then be treated as sporadic change rather than borrowing, if we do not have enough evidence to know it was borrowed rather than descended separately from the same etymon.

Sometimes, a word could be borrowed into the languages of interest from a completely different language. For example, the Arabic word [zaraːfa] زَرافَة‎ ‘giraffe’ was borrowed (by way of Italian) into both English and German separately, as [d͡ʒəræf] giraffe and [ɡiʁafə] Giraffe, respectively. Without knowing the original source, these words might again falsely convince us of a closer relationship between English and German than would otherwise be justified from true cognates alone.

In the extreme, some borrowings can crisscross through many languages for centuries, especially for concepts relevant to trade. Such a word is sometimes called a Wanderwort (a German compound that literally means ‘wander word’) or wanderword (an English calque of Wanderwort). Notable examples of wanderwords include ginger, honey, silver, sugar, and tea. In some cases, a wanderword can even enter a language at multiple points in time from different sources, as happened with tea and chai in English. Because of their long and complex multilingual history, it can be difficult to determine the origins of a wanderword within a particular language.

We also have to be careful about false cognates, which are words that seem like they could be cognates due to their similar pronunciations and meanings, but which actually have very different histories and do not come from the same etymon. For example, even though English [mr̩dr̩] murder ‘kill’ and German [maʁtɐ] Marter ‘torture’ have similar pronunciations and meanings (ways of inflicting harm), they are not cognates and have different etymologies, given in (1) and (2), respectively. False cognates like these need to be excluded from the comparative method.

(1) Modern English murder < Old English morðor, cognate with German Mord
(2) Modern German Marter < Old High German martira ‘torment, martyrdom’, borrowed from Latin martyrium ‘martyrdom’ (Kluge 1883)

A related concept is false friends, which are words that are cognates with similar pronunciations, but whose meanings have diverged enough over time that they might not be recognized as cognates. For example, the English word [ɡɪft] gift and the German word [ɡɪft] Gift are pronounced essentially the same, but the German Gift means ‘poison’, not something we would normally associate with English gift. Both of these words likely come from the same etymon with a more neutral meaning of ‘act of giving’ that underwent amelioration in English and pejoration in German (see Section 14.6 for discussion of amelioration and pejoration).

Because gift and Gift are actual cognates, we would want to include them when using the comparative method, despite their seemingly unrelated semantics. If we did not know the languages and their histories well enough, it might be difficult to recognize them as cognates, so we could miss out on including them, giving us less data to work with and weakening our overall analysis. Note that the term false friends is sometimes used more broadly for any words with similar pronunciations and very different meanings, whether or not they are actual cognates. However, we use false friends here as a special type of cognates with divergent semantics.
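The pitfalls described in this section amount to a filtering step on candidate word pairs. The sketch below illustrates that step using only the English and German examples already given in this section. The status labels are entered by hand from the discussion above, and the names Pair, USABLE, and usable_pairs are hypothetical, chosen just for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    english: str
    german: str
    status: str  # label taken by hand from the discussion in this section


# Only pairs inherited from a shared etymon belong in the comparison.
# False friends (in the narrow sense used here) are true cognates, so they stay.
USABLE = {"cognate", "false friends"}

pairs = [
    Pair("house", "Haus", "cognate"),
    Pair("chore", "Kehre", "cognate"),
    Pair("giraffe", "Giraffe", "borrowing"),
    Pair("murder", "Marter", "false cognate"),
    Pair("gift", "Gift", "false friends"),
]


def usable_pairs(candidates):
    """Keep only the pairs that can be used in the comparative method."""
    return [p for p in candidates if p.status in USABLE]


for p in usable_pairs(pairs):
    print(p.english, p.german)
```

The filter itself is trivial. The hard part, as the discussion above shows, is assigning the right status to each pair in the first place, which depends on detailed knowledge of the languages and their histories.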

If we can avoid these pitfalls, we can then build sets of true cognates that can give us insight into the common ancestor language they descend from. In Section 14.9, we analyze cognates from three modern Chinese languages and construct hypotheses about etymons for some of their cognates. Crucially, we use only the comparative method and our general knowledge of how languages change, without looking at older written records of Chinese or any other external information. The goal is to see how powerful the comparative method is and just how far we can go without access to written records, since most languages have none.

