Chapter 3: Phonetics

3.2 Speech articulators

Overview of the vocal tract

Spoken language is articulated by manipulating parts of the body inside the vocal tract, such as the lips, tongue, and other parts of the mouth and throat. The vocal tract is often depicted in a midsagittal diagram, a special kind of diagram that represents the inside of the head as if it were split down the middle between the eyes. Midsagittal diagrams are conventionally oriented as in Figure 3.2, with the nostrils and lips on the left and the back of the head on the right, so that we are viewing the inside of the human head from its left side. The main regions and individual articulators of the vocal tract labelled in Figure 3.2 are defined and described in more detail in the rest of this section and the following sections.

Midsagittal view of the vocal tract, facing left, with various body parts labelled.
Figure 3.2. Midsagittal diagram of the human vocal tract.

Open spaces in the vocal tract

There are three main open regions of the vocal tract. The oral cavity is the main interior of the mouth, taking up space horizontally from the lips backward. The pharynx is behind the oral cavity and tongue, forming the upper part of what we normally think of as the throat. Finally, the nasal cavity is the open interior of the head above the oral cavity and pharynx, from the nostrils backward and down to the pharynx.

The bottom of the pharynx splits into two tubes: the trachea (also known as the windpipe), which leads down to the lungs, and the esophagus, which leads down to the stomach. The esophagus is not normally relevant for phonetics, but the trachea is important, since the vast majority of spoken language is articulated with air coming from the lungs. As discussed later in Section 3.3, there are ways we can manipulate that airflow when it passes from the trachea to the pharynx.

Phones as a basic unit of speech

The pieces of the vocal tract can be articulated in various ways to create and manipulate a wide range of sounds. In the phonetics of spoken languages, we are primarily interested in studying units of speech called phones or speech sounds. It is difficult to provide a precise definition of what a phone is, either in general or for a specific spoken language, but roughly speaking, a phone in a spoken language is a linguistically significant sound, which means that it can be used as part of an ordinary word in that language. For example, the ordinary English words spill, slip, lisp, and lips each contain four phones; in fact, these words have the same four phones, just in different orders (with some slight variation in how they are pronounced; see Chapter 4 for more information).
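The claim about spill, slip, lisp, and lips can be checked with a small sketch. The broad transcriptions below are an assumption supplied for illustration (they are not given in the text above), and they deliberately ignore the slight pronunciation differences covered in Chapter 4. The helper name same_phones is likewise hypothetical. Note that spill has five letters but only four phones, since the doubled l in the spelling stands for a single sound.

```python
from collections import Counter

# Broad transcriptions assumed for illustration; fine phonetic detail
# (the "slight variation" mentioned in the text) is ignored here.
TRANSCRIPTIONS = {
    "spill": ["s", "p", "ɪ", "l"],
    "slip": ["s", "l", "ɪ", "p"],
    "lisp": ["l", "ɪ", "s", "p"],
    "lips": ["l", "ɪ", "p", "s"],
}


def same_phones(words, transcriptions=TRANSCRIPTIONS):
    """Return True if every word uses the same phones, ignoring their order."""
    inventories = [Counter(transcriptions[word]) for word in words]
    return all(inventory == inventories[0] for inventory in inventories)


# Compare the number of letters with the number of phones in each word.
for word, phones in TRANSCRIPTIONS.items():
    print(f"{word}: {len(word)} letters, {len(phones)} phones ({' '.join(phones)})")

# Check that all four words share one set of phones in different orders.
print(same_phones(list(TRANSCRIPTIONS)))
```

Counting phones rather than letters is the point of the sketch: spelling is not a reliable guide to how many speech sounds a word contains.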

There are many other sounds we can produce with the vocal tract or even with other body parts, such as burps, snorts, and finger snaps. These are not typically studied in phonetics, because they are not known to be phones in any spoken language. However, even though they do not occur in ordinary words, they may still be used to express non-linguistic meaning. For example, in some cultures, snapping fingers can indicate quickness or a desire for attention.

Note that spoken languages may differ in how they use phones and whether they even use the same phones at all. For example, English speakers may use clicking sounds to express disapproval (the soft teeth-sucking tsk-tsk click) or to urge a horse to go faster (the loud popping giddyup click), but these clicks are not phones in English, because they are not used within ordinary words. However, these same sounds do occur as phones in some other languages, such as Hadza (a language isolate spoken in Tanzania; Sands et al. 1996) and isiZulu (a.k.a. Zulu, a Southern Bantu language of the Niger-Congo family, spoken in southern Africa; Poulos and Msimang 1998).

We have to be careful about what kinds of words we look at to determine the phones of a language, because there are some marginal word-like expressions that can be used while speaking, but which may contain sounds that are not phones in the language. For example, the English word ugh is often pronounced with a rough gravelly sound that is otherwise not used in English, and we can say things like Kaoru noticed their car was making a glzzk-glzzk-glzzk sound, where glzzk is some impromptu sound produced to mimic the noise made by a vehicle in desperate need of repair.

One of the most fundamental distinctions between phones is whether they are consonants or vowels. The next three sections address how consonants and vowels are articulated and how they are described and categorized in meaningful ways by linguists.

Poulos, George, and Christian T. Msimang. 1998. A linguistic analysis of Zulu. Pretoria: Via Afrika.

Sands, Bonny, Ian Maddieson, and Peter Ladefoged. 1996. The phonetic structures of Hadza. Studies in African Linguistics 25(2): 171–204.



Essentials of Linguistics, 2nd edition by Catherine Anderson; Bronwyn Bjorkman; Derek Denis; Julianne Doner; Margaret Grant; Nathan Sanders; and Ai Taniguchi is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
