Chapter 3: Phonetics

3.6 The International Phonetic Alphabet


Note that we have been talking about phones as if it is obvious what they are, but this is not always the case. It is sometimes easy to find a clear separation between the phones in a given word, that is, to segment the word into its component phones, but sometimes, it can be very difficult. We can see this by looking at waveforms, which are special pictures that graphically represent the air vibrations of sound waves. The two waveforms in Figure 3.18 show a notable difference in how easy it is to segment the English words nab and wool.

Two waveforms. Left waveform for the word nab is segmented into three distinct regions, labelled n, a, and b. The right waveform for the word wool has no clear segmentation between w, oo, and l.
Figure 3.18. Waveforms for the English words nab and wool.

The waveform for nab contains abrupt transitions between three very different regions corresponding to three phones, while the waveform for wool has smooth transitions from beginning to end, with no obvious divisions between phones.


When we can identify the individual phones in a word, we want to have a suitable way to notate them that can be easily and consistently understood, so that the relevant information about the pronunciation can be conveyed in an unambiguous way to other linguists. Such notation is called a transcription, which may be very broad (giving only the minimal information needed to contrast one word with another), very narrow (giving a large amount of fine-grained phonetic detail), or somewhere in between. Whether broad, narrow, or in between, phonetic transcription is conventionally given in square brackets [ ], so that, for example, the consonant at the beginning of the English word nab could be transcribed as [n], with the understanding that the symbol [n] is intended to represent a voiced alveolar nasal stop.

As linguists, we are interested in studying and describing as many languages as we can, so we want to use a transcription system that can be used for all possible phones in any spoken language. This means we cannot simply use any one existing language’s writing system, because it would be optimized for representing only the phones of that language, so it would not have easy ways to represent phones from other languages.

In addition, many writing systems are filled with inconsistencies and irregularities that make them unsuitable for any kind of rigorous and unambiguous transcription. For example, the letter <a> in the English writing system is used to represent different phones in the words nab, father, halo, and diva, while the phone represented by the letter <i> in the word diva is represented with different letters or letter combinations in other words: <ee> in meet, <ea> in meat, <e> in me, and <y> in mummy.

Note that symbols from a writing system are represented here with surrounding angle brackets < >. This is a common notational convention in linguistics that helps visually distinguish symbols in a writing system from symbols used for the transcription of phones, which are enclosed in square brackets.
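The many-to-many mismatch between English letters and phones described above can be sketched as a pair of lookups. This is only an illustration: the vowel symbols are assumed from the broad North American transcriptions used later in this section, and the names `OBSERVATIONS`, `spelling_to_phones`, and `phone_to_spellings` are hypothetical.

```python
from collections import defaultdict

# (word, spelling in angle-bracket terms, broad IPA symbol).
# Illustrative values only, assuming broad North American transcriptions.
OBSERVATIONS = [
    ("nab", "a", "æ"),
    ("father", "a", "ɑ"),
    ("halo", "a", "e"),
    ("diva", "a", "ə"),
    ("diva", "i", "i"),
    ("meet", "ee", "i"),
    ("meat", "ea", "i"),
    ("me", "e", "i"),
    ("mummy", "y", "i"),
]


def spelling_to_phones(observations):
    """Map each spelling to the set of phones it is used for."""
    table = defaultdict(set)
    for _word, spelling, phone in observations:
        table[spelling].add(phone)
    return dict(table)


def phone_to_spellings(observations):
    """Map each phone to the set of spellings used for it."""
    table = defaultdict(set)
    for _word, spelling, phone in observations:
        table[phone].add(spelling)
    return dict(table)


by_spelling = spelling_to_phones(OBSERVATIONS)
by_phone = phone_to_spellings(OBSERVATIONS)

# One letter, several phones; one phone, several spellings.
print("<a> represents", len(by_spelling["a"]), "different phones")
print("[i] is spelled", len(by_phone["i"]), "different ways")
```

A transcription system suited to linguistic work would make both lookups return exactly one item for every key.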

Furthermore, even if English spelling were perfectly regular, many specific English words can be pronounced in different ways, such as either and route, which have different equally valid pronunciations. This kind of variation is particularly common between different dialects.

For example, the word mop has a vowel that is typically pronounced differently by speakers from Los Angeles (with the tongue low and back in the mouth), London (similar to the Los Angeles vowel, but with some lip rounding), and Chicago (more central in the mouth, making mop sound nearly like map to other speakers). If we tried to describe in writing how to pronounce a vowel from another language, but said only that it was pronounced the same as in the English word mop, we could not guarantee that the reader would know whether the vowel is back and unrounded, back and round, or central and unrounded.

The International Phonetic Alphabet

To avoid these problems, linguists have devised more suitable transcription systems, each with their own strengths and weaknesses. In this textbook, we will use a widespread standard transcription system called the International Phonetic Alphabet (IPA). The IPA was created by the International Phonetic Association (unhelpfully also abbreviated IPA), which was founded in 1886. The first version of the IPA transcription system was published shortly after, and it has undergone many revisions since then as our knowledge and understanding of the world’s spoken languages have evolved. The most recent symbol was added in 2005: [ⱱ] for the labiodental tap, a phone found in many languages of central Africa, such as Mono, a Central Banda language of the Ubangian family, spoken in the Democratic Republic of the Congo.

For reference, the full chart for the IPA is given in Figure 3.19. This chart is available under a Creative Commons Attribution-Sharealike 3.0 Unported License, copyright © 2020 by the International Phonetic Association. It is also available online at the IPA’s homepage, and some online versions are accessible for screenreaders.

Full International Phonetic Alphabet chart.
Figure 3.19. Full chart of all symbols in the International Phonetic Alphabet.

Learning the IPA takes a lot of time, practice, and guidance. But learning the IPA is not just about memorizing symbols. The underlying structure and principles behind the organization of the table are what matter. The IPA is like the periodic table of elements in this way. It is helpful to know that Na is the chemical symbol for sodium, or that [m] is the IPA symbol for a voiced bilabial nasal stop, but it is much more important to know what these concepts are. What is sodium? What does it mean for a phone to be voiced? How is the vocal tract configured for a bilabial nasal stop?

This is why this chapter has focused on defining concepts, so that you can build a solid foundation in understanding how phones are articulated. The notation is secondary to that.

Using the IPA

A full discussion of how to use the IPA is beyond the scope of an introductory textbook like this one. Here, we discuss a few guidelines and some concrete examples from English. For any transcription, it is important to keep in mind who your audience is and what the purpose of the transcription is. Most of the time, we need only a fairly broad transcription to get across a basic idea of the most important aspects of the articulation.

One important guiding recommendation from the IPA for broad transcription is to use the typographically simplest transcription that still conveys the most crucial information. That is, when possible, choose ordinary upright Roman symbols like [a] and [r] rather than their inverted counterparts [ɐ] and [ɹ]. Ordinary symbols are easier to type, easier to read, and more reliable in how they are displayed in different fonts.

Another aspect of typographic simplicity is to avoid diacritics, which are special marks like [  ̪] and [ʰ] that are placed above, below, through, or next to a symbol to give it a slightly different meaning. These are often necessary for certain contexts, but sometimes, they are superfluous.

Typographic simplicity is good practice when dealing with a lot of variation that is not relevant to the main point. For example, the English consonant typically spelled by the letter <r> is pronounced in many different ways by different speakers. Many North Americans have some sort of central approximant, but it varies from alveolar [ɹ], to postalveolar [ɹ̱], to retroflex [ɻ]. Some speakers may also have a pharyngeal constriction, indicated in the IPA with a superscript [ˁ] diacritic after the symbol. Some speakers may also have lip rounding, indicated in the IPA with a superscript [ʷ] diacritic after the symbol. Some speakers may have both pharyngealization and rounding!

That’s at least twelve different possible articulations, each with its own transcription in the IPA, depending on the place of articulation and whether or not there is pharyngealization and/or rounding. The IPA symbols for these twelve possibilities are given in the list below; for each, the symbols are in order by place of articulation: alveolar, postalveolar, and retroflex.

  • no pharyngealization and no rounding: [ɹ], [ɹ̱], or [ɻ]
  • pharyngealization and no rounding: [ɹˁ], [ɹ̱ˁ], or [ɻˁ]
  • rounding and no pharyngealization: [ɹʷ], [ɹ̱ʷ], or [ɻʷ]
  • both pharyngealization and rounding: [ɹˁʷ], [ɹ̱ˁʷ], or [ɻˁʷ]
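The twelve transcriptions in the list above are simply every combination of three places of articulation with two optional diacritics, which can be sketched in a few lines. This is a minimal illustration, not an official encoding: the function name `rhotic_variants` is hypothetical, and the Unicode code points chosen for the diacritics are assumptions that may differ from look-alike characters used elsewhere.

```python
from itertools import product

RETRACTED = "\u0320"       # COMBINING MINUS SIGN BELOW (assumed code point)
PHARYNGEALIZED = "\u02E4"  # MODIFIER LETTER SMALL REVERSED GLOTTAL STOP (assumed)
ROUNDED = "\u02B7"         # MODIFIER LETTER SMALL W

# Base symbols for the three central approximants, in the order used in the text.
PLACES = {
    "alveolar": "ɹ",
    "postalveolar": "ɹ" + RETRACTED,
    "retroflex": "ɻ",
}


def rhotic_variants():
    """Return (place, pharyngealized, rounded, symbol) for all combinations."""
    variants = []
    # Outer loop over rounding, inner over pharyngealization, so the groups
    # come out in the same order as the bulleted list.
    for rounded, pharyngealized in product([False, True], repeat=2):
        for place, base in PLACES.items():
            symbol = base
            if pharyngealized:
                symbol += PHARYNGEALIZED
            if rounded:
                symbol += ROUNDED
            variants.append((place, pharyngealized, rounded, symbol))
    return variants


variants = rhotic_variants()
for place, pharyngealized, rounded, symbol in variants:
    print(f"[{symbol}] {place}, pharyngealized={pharyngealized}, rounded={rounded}")
```

Three places times two choices for pharyngealization times two choices for rounding gives 3 × 2 × 2 = 12 distinct symbols.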

Furthermore, looking at English more broadly, there are many other pronunciations beyond those in North American varieties, such as an alveolar tap [ɾ] or trill [r] in Scotland, a voiced uvular fricative [ʁ] in Northumbria, and a labiodental approximant [ʋ] in London.

Thus, when transcribing English, there is no one single symbol that accurately represents the pronunciation of this consonant, so [r] is a reasonable choice because of its typographic simplicity. Of course, when transcribing a specific articulation from a specific speaker, it may make sense to use a more precise symbol, especially if the details of the articulation are important. But generally speaking, a plain [r] is normally fine for English, though some linguists may prefer to use [ɹ] or [ɻ] for North American English, even though there are at least a dozen equally valid North American pronunciations. If you are taking a course in linguistics, be sure to follow the standards and conventions set by your instructor.

Why is there so much variation in the pronunciation of English <r>? These phones belong to an unusual class called rhotics, named after the Greek letter rho ρ, which itself represents a rhotic phone. Across the world’s languages, we find a lot of variation in rhotics. Many languages only have one rhotic, but which particular rhotic they have can be very different from related or neighbouring languages. The pronunciation of rhotics in a language can also shift over time, especially if the language only has one, as English does. There seems to be no single overarching phonetic similarity in the various rhotics, and linguists are still trying to figure out what makes this class of consonants so special.

However, even when the pronunciation of a given phone is fairly consistent across speakers, many linguists still choose a typographically simpler transcription. Consider the consonant at the beginning of the English word chin, which is a voiceless postalveolar affricate. Affricates in the IPA are normally transcribed by writing the corresponding plosive symbol for the stop closure, followed by the corresponding fricative symbol for the fricated release, both united under a tie-bar [  ͡  ].

The symbol for a voiceless postalveolar fricative is [ʃ]. We can find this in the IPA chart by looking in the section devoted to consonants. Places of articulation are listed across the top, while manners of articulation are listed down the left. Within a given cell, if there are two symbols, the one on the left is voiceless, and the one on the right is voiced. So looking in the postalveolar column and the fricative row, we find the symbols [ʃ] and [ʒ], and since we are interested in the voiceless fricative, we pick the symbol [ʃ].

However, there is no basic symbol for a voiceless postalveolar plosive in the IPA. That part of the chart is blank, so we have to create our own symbol by using the base symbol for a similar phone and adding one or more diacritics. In this case, we can use alveolar [t] and put a retraction diacritic [  ̱] under it to indicate that its place of articulation is slightly farther back, as we did for the postalveolar central approximant [ɹ̱] above. Thus, we get [ṯ] as the symbol for a voiceless postalveolar plosive.
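The chart lookup and the workaround for a blank cell can be sketched as follows. This is only an illustration under stated assumptions: `CHART` holds a tiny hand-picked fragment of the consonant chart, the function names are hypothetical, and the retraction diacritic is assumed to be the Unicode character COMBINING MINUS SIGN BELOW (a similar-looking mark may be used in some fonts and texts).

```python
# A small fragment of the consonant chart:
# (place, manner) -> (voiceless symbol, voiced symbol).
CHART = {
    ("alveolar", "plosive"): ("t", "d"),
    ("alveolar", "fricative"): ("s", "z"),
    ("postalveolar", "fricative"): ("ʃ", "ʒ"),
    ("velar", "plosive"): ("k", "ɡ"),
}

RETRACTED = "\u0320"  # COMBINING MINUS SIGN BELOW (assumed code point)


def lookup(place, manner, voiced):
    """Return the basic symbol for a cell, or None if the cell is blank."""
    cell = CHART.get((place, manner))
    if cell is None:
        return None
    voiceless_symbol, voiced_symbol = cell
    return voiced_symbol if voiced else voiceless_symbol


def postalveolar_plosive(voiced):
    """Build a symbol for the blank cell from the alveolar plosive plus retraction."""
    return lookup("alveolar", "plosive", voiced) + RETRACTED


print(lookup("postalveolar", "fricative", voiced=False))  # basic symbol exists
print(lookup("postalveolar", "plosive", voiced=False))    # blank cell: None
print(postalveolar_plosive(voiced=False))                 # base symbol plus diacritic
```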

In addition, most English speakers also pronounce this affricate with some amount of lip rounding, so a fully accurate transcription would be something like [ṯ͡ʃʷ]. But hardly any linguist transcribes this affricate with that much phonetic detail. It is almost never relevant that it is round, and the postalveolar location of the stop closure is implied by the fact that it has a postalveolar release; you cannot release a stop closure in a position different from where it is made. So the affricate is more commonly transcribed as [t͡ʃ]. As with [r] for the English rhotic, [t͡ʃ] is not technically accurate for most speakers, but it is typographically simpler and conveys all the crucial information needed to understand the transcription. The tie-bar on the affricate may also sometimes be left off in transcriptions, so [tʃ] is also a common transcription for this affricate.
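Moving from the narrow transcription to the two broader ones amounts to dropping diacritics, which can be sketched with Python's standard `unicodedata` module. This is a rough sketch, not a general-purpose tool: the function name `broaden` is hypothetical, the code points for the diacritics are assumptions, and the filter is deliberately crude (it would also remove other modifier letters, such as length or stress marks).

```python
import unicodedata

TIE_BAR = "\u0361"    # COMBINING DOUBLE INVERTED BREVE
RETRACTED = "\u0320"  # COMBINING MINUS SIGN BELOW (assumed code point)
ROUNDED = "\u02B7"    # MODIFIER LETTER SMALL W


def broaden(transcription, keep_tie_bar=True):
    """Strip combining marks and modifier letters, optionally keeping the tie-bar."""
    kept = []
    # NFD splits any precomposed letter into base letter + combining mark.
    for ch in unicodedata.normalize("NFD", transcription):
        if ch == TIE_BAR:
            if keep_tie_bar:
                kept.append(ch)
        elif unicodedata.category(ch) in ("Mn", "Lm"):
            continue  # combining diacritic or superscript modifier: drop it
        else:
            kept.append(ch)
    return unicodedata.normalize("NFC", "".join(kept))


# Narrow form: retracted t, tie-bar, esh, rounding.
narrow = "t" + RETRACTED + TIE_BAR + "ʃ" + ROUNDED

print(narrow)
print(broaden(narrow))                      # tie-bar kept
print(broaden(narrow, keep_tie_bar=False))  # plain two-symbol sequence
```

Each step discards detail that is either predictable or rarely relevant, which is the trade-off behind the principle of typographic simplicity.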

Even without these issues, there is still usually no such thing as “the” correct transcription. Two pronunciations of the same word will always have some differences, because we live in a physical world where we cannot avoid slight imperfections and fluctuations, and we cannot capture all of those differences with the IPA. It is simply not designed for that level of phonetic detail. When such detail is important, it needs to be conveyed in other ways, such as with diagrams and numerical measurements (loudness in decibels, duration in milliseconds, etc.).

Transcribing English with the IPA

Despite all these pitfalls, it is still important to get some basic skill in transcription, and since this textbook is presented in English, English is a good starting point to give you something concrete in which to ground your understanding of transcription. However, there is much dialectal variation, so the transcriptions offered here are very general and may differ from the English you are familiar with. We begin with consonants, where there is less variation across dialects.

Table 3.2 lists some plosives of English, with their IPA symbol (keeping in mind the principle of simplicity) and words containing each plosive in various positions, where possible. A full phonetic description is also given. The portion of the spelling that corresponds to the phone is in bold.

Table 3.2. English plosives and affricates.
symbol  initial  medial   final  description
[p]     pan      rapid    lap    voiceless bilabial plosive
[b]     ban      rabid    lab    voiced bilabial plosive
[t]     tan      atop     let    voiceless alveolar plosive
[d]     den      adopt    led    voiced alveolar plosive
[t͡ʃ]    chin     batches  rich   voiceless postalveolar affricate
[d͡ʒ]    gin      badges   ridge  voiced postalveolar affricate
[k]     can      bicker   lack   voiceless velar plosive
[ɡ]     gain     bigger   lag    voiced velar plosive
[ʔ]     -        uh-oh    -      voiceless glottal plosive

Most of these are straightforward. As discussed in Section 3.3, the alveolar consonants are normally apicoalveolar, but some speakers may pronounce them with the blade of the tongue. If that detail is necessary, they can be transcribed as [t̻] and [d̻], using the laminal diacritic. Alternatively, some speakers may pronounce them on the back of the teeth, in which case they would be transcribed as [t̪] and [d̪], using the dental diacritic.

The glottal plosive (also frequently called a glottal stop) is only a marginal consonant in English, showing up as the catch in the throat in the middle of the interjection uh-oh. Some speakers also have it elsewhere, such as in the middle of the British English pronunciation of the word bottle. It is articulated by making a full stop closure with the vocal folds, blocking all airflow through the glottis.

Table 3.3 lists some fricatives of English.

Table 3.3. English fricatives.
symbol  initial  medial   final   description
[f]     fan      wafer    leaf    voiceless labiodental fricative
[v]     van      waver    leave   voiced labiodental fricative
[θ]     thin     ether    truth   voiceless interdental fricative
[ð]     than     either   smooth  voiced interdental fricative
[s]     sin      muscle   bus     voiceless alveolar fricative
[z]     zone     muzzle   buzz    voiced alveolar fricative
[ʃ]     shin     Haitian  rush    voiceless postalveolar fricative
[ʒ]     -        Asian    rouge   voiced postalveolar fricative
[h]     hen      ahead    -       voiceless glottal fricative

The most notable variation here is that some speakers do not have [θ] and [ð], and instead use [t] and [d] or [f] and [v], depending on the dialect and the position in the word. As with the postalveolar affricates mentioned before, the postalveolar fricatives are also usually somewhat rounded, so they could be more narrowly transcribed as [ʃʷ] and [ʒʷ]. The voiced postalveolar fricative [ʒ] is also one of the rarest consonants in English, and many speakers pronounce it as an affricate in some positions instead of a fricative.
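The two substitution patterns for the interdental fricatives can be sketched as simple symbol replacements. This is a simplification under stated assumptions: the labels "stopping" and "fronting" and the function name are chosen here for illustration, the example transcriptions [θɪn] and [ðæn] are assumed broad forms of thin and than, and the sketch ignores the dependence on dialect and word position noted above.

```python
# Replace interdental fricatives with alveolar plosives.
TH_STOPPING = str.maketrans({"θ": "t", "ð": "d"})
# Replace interdental fricatives with labiodental fricatives.
TH_FRONTING = str.maketrans({"θ": "f", "ð": "v"})


def substitute_interdentals(transcription, pattern):
    """Apply one substitution pattern to every [θ] and [ð] in a transcription."""
    return transcription.translate(pattern)


for word in ("θɪn", "ðæn"):
    stopped = substitute_interdentals(word, TH_STOPPING)
    fronted = substitute_interdentals(word, TH_FRONTING)
    print(f"[{word}] -> [{stopped}] or [{fronted}]")
```

Both patterns preserve voicing: voiceless [θ] is replaced by a voiceless phone and voiced [ð] by a voiced one.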

Table 3.4 lists some sonorants of English. Note that the sonorants of English are generally voiced, so that is not listed here. Across the world’s spoken languages, sonorants tend to be voiced by default, because their high degree of airflow causes the vocal folds to spontaneously vibrate if extra effort is not put in to keep them still.

Table 3.4. English sonorants.
symbol  initial  medial  final  description
[m]     man      simmer  ram    bilabial nasal stop
[n]     nun      sinner  ran    alveolar nasal stop
[ŋ]     -        singer  rang   velar nasal stop
[l]     lane     folly   ball   alveolar lateral approximant
[r]     run      sorry   bar    (various! see earlier discussion)
[j]     yawn     reuse   -      palatal central approximant
[w]     won      awake   -      labial-velar central approximant

A few of these sonorants warrant special attention. The alveolar nasal stop [n] has much of the same variation as the alveolar plosives, with some speakers having a laminoalveolar articulation [n̻] and some having a dental articulation [n̪]. The velar nasal stop is often one of the most surprising phones of English to people who are new to phonetics, because it is not easily identifiable as its own phone. Many people are misled by the spelling and think they say words like singer with [ɡ], but in fact, most speakers have only a nasal there, so that singer differs from finger, with singer having only [ŋ] and finger having [ŋɡ]. However, there are speakers who do genuinely pronounce all words like these with a [ɡ] after the nasal stop, but even then, the nasal stop they have is still velar [ŋ], not alveolar [n].

A notable consonant here is [w], which is special among the consonants of English in being doubly articulated, which means that it has two equal places of articulation. It is both bilabial (with an approximant constriction between the two lips) and velar (with a second approximant constriction between the tongue back and the velum). Its place of articulation is usually called labial-velar. English used to have two labial-velar approximants, a voiced [w] and a voiceless [ʍ]. Very few speakers today have both of these, but those who do pronounce the words witch and which differently, with voiced [w] in witch and voiceless [ʍ] in which.

Next, we can move to the vowels. This is where much of the variation in pronunciation occurs across English dialects, and fully describing all the vowels across English could take up a textbook of its own. Table 3.5 lists some monophthongs of English, with a focus on the English vowels as they are broadly pronounced across North American dialects. However, there is still much variation just in North America, and this discussion should not be taken to represent any particular speaker or region, let alone any sort of idealized standard or target. This is simply a convenient abstraction that provides a useful baseline, though it is still only a very rough guide, and individual speakers can vary quite a lot from what is discussed here. The vowels of English are generally all voiced and oral, so that is not listed here. Example words are given that show the vowel in a stressed syllable, an unstressed syllable, and at the end of the word (see Sections 3.10 and 3.11 for more about syllables and stress).

Table 3.5. English monophthongs.
symbol  stressed  unstressed  final  description
[i]     beater    radius      see    high front unrounded tense
[ɪ]     bitter    -           -      high front unrounded lax
[e]     baker     -           say    mid front unrounded tense
[ɛ]     better    -           -      mid front unrounded lax
[æ]     batter    -           -      low front unrounded
[ɑ]     father    -           saw    low back unrounded
[ɒ]     bonnet    -           saw    low back round
[ɔ]     border    -           saw    mid back round lax
[o]     boater    -           sew    mid back round tense
[ʊ]     booker    -           -      high back round lax
[u]     boomer    manual      sue    high back round tense
[ʌ]     butter    -           -      mid central unrounded lax
[ə]     -         animal      sofa   mid central unrounded lax

As noted before, there is a lot of variation that cannot be adequately discussed here, so we only cover a few notable deviations. First, while many speakers pronounce the four tense vowels as monophthongs as transcribed here, many speakers pronounce some or all of them as diphthongs instead, perhaps even having an approximant at the end rather than a vowel. For example, high front unrounded tense [i] may be pronounced more like [ɪi] or [ij] by some speakers. It is especially common for the two tense mid vowels to be pronounced as diphthongs, something like [eɪ] and [oʊ] or perhaps [ej] and [ow].

Many of the back round vowels, especially [ʊ], are fronter and/or unrounded for some speakers in some dialects. The back vowels in bore and bought are pronounced similarly to each other by some North Americans, and so they are often represented with the same symbol [ɔ], though note there may still be some differences, with [ɔ] before a rhotic often pronounced somewhat higher, closer to [o]. However, many speakers in Canada and in the western United States have a very different vowel in bought from bore. Their bought vowel is much lower, and for some speakers, it is also unrounded. These speakers use the same low vowel in bought that they use in bot. For most North Americans, the low vowels in bot and balm are pronounced the same, either as back round [ɒ] or back unrounded [ɑ]; in some dialects, it may be central unrounded [a]. Others have two different vowels for these words, usually [ɒ] in bot and [ɑ] or [a] in balm. Needless to say, this part of the vowel system of English is particularly troublesome, and even many expert linguists get aspects of it wrong.

The two central vowels [ʌ] and [ə] are often treated as related pronunciations of the same vowel, based on whether or not they occur in a stressed syllable (again, see Sections 3.10 and 3.11 for more about syllables and stress). For now, just note that some vowels of English are pronounced louder and longer than others, which we call “stress”, while the other vowels are said to be unstressed. We can see the difference in stress in pairs like billow and below, which differ mostly in which syllable is stressed: the first syllable in billow and the second syllable in below. The two central vowels of English differ in stress: the first syllable of the name Bubba is stressed, and the second is unstressed, so we might transcribe this name as [bʌbə]. These two vowels sound very similar and could easily be notated with the same symbol [ə]. Nonetheless, there is a long tradition of notating the unstressed central vowel of English with [ə] and the stressed central vowel with [ʌ], based on historical pronunciations in which the stressed vowel used to be pronounced much farther back (and still is, in some dialects).
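The relationship between the two central vowels can be sketched by storing each vowel's description as data and checking which symbols share one. This is an illustration only: `VOWELS` holds a hand-picked subset of the monophthongs with the broad descriptions used in this section, and the function names are hypothetical.

```python
from collections import defaultdict

# symbol -> (height, backness, rounding, tenseness); a subset for illustration.
VOWELS = {
    "i": ("high", "front", "unrounded", "tense"),
    "ɪ": ("high", "front", "unrounded", "lax"),
    "u": ("high", "back", "round", "tense"),
    "ʊ": ("high", "back", "round", "lax"),
    "ʌ": ("mid", "central", "unrounded", "lax"),
    "ə": ("mid", "central", "unrounded", "lax"),
}


def symbols_sharing_description(vowels):
    """Return groups of symbols whose phonetic descriptions are identical."""
    groups = defaultdict(list)
    for symbol, features in vowels.items():
        groups[features].append(symbol)
    return [symbols for symbols in groups.values() if len(symbols) > 1]


def central_vowel(stressed):
    """Pick the traditional symbol for the mid central vowel based on stress."""
    return "ʌ" if stressed else "ə"


print(symbols_sharing_description(VOWELS))
# Bubba: stressed first syllable, unstressed second syllable.
bubba = "b" + central_vowel(True) + "b" + central_vowel(False)
print(f"[{bubba}]")
```

Only the two central vowels end up in the same group, which is why the choice between their symbols rests on stress and tradition rather than on the description itself.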

Finally, we consider diphthongs and syllabic consonants. Syllabic consonants are phones that have consonant-like constrictions, but which function more like vowels within English. Some diphthongs and syllabic consonants of English are given in Table 3.6.

Table 3.6. English diphthongs and syllabic consonants.
symbol  stressed  unstressed  final   description
[aɪ]    biter     -           sigh    low central unrounded to high front unrounded diphthong
[aʊ]    browner   -           how     low central unrounded to high back round diphthong
[ɔɪ]    boiler    -           soy     mid back round to high front unrounded diphthong
[r̩]     burning   interval    sir     syllabic rhotic
[l̩]     -         hazelnut    saddle  syllabic alveolar lateral approximant
[n̩]     -         calendar    sudden  syllabic alveolar nasal stop
[m̩]     -         bottomless  seldom  syllabic bilabial nasal stop

For the diphthongs, the symbols used here represent a rough average over where they typically start and end, but the actual pronunciation varies quite a lot from speaker to speaker and even for the same speaker. The low starting point for [aɪ] and [aʊ] may be closer to [ɑ] or [æ], the mid back starting point for [ɔɪ] may be closer to [o], the high front ending point for [aɪ] and [ɔɪ] may be closer to [i] or [j], and the high back ending point for [aʊ] may be closer to [u] or [w].

Syllabic consonants are transcribed by using the syllabic diacritic [  ̩] under the relevant consonant symbol. However, sometimes these are transcribed with a preceding [ə], so that hazelnut could be transcribed either as [hezl̩nʌt] or as [hezəlnʌt]. Syllabic rhotics (also called rhotacized vowels or r-coloured vowels) are so common that they have their own dedicated symbols: [ɝ] for stressed syllables and [ɚ] for unstressed syllables. Thus, burning could be transcribed as [br̩nɪŋ] or [bɝnɪŋ], while interval could be transcribed as [ɪntr̩vl̩] or [ɪntɚvl̩].
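The alternative transcriptions just described are mechanical rewrites of one another, as the following sketch shows. It is a minimal illustration with hypothetical function names, and it assumes the syllabic diacritic is encoded as the Unicode character COMBINING VERTICAL LINE BELOW immediately after its consonant. For simplicity, `rhotic_vowel` applies one stress value to every syllabic rhotic in the word.

```python
SYLLABIC = "\u0329"  # COMBINING VERTICAL LINE BELOW


def expand_syllabic(transcription):
    """Rewrite each syllabic consonant as schwa followed by the plain consonant."""
    out = []
    for ch in transcription:
        if ch == SYLLABIC and out:
            consonant = out.pop()
            out.extend(["ə", consonant])
        else:
            out.append(ch)
    return "".join(out)


def rhotic_vowel(transcription, stressed):
    """Replace a syllabic rhotic with its dedicated r-coloured vowel symbol."""
    symbol = "ɝ" if stressed else "ɚ"
    return transcription.replace("r" + SYLLABIC, symbol)


hazelnut = "hezl" + SYLLABIC + "nʌt"
burning = "br" + SYLLABIC + "nɪŋ"
interval = "ɪntr" + SYLLABIC + "vl" + SYLLABIC

print(expand_syllabic(hazelnut))
print(rhotic_vowel(burning, stressed=True))
print(rhotic_vowel(interval, stressed=False))
```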

With all of this variation, not just in pronunciation, but in transcription choices by individual linguists, it can be difficult to figure out what is really intended by a given transcription. This is why when exact phonetic details matter, it is a good idea not to rely just on the IPA, but to include prose descriptions, midsagittal diagrams, and other tools that can help clarify exactly what is meant.

Essentials of Linguistics, 2nd edition by Catherine Anderson; Bronwyn Bjorkman; Derek Denis; Julianne Doner; Margaret Grant; Nathan Sanders; and Ai Taniguchi is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.