Essentials of Linguistics, 2nd edition


Catherine Anderson

Bronwyn Bjorkman

Derek Denis

Julianne Doner

Margaret Grant

Nathan Sanders

Ai Taniguchi


Hamilton, Ontario

Essentials of Linguistics, 2nd edition


Essentials of Linguistics, 2nd edition by Catherine Anderson; Bronwyn Bjorkman; Derek Denis; Julianne Doner; Margaret Grant; Nathan Sanders; and Ai Taniguchi is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

Accessibility Statement


This textbook conforms to the Accessibility requirements of eCampus Ontario’s Virtual Learning Strategy. In addition, the authors understand the importance of accessibility and strive to not only meet the minimum accessibility standards, but to research and develop accessibility standards particular to linguistic diagrams, formalisms, and notation.

Below we list several of the measures we have taken to increase accessibility.

We welcome feedback both on what is working well and on issues that we have overlooked or could improve. We can be emailed at

Note: As of “soft launch” on February 28th, 2022, not all of these accessibility features have been implemented. However, we plan to have them fully implemented by our full launch in June 2022. We still welcome feedback at this preliminary stage!


About the Authors


TiLCoP Canada

The members of the Teaching in Linguistics Community of Practice (TiLCoP) are instructors of linguistics at universities in Canada. We meet regularly to talk about our teaching practice and our pedagogical research. TiLCoP came together in 2020 to share resources and support at a time when we were all grappling with a rapid shift in teaching modality in response to the pandemic. Several exciting projects have grown out of our collaboration, including Word to the Whys (a companion podcast for Intro Linguistics courses), a special issue of the Canadian Journal of Linguistics, and of course this textbook!

Catherine Anderson

Catherine Anderson, a white woman with short blonde hair, blue eyes and glasses. She's wearing a maroon top and grey jacket.

Catherine Anderson (she/her) is an Associate Professor, Teaching Stream, in the Department of Linguistics & Languages and the Director of the Gender & Social Justice program at McMaster University. She earned a PhD in Linguistics from Northwestern University in 2004, and a BA from McMaster in the department where she is now a faculty member. The thread that connects her wide-ranging teaching and research interests is partnership: collaborating with learners and colleagues to further justice and to make learning accessible and enjoyable. Catherine lives with her wife and their twin teenage sons in Hamilton, on the territory governed by the Dish With One Spoon wampum agreement between the Haudenosaunee and Anishinaabe Nations.

Bronwyn Bjorkman

Photo of Bronwyn Bjorkman, a white woman with short brown hair.

Bronwyn Bjorkman (she/her) is an Associate Professor, Research Stream, in the Department of Languages, Literatures, and Cultures at Queen’s University, located in the traditional shared territories of the Haudenosaunee and Anishinaabe peoples. She received her PhD from MIT in 2011. Her research explores the interfaces between phonology, morphology, syntax, and semantics, focusing on how information is represented and transferred between formal modules of grammar. Her research has appeared in journals such as Linguistic Inquiry, Natural Language and Linguistic Theory, and Glossa. The COVID-19 pandemic shifted her belief in the value of virtual and remote community into ongoing work on building meaningful social connection into hybrid and virtual events both inside and outside academia.

Derek Denis

Portrait photo of Derek Denis, a white man with short brown hair and a beard.

Derek Denis (he/him) is a tenure-stream Assistant Professor in the Department of Language Studies at the University of Toronto Mississauga, located within Dish With One Spoon territory and the treaty lands of the Mississaugas of the Credit. He received his PhD from the University of Toronto in 2015. His research examines language change and innovation from variationist and sociocultural linguistic perspectives, most recently focussing on the influence of immigrant youth in the emergence of a multiethnolect in Toronto. His work has appeared in Language, Language Variation and Change, American Speech, and the Journal of Multilingual and Multicultural Development, among other venues. He lives in Toronto but spends as much time as possible at the cottage with his partner.

Julianne Doner

A portrait of Julianne Doner, a white woman with glasses, brown hair tied back, a grey knit headband and maroon coat.

Julianne Doner (she/her) is a sessional instructor at several institutions, including the University of Toronto Mississauga, the University of Guelph-Humber, the University of Victoria, and Seneca College. Julianne has also worked on incorporating Writing Across the Curriculum in the Linguistics department at the University of Toronto, and provided support and training to instructors in the University of Toronto St. George and Mississauga linguistics departments as they transitioned to online learning during the COVID-19 pandemic. She earned her PhD in Linguistics in 2019 from the University of Toronto, with a dissertation on cross-linguistic variation in clausal architecture, incorporating data from more than two dozen languages from over ten different language families. Julianne is currently working on directional particles in Niuean (Polynesian) and a cross-linguistic analysis of the cluster of properties associated with verb-initial word order, including doing fieldwork on K’iche’ (Mayan) in Guatemala. Julianne plays the violin and enjoys dancing Lindy Hop.

Margaret Grant

Photo of Margaret Grant.

Margaret Grant (she/her) is a Lecturer in the Department of Linguistics and in the Cognitive Science Program at Simon Fraser University, with campuses on the unceded territories of the xʷməθkʷəy̓əm (Musqueam), Sḵwx̱wú7mesh Úxwumixw (Squamish), səl̓ilw̓ətaʔɬ (Tsleil-Waututh), q̓íc̓əy̓ (Katzie), kʷikʷəƛ̓əm (Kwikwetlem), Qayqayt, Kwantlen, Semiahmoo and Tsawwassen peoples. She got her start in Linguistics as an undergraduate student at McGill University, where she earned her BA. Margaret graduated with a PhD in Linguistics from the University of Massachusetts, Amherst in 2013, specializing in psycholinguistics. Her research has focused on sentence comprehension, using a variety of experimental methods including the recording of eye movements during reading. When she isn’t introducing students to the study of language, mind and brain, she enjoys spending time outdoors with her family.

Nathan Sanders

A portrait of Nathan Sanders, a bald white man with glasses, in his office.

Nathan Sanders (he/him) is an Assistant Professor, Teaching Stream, and Associate Chair Undergraduate in the Department of Linguistics at the University of Toronto (St. George), located in the traditional territory of many nations, including the Mississaugas of the Credit, the Anishinaabe, the Wendat, and the Haudenosaunee. He studied mathematics and linguistics at the Massachusetts Institute of Technology and earned his MA (2000) and PhD (2003) from the University of California, Santa Cruz, with a dissertation on phonology and sound change in Polish. He works on the phonetics and phonology of signed and spoken languages, historical phonology, linguistic typology, and innovative and inclusive teaching in linguistics. He has published articles in Language, Sign Language & Linguistics, Natural History, and the Journal for Research and Practice in College Teaching, and he is co-editor of the book Language Invention in Linguistics Pedagogy (2020, Oxford University Press). He lives in downtown Toronto with his extensive board game collection.

Ai Taniguchi (谷口愛)

Photo of Ai Taniguchi, a Japanese woman in her 30s with black mid-length hair, wearing a maroon turtleneck top under a raspberry red blazer. Background features teal and gold abstract art hung on a forest green wall.

Ai Taniguchi (she/her) is an Assistant Professor, Teaching Stream, in the Department of Language Studies at the University of Toronto Mississauga, located within Dish With One Spoon territory and the treaty lands of the Mississaugas of the Credit. She earned her PhD in Linguistics in 2017 from Michigan State University, with a dissertation that examined various types of non-at-issue meaning in Japanese and English. Her research specialisation is in formal semantics, formal pragmatics, and the semantics-sociolinguistics interface. She is the winner of the 5-Minute Linguist competition at the 2019 Linguistic Society of America annual meeting, and is an advocate of public outreach and education in linguistics, innovative online teaching, and diversity/inclusion in linguistics. She enjoys drawing and making digital comics, and lives in Toronto with her partner and their two cats, Pancakes and Waffles.



This project is made possible with funding by the Government of Ontario and through eCampusOntario’s support of the Virtual Learning Strategy. To learn more about the Virtual Learning Strategy visit:


Essentials of Linguistics, 2nd Edition is the work of a large team of authors, and an even larger team of supporters whose contributions have made our work immeasurably better.

Our fellow TiLCoP members Daniel Currie Hall, Liisa Duncan, Martin Kohlberger, and Tim Mills offered pedagogical insight and wise counsel from their experience as teachers and researchers of linguistics. This work is enriched by our many conversations with them.

The conversations in Chapter 9: Reclaiming Indigenous Languages are thanks to the generosity and deep wisdom of language experts Chantale Cenerini, Rae Anne Claxton, Mary Ann Corbiere, and David Kanatawakhon-Maracle.

Thoughtful reviews and feedback from our colleagues Tahohtharátye Joe Brant, Jon Henner, Curt Anderson, Barend Beekhuizen, Nathan Brinklow, Kathleen Currie Hall, Masako Hirotani, Henry Ivry, Suzi Lima, Megan Lukaniec, Sara Mackenzie, Alan Munn, Will Oxford, Pedro Mateo Pedro, and Nicole Rosen allowed us to make substantial improvements to our initial drafts.

Aileen Lin created artwork. David Wiesblatt recorded the ASL video examples. Bartłomiej Czaplicki and Deepam Patel created audio. Bianca James and Tata Ruffle in McMaster’s Department of Linguistics and Languages administered the grant funds. Kate Brown, Accessibility Program Manager at McMaster, advised us on accessible design. We are grateful for their work.

Most of all, we extend our thanks to all our linguistics students, past and present, for inspiring us to strive for excellence and justice in teaching linguistics.


A Note to Instructors


Thank you for considering Essentials of Linguistics, 2nd edition for your Linguistics course! This Open Educational Resource is designed to function as a stand-alone textbook or as a supplement to a traditional textbook. It is suitable for an in-person, hybrid, or online course. Because this is an entirely open resource, its content is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License; therefore, you are free to redistribute, revise, remix, and retain any of the parts of the book, provided you attribute the source material.

If you adopt this eBook for your course, either as the required text or as an optional supplement, we ask you to do two easy tasks:

  1. Please fill out this adoption form to allow our funder to track the book’s use.
  2. Send an email to to receive an Instructor’s package that includes answer keys for the exercises at the end of each chapter.


Chapter 1: Human Language and Language Science


In this chapter, we begin to explore what language is, and how language scientists (also known as linguists) think about it and observe it. It might well be that most of your experience learning about language has had to do with rules that you could get right or wrong. That’s not the approach we’re taking in this book. Instead, we’re going to look at how to use the tools and techniques of linguistics to observe the patterns of human languages. From these observations, we’ll try to draw some conclusions about the abstract principles and organization of human language in people’s minds and in language communities. Along the way, we’ll also consider the ways that language science and people’s attitudes about language have bolstered colonial structures of power and privilege and have been used to do harm.

When you’ve completed this chapter, you’ll be able to:

  • Differentiate between prescriptive and descriptive ways of thinking about language,
  • Identify components of mental grammar,
  • Explain some properties of all human languages,
  • Describe some techniques for doing language science, and
  • Discuss the ethics of doing language science.


1.1 What even is language?



We’re all users of language. Right now I’m talking and writing to you in a variety of Canadian English, and I bet many of you also know one or more other languages. Linguistics is the scientific study of human language. That definition is short, but it’s not exactly simple, is it? How do we study language scientifically? And what even is language?

The word language is used for several different complex concepts that are interconnected with each other. One use of the word is to refer to individual languages, like American Sign Language (ASL), Basque, English, Langue des signes québecoise (LSQ), Nishnaabemwin, Xhosa, and many others.

The word language can also refer to other, related notions. If you’re a programmer, you might have a section on your résumé that lists the computer languages you know, like Python, R, C++, or Perl. Computer languages are not usually the focus of linguistics, even though many linguists use them to analyse linguistic data! There are also metaphorical uses of the word language, such as body language or love languages. These uses of the word are also outside of what linguistics usually studies.

For the moment, let’s think about one particular language, because it happens to be the one we’re using now. I’m using one variety of English (a variety spoken by a middle-aged white lady in Ontario). Before I started making this video, I used my fingers to type words on my keyboard. Now as I read those words and talk to you, I’m squeezing the air out of my lungs; I’m vibrating my vocal folds, and I’m manipulating parts of my mouth to produce sounds. Those sounds are getting captured by a microphone and recorded on my computer, then I’ll upload them to the eBook. If you’re listening to this video, the sounds I recorded are playing on your device, and your eardrums are reacting to the auditory information. If you’re reading the text or the captions, your eyes are reacting to the visual information. Your eyes and your ears send signals to your brain. And somehow, after all that, if my communication was successful, you end up with an idea in your mind that’s similar to the idea in mine. There must be something that we have in common to allow that to happen: some shared system that allows us to understand each other’s ideas through language. This shared system is what many linguists call the mental grammar, and one of the goals of linguistics is to find out what that shared system is like.

So we’ve focused our definition of linguistics a little bit, by saying that we’re interested in the scientific study of human language, of the grammar, the shared system that allows us to understand each other. What is the grammar like? Or to put it another way, what do we know when we know a language?

What is grammar?

Imagine you’re an alien, you’ve just arrived on Earth, and you need to figure out how to understand the language used in the particular earthling community that you’ve landed in. What kinds of things do you need to figure out? One of the first things you’ll need to know about that language is what counts as talking. Is this language signed or vocalized? In other words, what is the modality of the language? Many human languages are vocalized (or “spoken”). In this modality, language users make sounds with their larynx, tongue, teeth and lips, and receive sounds with their ears. Other human languages are signed. Language users make signs with their fingers, hands, wrists and forearms, and receive signs by sight or by touch. Even though they have very different modalities, sign and vocal languages share many properties in their grammars. In this book, we’ll try to reserve the words speaking and speech for vocal languages, and refer to language users when we’re talking about languages of any modality. In other places you might also see languaging used as a verb to mean “using language in any modality”.

Eight chocolate chip cookies on a slightly crumpled piece of white paper.
Figure 1.1. Cookies.

Once you’ve figured out the modality, what next? You probably need to segment the stream of auditory or visual information into meaningful units. By observing carefully, you might be able to figure out that a particular sequence of sounds or gestures recurs in this language, and that some consistent meaning is associated with that sequence. For example, maybe you’ve noticed that the language users you’ve encountered make the sounds “cookie” as they’re offering you a round, sweet, delicious baked good. Or maybe you’ve noticed that when that word has a z sound at the end of it, cookies, you’re being offered more than one of them!

The part of the grammar that links up these forms with meanings is the mental lexicon. It’s a bit like a dictionary in your mind. Knowing a word in a language involves recognizing its form – the combination of signs or sounds or written symbols, and its meaning. For the majority of words in the world’s languages, the link between form and meaning is arbitrary.

A small orange pumpkin in a bin with a variety of other squashes.
Figure 1.2. Pumpkin.

For example, the English word for this thing is pumpkin and the Nishnaabemwin word is kosmaan. There’s nothing inherently orange or round or vegetabley about either of those word forms: the pairing of that meaning to that form is arbitrary in each language. (But there are words whose form has an iconic, less arbitrary relationship to their meaning; we talk about them more later in this book.)
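If it helps to think of the mental lexicon as a data structure, here is a minimal sketch of the idea in Python. The entries are illustrative (built from the pumpkin and cookie examples above), and the "meanings" are just placeholder strings; a real lexicon pairs forms with rich concepts, not glosses.

```python
# A toy mental lexicon: each language pairs an arbitrary form with a meaning.
# Nothing about the sounds of "pumpkin" or "kosmaan" predicts the concept.
lexicon = {
    "English": {"pumpkin": "round orange squash", "cookie": "sweet baked good"},
    "Nishnaabemwin": {"kosmaan": "round orange squash"},
}

# The same meaning is linked to unrelated forms in different languages,
# which is exactly what makes the form-meaning pairing arbitrary.
forms_for_squash = [
    form
    for entries in lexicon.values()
    for form, meaning in entries.items()
    if meaning == "round orange squash"
]
print(forms_for_squash)  # → ['pumpkin', 'kosmaan']
```

The point of the sketch is only that the mapping runs in both directions (form to meaning, meaning to form) and that nothing in the key predicts the value.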

Suppose you’ve figured out that cookies are delicious and you want to ask your earthling hosts for more of them. To do that, you need to figure out how to control the muscles of your mouth, tongue, and lips to speak the word for cookie, or how to use your hands, fingers, wrists and forearms to sign the word. In other words, you need to know something about the articulatory phonetics of the language. This brings up an important point about grammar: when we know a language fluently, a lot of our grammatical knowledge is unconscious, or implicit. For the languages that you know, your knowledge of the lexicon is probably fairly conscious or explicit, and probably also some of your knowledge about your language’s morphology: that’s the combinations of meaningful pieces inside words (like how if you want more than one cookie you say cookies with a z). But you’re probably not as conscious of things like how you use your articulators to make the sounds k or z.

Our implicit knowledge of language also includes phonology, information about how the physical units of language can be combined and how they change in different contexts. Syntax is the part of your mental grammar that knows how words can or can’t be combined to make phrases and sentences, much of which is implicit. Syntax works hand in hand with semantics to allow the grammar to calculate the meanings of these phrases. And the pragmatics part of the mental grammar can help you to know what meanings arise in different contexts. For example, “I have some news,” could be interpreted as good news or bad news depending on the context.

All of these things are parts of the grammar: the things we know when we know a language. But a lot of this knowledge is implicit, and the thing about implicit knowledge is that it’s hard to observe. One of the most important jobs we’re doing in this textbook is trying to be explicit about what mental grammar is like, and about what kinds of evidence we can use to figure that out. We’ll talk about this challenge more in Section 1.3 below.

What about reading and writing?

I bet you’re wondering why I didn’t include reading and writing as part of the mental grammar above. After all, as a student you probably invested a lot of time into learning how to read and write. And those skills are indeed part of the grammatical knowledge you have about your language. But language users don’t actually need to know how to read and write to have a mental grammar. It’s common for kids in Canada to start learning to read and write around age five, but they are pretty competent in the phonetics, phonology, morphology, syntax and semantics of one or more languages before they ever go to school.

Vertical Mongolian Script
Figure 1.3a. Vertical Mongolian Script.


The word "Mongol" in Cyrillic script.
Figure 1.3b. Cyrillic Mongolian script.




Furthermore, language users could start using a different writing system without changing anything else about the grammar. Mongolian, for example, presently uses two different writing systems: the Cyrillic alphabet and traditional Mongolian script, which is written vertically. Speakers of Mongolian understand each other’s speech no matter which script they use to record the language in writing. And there are plenty of human languages that just don’t have written forms. Signed languages like ASL and LSQ, for example, don’t have written forms. Most signers are bilingual in their sign language and in the written form of another language.

So, because not every human language has a reading and writing system and not every language user has access to reading and writing systems, we consider these skills to be secondary parts of the mental grammar. If you’re literate in your language, then that literacy is certainly woven into your mental grammar. But literacy isn’t necessary for grammatical competence.

Check your understanding


1.2 What grammars are and aren't




The previous section was a very quick tour of some of the parts of the mental grammar. We’ll be discovering a lot more about grammar throughout this book. Notice that we’re using the term grammar a little differently from how you might have encountered it before. Maybe your experience of grammar is as a textbook or style guide with a set of rules in it, rules that lead to consequences if you break them — you’ll lose points on your essay or get corrected with a red pen. What we’re most interested in in this book is the mental grammar: the system in your mind that allows you to understand and be understood by others who know your language. Every human language has a mental grammar: that’s how the users of each language understand each other!

This is a really important idea. One way that people sometimes express racist, colonialist and ableist ideas is to deny the validity of a language by claiming that it “has no grammar”. But the truth is that all languages have grammar. All languages have a system for forming words, a way of organizing words into sentences, a systematic way of assigning meanings. Even languages that don’t have alphabets or dictionaries or published books of rules have users who understand each other; that means they have a shared system, a shared mental grammar. Using linguists’ techniques for making scientific observations about language, we can study these grammars.

The other important thing to keep in mind is that no grammar is better than any other. Maybe you’ve heard someone say, “Oh, I don’t speak real Italian, just a dialect,” implying that the dialect is not as good as so-called real Italian. Or maybe you’ve heard someone say that Québec French is just sloppy; it’s not as good as the French they speak in France. Or maybe you’ve heard someone say that nobody in Newfoundland can speak proper English, or nobody in Texas speaks proper English, or maybe even nobody in North America speaks proper English and the only good English is the Queen’s English that they speak in England. From a linguist’s point of view, all languages and dialects are equally valid! There’s no linguistic way to say that one grammar is better or worse than another. This is part of what it means to study grammar from a scientific approach: scientists don’t rate or rank the things they study. Ichthyologists don’t rank fish to say which species is more correct at being a fish, and astronomers don’t argue over which galaxy is more posh. In the same way, doing linguistics does not involve assigning a value to any language or variety or dialect. We also need to acknowledge, though, that many people, including linguists, do attribute value to particular dialects or varieties, and use social judgments about language to create and reinforce hierarchies of power, privilege and status. We’ll look at some examples in Section 1.4, and in Chapter 2 we’ll go into more detail about the terms language, variety, and dialect.

One of the most fundamental properties of grammar is creativity. One obvious sense of the word creative has to do with artistic creativity, and it’s true that we can use language to create beautiful works of literature. But that’s not the only way that human language is creative. The sense of creativity that we’re most interested in in this book is better known as productivity or generativity. Every language can create an infinite number of possible new words and sentences. Every language has a finite set of words in its vocabulary – maybe a very large set, but still finite. And every language has a small, finite set of principles for combining those words. But every language can use that finite vocabulary and that finite set of principles to produce an infinite number of sentences, new sentences every single day.
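The "finite means, infinite output" point can be made concrete with a toy grammar. The sketch below is an invented example, not a serious model of English syntax: it has a tiny lexicon and a single embedding rule, but because the rule can reapply to its own output, the set of sentences it generates is unbounded.

```python
# A finite lexicon and one finite combination rule are enough to
# generate an unbounded set of sentences, because the rule can
# reapply to its own output (recursion).

def sentence(embeddings):
    """A simple clause, wrapped in `embeddings` layers of
    'the linguist said that ...'."""
    s = "the cat saw a cookie"
    for _ in range(embeddings):
        s = "the linguist said that " + s
    return s

print(sentence(0))  # → the cat saw a cookie
print(sentence(2))  # → the linguist said that the linguist said that
                    #   the cat saw a cookie

# Each additional embedding yields a new, longer sentence, so even this
# tiny grammar already generates infinitely many distinct sentences.
assert len({sentence(n) for n in range(10)}) == 10
```

Real mental grammars have many interacting rules, but the principle is the same: recursion lets a finite system produce novel sentences every single day.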

A consequence of the fact that grammar is productive is that languages are always changing. Have you heard your teachers or your parents say something like, “Kids these days are ruining English! They should learn to speak properly!” Or if you grew up speaking Mandarin, maybe you heard the same thing, “Those teenagers are ruining Mandarin! They should learn to speak properly!” For as long as there has been language, older people have complained that younger people are ruining it. Some countries, like France and Germany, even have official institutes that make rules about what words and sentence structures are allowed in the language and which ones are forbidden. But the truth is every language changes over time. Languages are used by humans, and as humans grow and change, and as our society changes, our language changes along with it. Some language change is easy to observe in the lexicon: we need to introduce new words for new concepts and new inventions. For example, the verb google didn’t exist when I was an undergraduate student, and now googling is something I do nearly every day. Languages also change in their phonetics and phonology, and in their syntax, morphology and semantics. In Chapter 10 we’ll look at the systematic ways that linguists study variation and change.


1.3 Studying language scientifically



We said that linguistics is the science of human language. When we say that linguistics is a science, that doesn’t mean you need a lab coat and a microscope to do linguistics. Instead, what it means is that the way we ask questions to learn about language uses a scientific approach.

The scientific way of thinking about language involves making systematic, empirical observations. That word empirical means that we observe data to find the evidence for our theories. All scientists make empirical observations. Entomologists observe the life cycles and habitats of insects. Chemists observe how substances interact. Linguists observe how people use their language. Just like entomologists and chemists, linguists aim for an accurate description of the phenomenon they’re studying. And like other scientists, linguists strive to make observations that are not value judgments. If an entomologist observes that a certain species of beetle eats leaves, she’s not going to judge that the beetles are eating wrong, and tell them that they’d be more successful in life if only they ate the same thing as ants. Ideally, the same would be true of linguists — we wouldn’t go around telling people how they should or shouldn’t use language. Of course, like all scientists, and like all humans, linguists have biases that often prevent us from reaching this ideal; more on this later in the book. But the goal for doing language science is to do so with a descriptive approach to language, not a prescriptive approach, to describe what people do with their language, but not to prescribe how they should or shouldn’t do it.

For example, you could describe English plurals this way:

Adding -s to a noun allows it to refer to many of something, like apples, books, or shoes.

Or you could prescribe how you think people should form plurals this way:

Because the word virus is derived from Latin, you should pluralize it as viri, not viruses.

This prescriptive statement doesn’t reflect what really happens in English, since most English speakers talk about viruses, not viri. And in fact, it doesn’t even reflect what happens in Latin, since the Latin word virus did not have a plural form!

So when we’re doing linguistics, our goal is to make descriptive, empirical observations of language. But one challenge to being a language scientist is that a lot of what you’re studying is hard to observe. Unlike our entomologist friends, we can’t just go out to the garden and poke around and find some grammar crawling on a plant. We have to figure out how to make observations about the mind. Throughout this book you’ll get introduced to the many different tools of language science, which allow us to make systematic observations of how humans use language.

Going meta: Observing what’s possible in a language

As I keep saying, a lot of the linguistic knowledge we have is unconscious. One of the tools we can use to get at our mental grammar is to try to access metalinguistic awareness, that is, the conscious knowledge you have about your grammar, not the grammatical knowledge itself. If you’ve studied a language in school you probably have some metalinguistic awareness about it because you got taught it explicitly. But for your first language, the one you grew up speaking, it can be a little more difficult to access your metalinguistic knowledge because so much of it is implicit. It’s a skill that we’ll keep practicing throughout this book.

Here’s an example of accessing your metalinguistic awareness. Say you want to create a new English word for a character in a game. Are you going to call your cute little creature a blifter or a lbifter? Neither of those forms exists in English, but they both use sounds that are part of English phonetics. You probably have a strong feeling that blifter is an okay name for your new creature, while lbifter is a pretty terrible name. Notice that your sense that lbifter is wrong is not a prescriptive sense — it’s not that it sounds rude or you’ll get in trouble for combining those sounds that way. It just … can’t happen. You’ve made a descriptive observation that lbifter is not a possible word in English. From that observation, we can conclude that lbifter is ungrammatical in English.

Since linguistics uses the word grammar in a particular way, the words grammatical and ungrammatical also have a specific meaning. An ungrammatical word or phrase or sentence is something that just can’t exist in a particular language: the mental grammar of that language does not generate it. Notice that grammaticality isn’t about what actually exists in a language; it’s about whether a form could exist. In this example, both blifter and lbitfer have the same sounds in them, but blifter could be an English word and lbitfer couldn’t. In other words, blifter is grammatical in English and lbitfer is ungrammatical in English.

It’s often useful to compare similar words, phrases or sentences to try to access our metalinguistic awareness. Let’s look at another example of observing what’s possible. Here are two similar sentences, both of which are possible (or acceptable) in English.

  a. Sam compared the forged painting with the original.
  b. Sam compared the forged painting and the original.

Let’s try to make questions out of these sentences:

  c. Did Sam compare the forged painting with the original?
  d. Did Sam compare the forged painting and the original?

Observing those two questions, we can see that both (c) and (d) are acceptable in English. Now let’s try a different kind of question:

  e. What did Sam compare the forged painting with?
  f. *What did Sam compare the forged painting and?

Comparing these two sentences gives us a really clear finding: (e) is possible, but (f) is not. We use an asterisk or star at the beginning of sentence (f) to indicate that it just can’t happen. These acceptability judgments (also sometimes known as grammaticality judgments) are our empirical observations: these two similar sentences are both possible as declarative statements (a-b) and as yes-no questions (c-d), but when we try to make a wh-question out of them, the result is acceptable for the first one (e) but not for the second one (f). Having made that observation, now our job is to figure out what’s going on in the mental grammar that can account for this observation. Why is (e) grammatical while (f) is not?

More tools for language science

Grey zip-up hoodie on white background.
Figure 1.4. Hoodie.

Because it can be tricky to access metalinguistic knowledge, you might not want to rely on the acceptability judgments of one single language user. Instead, you could use a survey to gather quantitative data about acceptability from many users. We can also use surveys to elicit the words that people use for particular items. From survey data we know that some people call this thing a sweatshirt, other people call it a hoodie, and people in Saskatchewan call it a bunny hug. Surveys are particularly useful for learning about regional variation, which you can learn more about in Chapter 10. If you’re studying regional and social variation you might also gather data using interviews, in which you could ask questions like, “Does the ‘u’ in student sound like the ‘oo’ in too or the ‘u’ in use?”.

A corpus is another tool that allows us to make language observations. A corpus is a big database that collects examples of language as used in the world, from books, newspapers, message boards, and videos. Some corpora contain only written text, and others include video of signed language, or audio files with phonetic transcription. The nice thing about tools like acceptability judgments, surveys, and corpora is that they’re relatively easy to use: you don’t need a lot of training or money to ask people what word they use for athletic shoes, or to see how a word or phrase is used in a corpus. We’ll use some of these accessible tools throughout this book.

There are also more specialized tools for doing language science. Phoneticians use a variety of software for analyzing audio and video recordings of speakers and signers. Praat (Boersma & Weenink, 2022) is a popular waveform editor for analyzing audio recordings. While Praat is specialized for linguists, it has some similarities to audio-editing programs for podcasting. ELAN (ELAN | The Language Archive, 2021) is a powerful tool that allows a user to annotate video recordings, and the program SLP-Annotator (Lo & Hall, 2019) also enables phonetic annotations of video-recorded sign language. Some phoneticians also make anatomical measurements of the articulators, using ultrasound or palatography for speech or motion capture for signing.

We can draw on techniques from behavioural psychology to make observations about language use in real-time using experiments. You might measure reaction times and reading times for words and sentences, or ask participants to listen to words that are mixed with white noise. Some experiments use eye-tracking to measure people’s eye movements while reading a text, watching a signer, or listening to a speaker. It’s even possible to use neural imaging techniques like electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) to observe brain activity during language processing. You can learn more about interpreting the data from these experiments in Chapter 13.

When you’re starting out in linguistics, it’s often really exciting to use the scientific method to think about grammar, as you start to see that grammar is not just a set of arbitrary rules to memorize so you sound “proper”. Even if we’re not peering through a microscope wearing a lab coat, the tools of language science allow us to make systematic observations of how humans use language. And we can interpret those observations to draw conclusions about the human mind.


Boersma, P., & Weenink, D. (2022). Praat: Doing Phonetics by Computer.

ELAN | The Language Archive (6.2). (2021). [Computer software]. Max Planck Institute for Psycholinguistics, The Language Archive.

Lo, R. Y.-H., & Hall, K. C. (2019). SLP-AA: Tools for Sign Language Phonetic and Phonological Research. Proceedings of Interspeech 2019, 3679–3680.


1.4 Doing harm with language science


Content Note: This section includes discussion of residential schools.


As exciting as it is to think about language scientifically, it’s important to remember that science is not inherently virtuous as a field. Humans can use the tools of science to do harm, and that includes the tools of linguistics. Much of the foundational work in the field of linguistics was carried out by Christian missionaries whose goal was less to discover the systematic nature of mental grammar and more to convert people to their religion. Their work has had several lasting consequences for Indigenous languages.

The missionaries who first came to the land that is currently called Canada were European Christians. They started to document the Indigenous languages they encountered so that they could use them to teach Christian doctrine, and so that they could conduct trade and obtain resources. As they wrote down the words and structures they learned, they used the Roman alphabet and based their assumptions about how language worked on what they knew from studying Latin and other European languages. What this means is that the earliest written documents for these Indigenous languages described the languages through that point of view. The Roman alphabet (the same alphabet that English uses today) was developed to represent European languages, so it’s not very accurate at representing the phonetics of other languages.

Once they had enough language written down, the missionaries started translating the Christian Bible into the local languages. Since written documents are permanent in a different way from speech and sign, writing a text has the effect of “freezing” that form of the language. So when the Europeans started teaching literacy using their written texts, the result was that some of the variation within those languages fell out of use as the written forms took priority. And these effects weren’t accidental or benign. From the missionaries’ own writings we can see that they considered Indigenous languages to be inferior to European languages. They complained, incorrectly, that the languages didn’t have words for soul and belief and angel, and they thought that the complex grammars, which we’ll learn more about in later chapters, were barbaric. In the History of the Language Sciences, Edward Gray writes:

“Jesuits generally derided the languages, characterizing the polysynthetic character of American languages as a symptom of social decay. In keeping with their heathen character, missionaries widely assumed, [Indigenous people] had failed to impose grammatical discipline on their languages.” (Gray 2000, p. 934)

On the one hand, we might point to the work of these European Christian missionaries in documenting Indigenous languages as foundational to the field of linguistics. And in some cases, those written documents have served as source material for work to reawaken sleeping languages like Huron-Wendat. But at the same time, we have to acknowledge that the missionary work did real harm: the documentation itself was inaccurate and led to the loss of many features of the languages. As Christian churches gained power on this continent, they stopped trying to teach in the local languages and instead imposed English or French, often violently. In fact, eliminating Indigenous languages and cultures was the stated goal of the Canadian government. Well into the 20th century, police, church and government officials forcibly seized Indigenous children, removed them from their families, and sent them to residential schools. In these schools, children were separated from their siblings and cousins and forbidden to speak their families’ languages. They were starved, physically and sexually abused, and some of them were murdered. Under these conditions, it’s not surprising that they stopped using their family languages: using English or French was a matter of sheer survival. In spite of the colonial government’s attempts to assimilate Indigenous people into “the habits and modes of thought of white men” (MacDonald, quoted in TRC (2015)), some Indigenous languages still have living speakers, while others are asleep. You can learn more about the work that Indigenous people are doing to reclaim their languages in Chapter 9.

When we’re doing language science, it might be tempting to try to dissociate ourselves from the harm those missionaries did, to say, “they were doing religion, not linguistics.” But modern scientific practices of linguistics have also done harm to Indigenous and other minoritized languages. Linguists rely on language users to provide language data, but those who spend their time and energy answering our questions don’t always get much in return. Sometimes linguists gather data to test a particular scientific hypothesis, and the data ends up existing only in obscure scholarly publications when it could also have been made available to the community of language users themselves, for preserving and teaching their language. Sometimes what is merely data to a linguist is a sacred story or includes sensitive personal information, and publishing it might violate someone’s beliefs or privacy. Even if a linguist is careful to work descriptively, there’s a real risk of linguistic and cultural appropriation if they become the so-called authority on the language without being a member of the language community. And sometimes linguists’ attempts at descriptive statements can turn into prescriptive norms: if a linguist writes “In Language X, A is grammatical and B is ungrammatical” based on what they’ve learned from one set of speakers, that observation can become entrenched as the standard variety of Language X, even if there’s another group of speakers out there for whom B is perfectly grammatical.

It’s not only in settler states like Canada that languages are harmed by colonialism. Capitalism offers a strong incentive for people all around the world to speak English so they can participate in the labour market. And the more they use English, the less they use their local languages. In Chapter 12 we’ll consider some of the ways that the field of Applied Linguistics and the teaching of English as a Foreign Language (EFL, also known as English as a Second Language (ESL) or English as an Alternate Language (EAL)) can reinforce racist norms.

As a field, linguistics is also responsible for harms to disabled people and their language practices. In Chapter 11 we’ll see how deaf kids are often deprived of language input because of oralism, the view that vocal language is more important than signed language. Oralism is prevalent in the field of linguistics, which often fails, like the first edition of this book did, to study or teach the linguistic structures of sign languages. The practice of observing patterns of language across many users, even from a descriptive point of view, tends to identify norms of language use, which makes it all too easy to describe anything that differs from the norm as disordered. For example, Salt (2019) showed that when linguists used standard interview techniques to research autistic people’s conversation, they found “deficits” in their pragmatic abilities. But when the autistic participants were observed in conversation with each other, no such deficits were apparent. Salt concluded that it was the research method itself, namely, the interview, that gave rise to the so-called pragmatic disorders of autism. Similarly, Mackay (2003) reported his experience of aphasia resulting from a stroke. His account eloquently illustrates how the standard diagnostic and treatment techniques ignored his communicative adaptations and treated him as incompetent.

What’s the lesson for us, then, as 21st-century linguists? I’m going to aim for some humility in my scientific thinking. I love using the tools of science to observe language. But I try to remember that science is one way of knowing, which brings its own cognitive biases. In other words, doing linguistics is not a neutral exercise. One of the fundamental lessons of this book is to move from thinking about grammar as a set of prescriptive rules in a book to seeing grammar as a living thing in our minds. But let’s not get stuck in that way of thinking either. In addition to thinking about language as something that lives in the individual minds of individual humans, let’s also remember that language is something that lives in communities and is shared among users, in the conversations we have and the stories we tell. We’ll continue to explore these ways of thinking about language throughout the rest of this book.


Gray, E. G. (2000). Missionary linguistics and the description of ‘exotic’ languages. In S. Auroux, E. F. K. Koerner, H.-J. Niederehe, & K. Versteegh (Eds.), History of the Language Sciences (Vol. 1). Walter de Gruyter.

Mackay, R. (2003). ‘Tell them who I was’: The social construction of aphasia. Disability & Society, 18(6), 811–826.

Salt, M. (2019). Deficits or Differences? A New Methodology for Studying Pragmatic Language in Autism Spectrum Disorder [Thesis].

The Truth and Reconciliation Commission of Canada. (2015). Honouring the Truth, Reconciling for the Future: Summary of the Final Report of the Truth and Reconciliation Commission of Canada.

1.5 Doing good with language science


In the previous section we tried to acknowledge the ways that linguistics has done and continues to do harm, like many fields of academic inquiry. Acknowledging those harms is only part of our responsibility. In this book, we’re trying to focus on ways we can use the tools of language science to address some of those harms and even more importantly, do some good in the world. We also hope that working with this book will make you excited to carry on doing linguistics! So let’s think about some of the things linguistics can prepare you to do.

In the tech sector, people with linguistics training use their skills to improve software that summarizes texts, translates from one language to another, synthesizes natural-sounding speech for your voice assistant or your GPS, or helps your voice assistant understand your speech! As I’m writing this book, speech recognition systems do an okay job on standardized American English accents, especially when spoken by lower voices, but are much less accurate for higher voices and for the many different accents that English speakers use. Maybe you’ll be one of the linguists who pushes back against these biases that are built into the algorithms!

Speaking of tech, another field where language science is valuable is in developing language-learning apps. That owl that scolds you if you skip your daily Esperanto practice was designed by linguists! Many people who are learning a new language find that their learning is enhanced by gaining the kind of metalinguistic awareness that you’ll acquire from this book.

That brings us to another really important area where linguistics is important: in supporting Indigenous people who want to reclaim, revive, or revitalize their languages. As we’ll see in later chapters, linguistic analysis of these grammars can be useful for creating teaching materials and supporting adult language learners who did not have the chance to learn their languages as children. Speaking for myself as a settler, I would want to be careful not to position myself as the expert who’s here to save the language! Instead, I’d want to follow the lead of Indigenous community members in deciding how and where to deploy my linguistics skills.

Linguistics training is not only good for language learning, but also for language teaching! Studying linguistics is often a good entry point to getting certified as an ESL teacher, or learning how to teach any other language for that matter.

A lot of students are drawn to studying linguistics because they want to pursue a clinical career in speech-language pathology. Ideally, evidence from language science informs the treatments that clinicians offer. For example, if someone has a brain injury, their ability to produce or understand language might be impaired, and speech therapy can sometimes recover some of that function. Or a trans person who wants their voice to sound different might seek the advice of a speech-language pathologist as part of their transition. Some clinicians take their careers in a more Hollywood direction and offer accent or dialect coaching for actors!

Linguists find their skills called upon in many other industries. I personally know linguists who have been paid for their expertise in:

Language is everywhere. It’s fundamental to how humans interact with each other, so understanding how language works is part of understanding people. And understanding people just might be a step towards doing some good in the world.

1.6 Exercise your linguistics skills


Exercise 1. The terms first language and L1 (or sometimes, native language) refer to the language you learned from the people around you from your very early childhood. Many people have more than one L1. What is your L1? Do you have more than one? Make two scientific observations about your L1. Remember that scientific observations are descriptive, not prescriptive.

Exercise 2. Pretend you’re working for a start-up that has developed a cool new product. Your company turns to you, the in-house linguist, to come up with a name for this new product. It has to be a unique name that doesn’t already exist. What will you name your company’s cool new product?

Now, look at this list of product names generated by other students. Which of them are good product names and which aren’t?  What makes something a good name? How do you know?

mentocular swoodiei torrix baizan
jibberdab keerild euquinu tuitionary
kzen zirka hbiufk fluxon

Exercise 3. One of the many ways that mental grammar is generative is that it is always possible to create new words in a language. English often allows the creation of new verbs from existing nouns, even from proper names like in the following sentences:

      • We’re Megabussing to Montreal this weekend.
      • You can find out the answer by Googling.
      • The kids got Pfizered before going back to school.

Create three new verbs from English nouns (common nouns or proper names), and put each one in a sentence to illustrate its meaning.

Exercise 4. Think of a word that has only recently entered English, so it’s not yet in mainstream dictionaries. Observe some examples of the word being used in context, either in your regular conversations or by searching online.  Based on your observations of the word in context, write a dictionary definition of the word.


Chapter 2: Language, Power, and Privilege


Language is a central part of how we interact with one another as humans. Through language, we not only communicate ideas and information, we also express and construct aspects of our identity. That’s why we start this book by considering some of the social aspects of language.

When you’ve completed this chapter, you’ll be able to:

  • Describe the relationships between power and language in a variety of scenarios.
  • Find real-world examples of relationships between power and language.
  • Use your metalinguistic awareness to interpret your own and others’ attitudes about language.

2.1 Language and Identity


Language can change the world. For real!

Imagine you’re at the wedding of two friends. They’ve walked down the aisle, they’ve said lovely things about each other, they’ve exchanged rings, some people are shedding happy tears, and things are approaching the big moment: “I do.” “I do.” “I now pronounce you married”. They kiss, you cheer, and the world is a little bit different now. Just a few moments ago your friends were unmarried, and now they are married! This has real-world, material consequences for them. Perhaps your friends filed separate taxes last year; now they must file together. Maybe they had distinct medical insurance policies; now one can be a dependent on the other’s plan. If they live in a common law country, they now have spousal privilege. All of these changes to the world can be traced to those three utterances: “I do”, “I do”, “I now pronounce you married”! By uttering these words, your friends and the officiant have changed the world, ever so slightly.

There are other words and phrases besides pronounce and I do that affect the world. For example, if you’re playing a game of chess and realize that your chance of winning is exceedingly low, you may tell your opponent “I concede”. The game is now over and you lost (sorry). After a successful job interview, if your hopefully-soon-to-be boss says “you’re hired!”, well congrats, you’ve got a new job now! These are examples of how we can “do things with words”, as the philosopher of language J. L. Austin put it. They are examples of the performativity of language. Words and phrases like I concede are called performative speech acts. These are utterances that not only convey some kind of information but also perform a function or an action that affects reality. We will revisit performative speech acts in Section 8.9 when we discuss theories of meaning.

As much as performative speech acts are powerful in the sense that they change the world, they also require the right context to do this. When a group of kids on recess put on a ‘wedding’ and two of them ‘get married’, it doesn’t matter how many times or how loudly another kid says “I now pronounce you married”, the world hasn’t changed in the same way that the same words changed your friends’ lives. If you simply shout “I DECLARE BANKRUPTCY”, that’s not enough to actually change your financial situation! The ability of certain words and phrases to perform real-world actions depends on a combination of the authority and sincerity of the utterer and the uptake of the audience and general population. In other words, does the audience recognize the authority and sincerity behind the words and, therefore, accept their power to perform the intended action? The child officiant at a fake wedding doesn’t have the authority to pronounce anyone married, and filing for bankruptcy requires more than one’s simple declaration! For words to do things, society must agree that certain words can do certain things in certain contexts; they have the power they do because we recognize that they have this power. That said, not 100% of people will agree on which words have power in which contexts, and disagreements over these questions can be contentious.

The philosopher Judith Butler extended the idea of performativity from certain speech acts, like I pronounce, I concede, you’re fired, I promise, and I hereby declare, to suggest that aspects of our identities are forged into reality by way of our language use and other social practices. From this perspective, all language is performative, not just particular speech acts. Butler’s focus was on gender as a ‘performative accomplishment’: certain social practices (including language) come to be associated with men or women (or not), those practices then come to be seen as masculine or feminine (or not), and as people who express themselves as masculine or feminine (or not) repeat these patterns over and over again, the link between certain social practices and gender is reinforced. Social practices that reinforce gender include things like wearing a tie or wearing a pink skirt, picking flowers or cutting the lawn, walking into certain bathrooms, and, most importantly for us, using language in certain ways. As we’ll discuss in Chapter 10, language features are used, both directly and indirectly, in the performance of different ways of being a man, or being a woman, or not.

We can extend Butler’s idea beyond gender and understand all aspects of our identity as being performative accomplishments. Our identity is something socially constructed, and through sustained social practice that we mutually agree has certain meaning, we are active in its construction. Every time I say the Canadian English linguistic stereotype eh, I am both carving out my identity as a Canadian and reinforcing the link between eh and Canadian-ness. If, at the beginning of a lecture, I ‘drop my gs’ (e.g., I might say good mornin’ and how is everybody doin’ today instead of morning or doing), I signal that I am a laid-back person who isn’t interested in abiding by the general expectation that a university lecture is a formal context. If I take a sip of a beer and, using the jargon of craft beer connoisseurship, ask “am I detecting a hint of Cascade hops on the finish?” I am staking a claim as a member of the craft beer drinking community; I am expressing that I not only enjoy and am knowledgeable about the drink but that I am the kind of person who enjoys and is knowledgeable about the drink and all the social associations that might entail (perhaps masculine, millennial, a hipster, and not too serious like those *wine* people!) (see Konnelly 2020).

When we talk about performativity it’s important to distinguish it from another common understanding of performance. When we say that aspects of our identity are performative, we aren’t saying that, for example, our gender is like a role we play in a stage performance. It’s not about acting, and definitely not about acting like someone who isn’t you. When we talk about performativity and language, we mean that language performs certain functions for us: we make ourselves through our behaviours, and language is one of those behaviours.

If language can be used to perform actions, then language has power to do both good and harm.


Austin, J. (1962). How to Do Things with Words. Harvard University Press.

Butler, J. (1990). Gender Trouble: Feminism and the Subversion of Identity. Routledge.

Konnelly, L. (2020). Brutoglossia: Democracy, authenticity, and the enregisterment of connoisseurship in ‘craft beer talk’. Language & Communication, 75, 69–82.

2.2 Language and Offense


Taboos and offence

In the previous section, we saw that language has a power beyond communicating the literal meaning: things can happen in the world as a result of us uttering something. Language is very powerful in this way. We just discussed how we use language to perform our identity. Another power that language has is the emotional effect that producing and/or perceiving certain expressions can have on us. Let’s unpack what that means in this section and the next.

What do we mean by emotional effect? One example of this is swears. Although the English words poop and shit have the same basic meaning and refer to the same physical substance, they are not completely interchangeable in conversation. While poop is fairly innocuous, almost childish, shit is considered taboo, which means that its use is avoided except in certain circumstances. Violating this taboo by using it in the wrong circumstances is likely to cause offence. For example, if you break your toe and shout, “Oh, poop!”, no one will be offended (though you may not feel as satisfied!), but if you ask a young child, “Do you need to take a shit?”, this will surely offend many adults. Context matters, and using taboo language in the wrong context is culturally offensive, rather than just amusing or awkward.

Contrast this with other pairs of words that have the same basic meaning but different associations, neither of which is considered offensive. For example, the English words odour and aroma both refer to smells. We tend to talk about unpleasant odours and pleasant aromas, but mixing up these associations won’t offend anyone. The negative association of the word odour is not sufficient to make the word taboo.

It’s easy to think that taboos are avoided in conversation because the taboo words themselves are bad somehow. This might be true for cases like shit where the word itself has a lot of negative emotional content attached to it. However, this is not always the case. In Kambaata (a Highland East Cushitic language of the Afro-Asiatic family, spoken in Ethiopia), it is traditionally taboo for a woman to use any words that begin with the same sounds as the name of her spouse’s parents, so she is expected to use taboo avoidance, which is the replacement of taboo words with other words (Treis, 2005). This is not because of any negativity towards the in-laws or their names, but rather, it is a sign of respect for these relatives.

So language taboos can exist for either negative or positive reasons. What makes an expression taboo is that uttering it breaks the taboo and causes offence. Across the world’s languages, there are many types of taboo language. The names of respected people are often taboo: in-laws (as in Kambaata and many other languages), community elders, emperors, etc. Many languages also have taboo words for bodily waste, sexual organs and functions, death, and religious items or ideas.

Likewise, the strategies for taboo avoidance are varied. In Kambaata, many words have alternate forms that would be used to avoid matching an in-law’s name. For example, a married Kambaata woman whose father-in-law is named Tiráago might replace the word timá ‘leftover dish’ with ginjirá to avoid using the beginning ti– sound that matches her father-in-law’s name.

Instead of avoiding similarity, another taboo avoidance strategy is to replace the taboo word with a similar form, to help evoke the taboo word without actually uttering it. This is common for swear words, which may often be replaced with less offensive words that have similar form. An English speaker might yell out sugar or shoot when they stub their toe instead of shit. The matching initial sound retains some of the emotional power of uttering the actual swear word, while minimizing the offence that could result from violating the taboo against swearing.

Swearing and physical pain

The power of swearing is very real! Many studies (e.g., Stephens et al., 2009; Stephens & Robertson, 2020) show that swearing in response to pain actually seems to reduce the pain we feel. Interestingly, though similar-sounding words from taboo avoidance might carry some of the same emotional power, they don't seem to help with the pain itself in the way that the original swear words do. So the next time you drop a hammer on your foot, take comfort in knowing that your offensive swears can be beneficial!

Using a word versus mentioning a word

Most of the time when we communicate, each word is just one of many words strung together in an utterance. This is the ordinary use of these words. So in an English sentence like wheat is a grain, each of the words is used to say something about the world. In this case, the sentence is discussing wheat as a real world object, the actual physical grain itself, with the word wheat being used to refer to the grain.

However, an important feature of language is that it can be used to describe itself. That is, we can use language metalinguistically to discuss properties of language. This is what makes the entire field of linguistics possible! But it’s not just linguists who do this. Ordinary language users frequently have metalinguistic conversations about language, and they do so by mentioning words and expressions as linguistic objects, rather than using them to refer to real world concepts. For example, while we use the word wheat in sentences like wheat is a grain, a sentence about the word itself would be mentioning it rather than using it, for example, if we were talking about how the word wheat is historically related to the word white. Here, we are not talking about the physical grain itself or the literal colour white, but rather, we are talking about the English words for that grain and that colour.

How to be a linguist:

The convention when we’re being meta, that is, mentioning words that we need to talk about metalinguistically, is to present the mentioned words and expressions in italics. We will revisit this convention in Chapter 7, when we talk about the meaning of linguistic expressions.

This difference between using a word and mentioning it is sometimes called the use-mention distinction. It is helpful to keep this in mind especially when discussing taboo words. In some cases, mentioning the taboo word doesn’t seem to call up the taboo the same way that using it does. To return to our shit example, there’s a fairly strong taboo against using swear words in professional writing like textbooks, so it would be surprising if we used the word shit in this chapter. However, when the topic under discussion is swear words as linguistic objects, as a phenomenon within language, we need to be able to mention the word shit, which is what we’ve done here.

The use-mention distinction does not give us a free pass to utter all taboo words in all contexts. There are especially offensive and highly volatile taboo words, like racial slurs (words that insult and denigrate certain marginalised groups of people, in this case based on perceived race), whose mere mention is known to cause visceral emotions in hearers. Even for non-slurring taboo words like shit, some contexts are so sensitive that even mentioning the word violates the taboo. In both of these cases, if you need to allude to the word, you can use a different strategy. You might choose a complex circumlocution like a four-letter word referring to excrement, or you might mask the word in some way by replacing some letters with asterisks or dashes (s***, s–t), referring to the word by its first letter (s-word), bleeping an audio track, or blurring an image or video.




Anderson, L., & Lepore, E. (2013). What did you call me? Slurs as prohibited words setting things up. Analytic Philosophy, 54(3), 350–363.

Lepore, E., & Anderson, L. (2013). Slurring words. Noûs, 47(1), 25–48.

McCready, E., & Davis, C. (2019). An Invocational Theory of Slurs. LENLS 14, Tokyo.

Davis, C., & McCready, E. (2020). The instability of slurs. Grazer Philosophische Studien, 97(1), 63–85.

⚠️ Rappaport, J. (2020). Slurs and Toxicity: It’s Not about Meaning. Grazer Philosophische Studien, 97(1), 177–202.

Snefjella, B., Schmidtke, D., & Kuperman, V. (2018). National character stereotypes mirror language use: A study of Canadian and American tweets. PLOS ONE, 13(11), e0206188.

Stephens, R., Atkins, J., & Kingston, A. (2009). Swearing as a response to pain. NeuroReport, 20(12), 1056–1060.

Stephens, R., & Robertson, O. (2020). Swearing as a Response to Pain: Assessing Hypoalgesic Effects of Novel “Swear” Words. Frontiers in Psychology, 11.

Treis, Y. (2005). Avoiding Their Names, Avoiding Their Eyes: How Kambaata Women Respect Their In-Laws. Anthropological Linguistics, 47(3), 292–320.


⚠️ Content note: This paper mentions a highly volatile racial slur without censoring it.

2.3 Derogation, toxicity, and power imbalances


Offense revisited

At the beginning of this chapter, we discussed how language can be used to perform actions and construct our identity. This can be a good thing: we might use language, for example, to establish positive social relationships with people. When I went to elementary school in Japan, it was typical for my classmates to call me Taniguchi-san towards the beginning of the school year. -san is a suffix you can add to the end of names in Japanese: -san is fairly polite, but not too polite. When my classmates in elementary school started to get to know me better, they started to call me Ai-chan. That linguistic act let me know that we’re friends! -chan is a suffix for names, used for endearment. Some of my super close friends even gave me in-group exclusive nicknames like Ai-pyon (-pyon is roughly a hopping sound in Japanese; but they gave me this nickname not because I hopped a lot, but mostly because it sounded cute)! Now that linguistic act said that we were really, really good friends. In this way, language can be an act of expressing solidarity with others.

It’s important to recognize, however, that language can do harm, too. We introduced the notion of offense in the previous section. Vulgarities like shit can cause offense in some contexts.

The term vulgarity refers to expressions that involve taboo bodily references (e.g., shit, ass). The term expletive refers to expressions that are used for outbursts (e.g., damn). Some expressions can be both a vulgarity and an expletive.

Offense is a kind of social and/or psychological harm that is done to discourse participants. This means that if you say shit in a spoken conversation where it is taboo, the harm is done to the people who hear it. If you sign what is shown in Figure 1 in a signed conversation where it is taboo, then the harm is done to the people who see it. The harm can range from fairly mild to more severe, depending on how offensive the expression itself is and what context it was produced in. For example, for some people, I don’t give a damn may not be as offensive as I don’t give a shit. If a small Japanese child says kuso omoshire: (roughly ‘fuckin’ hilarious’) during dinner out of rebellion, that might not be as offensive as an adult saying the same thing in a room full of children.


ASL sign for "bullshit". Arms crossed in front of body. Right hand forms a fist with index finger and pinky extended, like horns. Left hand is in a closed fist shape, then transitions to all fingers extended (as if sprinkling something).
Figure 1. American Sign Language (ASL) sign for bullshit.

Offense can happen regardless of speaker/signer intent. Let’s say you were learning Japanese and had no idea that kuso omoshire: was vulgar (maybe you thought it just meant ‘extremely funny’), and you said it in front of children. People who hear you can still be offended, regardless of your lack of malicious intent. Unintentional utterances of vulgarities are likely perceived as less offensive than intentional ones, but nevertheless, the harm caused at the time of utterance, however small, cannot be undone. It is similar to how stepping on someone’s foot does harm regardless of whether it was intentional or not.


Another kind of harm that language can do is derogation (or pejoration). Some linguistic expressions are derogatory (or pejorative), which means that these expressions disparage people. For example, the words jerk and asshole in English are derogatory: they express the utterer’s condemnation of the referent. Offense and derogation are not the same thing. Offense has to do with how discourse participants are affected: if you spill coffee on yourself and say “Shit!” in front of your grandmother, your grandmother may take offense upon hearing that vulgarity. However, what you have said is not derogatory towards her (or anyone); it’s not an insult towards her (or anyone) in any way in this context. So the vulgar expletive shit is offensive (in this context) but not derogatory. Speaking about taboo topics, even if you do not use vulgar terms (e.g., using more “neutral” terms to discuss bodily functions over dinner), may also be offensive but not necessarily derogatory.

Of course, many expressions that are derogatory are also offensive. The vulgarity asshole is taboo in some contexts and therefore offensive in those contexts. It’s also derogatory because you’re putting someone down with that term.

It is also possible for expressions to be derogatory but not offensive. This one is a little bit trickier because many derogatory things also cause offense. One example of something that is derogatory but not offensive on the surface is a coded slur. In 2012, a police officer was fired partially because he called one baseball player a “Monday”. Monday is sometimes used as a coded racial slur. This means that those who share the knowledge that Monday is code for certain racialised groups can say things like I hate Mondays to express their bigoted ideologies to each other, while the people who are targeted by the slur remain unaware of this derogation. So in this case, Monday is (secretly) derogatory, but would not cause offense without the in-group knowledge.


Slurs, toxicity, and power imbalances

To summarize what we have learned so far: offense has to do with the impact that a linguistic expression has on the discourse participants, and derogation has to do with the attitude that the utterer of the linguistic expression has. Derogatory expressions like jerk, idiot, and asshole are sometimes called particularistic insults or general pejoratives. They are used to condemn a specific person (and not an entire group of people) for some specific behavior at some specific time. When you use particularistic insults, you are expressing your strong disapproval of the other person based on something they did.

Other derogatory terms may disparage an entire group of people, rather than a particular person for a specific incident. Slurs are insults that denigrate specific marginalized groups of people. For example, femoid is a slur against women, used in certain online subcultures. Calling someone a femoid expresses the utterer’s attitude that this person is condemnable because this person is a woman. This is not a particularistic insult, because it is not the case that the utterer is expressing disapproval of this person (that happens to be a woman) for some specific incident. Rather, they are expressing disapproval of women in general, and therefore by extension disapproval of this person who is a woman.

Slurs are powerful, highly taboo, and can cause a lot of harm. The great emotional weight of slurs arises from the power differential between the person using the slur and the person targeted by it. Where such a power differential exists, the person wielding the slur is invoking and reenacting an entire historical context of violence against the targeted group (Davis & McCready, 2020). Expressing racism without a slur (e.g., “I hate Japanese people”) and expressing racism with a slur (e.g., “She is a ___”) are both terrible things to do, but using a slur causes extra visceral emotional harm. In fact, some studies show that slurs are processed in a different part of the brain than other forms of language (Singer, 1997). This particular kind of offensive emotional power that slurs have is sometimes called the toxicity of slurs (Rappaport, 2020). As alluded to in the previous section, some slurs are so toxic that even mentioning them or accidentally using words that sound or look similar to them can do harm.

Because power imbalance is a crucial component of a slur, insults aimed at high-status groups of people don’t have the same effect. Such an insult can be impolite or even offensive, but without the associated invocation of targeted violence, it doesn’t achieve the same level of harm that a true slur does.

Another consequence of this understanding of slurs is the possibility to reclaim a slur as a means of empowerment, as a marker of shared identity and solidarity against oppression. For example, the word queer was long used as a slur for members of the 2SLGBTQ+ community, but in the 1990s activists and academics began to reclaim the word and use it to express queer solidarity among themselves. These days, queer is a common umbrella term for this community, and Queer Studies is a recognized area of academic study. At the same time, some members of the community who’ve been targeted by this slur are not yet ready to embrace it.

On the other hand, some slurs have been so thoroughly rehabilitated that they’ve become mainstream. Women fighting for equal voting rights, or suffrage, were originally called suffragists. A British journalist coined the term suffragette in 1906, using the diminutive, feminine –ette ending in an attempt to insult. But the activists adopted the term themselves and it is no longer considered a slur.

A recurring theme of this chapter and of this book is that language is about more than grammar, and words do more than just refer to literal things in the world. Slurs provide one example of how language encodes and enacts social relationships: we can use language to express our status relative to others, and we also use language to enforce other people’s status relative to ourselves. With your linguistics training in hand, you can use your metalinguistic awareness to examine some of these power relations, and maybe even to resist or repair the damage that language can be used to inflict.



Anderson, L., & Lepore, E. (2013). What did you call me? Slurs as prohibited words setting things up. Analytic Philosophy, 54(3), 350–363.

Bach, K. (2018). Loaded words: On the semantics and pragmatics of slurs. Bad Words: Philosophical Perspectives on Slurs, 60–76.

Bolinger, R. J. (2017). The pragmatics of slurs. Noûs, 51(3), 439–462.

Bolinger, R. J. (2020). Contested slurs: Delimiting the linguistic community. Grazer Philosophische Studien, 97(1), 11–30.

Davis, C., & McCready, E. (2020). The instability of slurs. Grazer Philosophische Studien, 97(1), 63–85.

Hess, L. F. (2019). Slurs: Semantic and pragmatic theories of meaning. The Cambridge Handbook of the Philosophy of Language.

Jeshion, R. (2020). Pride and Prejudiced: On the Reclamation of Slurs. Grazer Philosophische Studien, 97(1), 106–137.

Jeshion, R. (2021). Varieties of pejoratives. Routledge Handbook of Social and Political Philosophy of Language, 211–231.

Lepore, E., & Anderson, L. (2013). Slurring words. Noûs, 47(1), 25–48.

McCready, E., & Davis, C. (2019). An Invocational Theory of Slurs. LENLS 14, Tokyo.

Nunberg, G. (2018). The social life of slurs. New Work on Speech Acts, 237–295.

Popa-Wyatt, M. (2020). Reclamation: Taking Back Control of Words. Grazer Philosophische Studien, 97(1), 159–176.

⚠️ Rappaport, J. (2020). Slurs and Toxicity: It’s Not about Meaning. Grazer Philosophische Studien, 97(1), 177–202.

Saka, P. (2007). How To Think About Meaning. Dordrecht: Springer.

Singer, C. (1997). Coprolalia and other coprophenomena. Neurologic Clinics, 15(2), 299–308.


⚠️ Content note: This paper mentions a highly volatile racial slur without censoring it.

2.4 The Power of Names




Our names are intimately entwined with our personhood. In addition to pointing to you as an individual, your name also provides many clues about your membership in social categories. People make guesses about your gender, age, and ethnicity on the basis of the clues they infer from your name. For example, imagine you’re moving into residence at a Canadian university and you see your neighbours’ names on their doors. On one side is Kimberley and on the other is Kimiko. Even before you meet Kimiko and Kimberley, you’ve probably made a guess about what they look like based on their names. Your guess might be wrong because these clues arise from general patterns, not absolutes, but your experience gave you some expectations.

Matched-Guise Study

It can be hard to make direct observations of people’s attitudes about social difference, because it’s generally not socially acceptable to express negative attitudes towards minority groups. So instead, we can use a technique called a matched-guise study to try to draw conclusions about attitudes. It works like this: the researchers present participants with some kind of stimulus. In one study (Oreopoulos, 2011), the stimuli were a set of résumés. The researchers held the stimulus constant and changed the guise that it appeared in — in this case, the guise was the name at the top of the résumé. Different employers received the same résumés (the same stimuli) under different names (different guises).

The core idea in a matched-guise study is that if you find a difference in your participants’ ratings, that difference isn’t because of the stimulus, because you’ve held the stimulus constant. Any difference in ratings must be because of the guise — the way you labelled your stimuli.


There’s evidence from social science research that employers and landlords also make guesses about people based on their names. And as you might expect, the guesses they make are shaped by societal structures of power and privilege. In a matched-guise study in Toronto (Oreopoulos, 2011), the research team submitted thousands of mock résumés to job postings. They found that a given résumé with an English-sounding name like Matthew Wilson was much more likely to get a callback than the same résumé under the name Rahul Kaur, Asif Sheikh, or Yong Zhang, even when the résumé listed a Canadian university degree and indicated fluency in English and French. That same year, another matched-guise study (Hogan & Berry, 2011) sent email inquiries to Toronto landlords who had advertised apartments on Craigslist. The landlords responded to emails from typically Arabic male names like Osama Mubbaarak at much lower rates than to inquiries from typically English names like Peter McDonald. It’s clear that the hiring managers and the landlords in these studies used applicants’ names to make judgments about their ethnicity and about their value as a potential employee or tenant.

I’m guessing that many of you reading, watching, or listening to this have names that are not traditionally English, and maybe you’ve grappled with this question: do I use my own name, or do I choose an English name that will be easier for my teachers and classmates to pronounce? On one hand, using an English name might just make daily life a little bit simpler in an English-dominant society. On the other hand, it’s not fair that this pressure to conform to English even exists! Your name doesn’t just do the job of signaling things about you to other people; your name can also be a vital expression of your own individual identity, representing a profound connection to your family, language, and community.

This is the case for many people who are working to reclaim their Indigenous languages: using a name from that language not only connects them to their ancestors, but also expresses resistance to the colonial names assigned in residential schools. When children arrived at residential school for the first time, they were given an English or French name and their hair was cut, two powerful symbols of the school’s intent to sever the children’s connections to their home communities. Because of that trauma, many survivors of the schools also chose English or French names for their children and grandchildren rather than names from their own languages. This was the case for Ta7talíya Nahanee, a Sḵwx̱wú7mesh decolonial facilitator and strategist, whose grandfather gave her the English name that appears on her official Canadian documents. In June 2021, in response to Call To Action 17 of the Truth and Reconciliation Commission (2015), Canada launched a program that allows Indigenous people to reclaim their Indigenous names on passports and other official documents free of charge. But when Ta7talíya applied to have her documents changed to her Sḵwx̱wú7mesh sníchim name, the government denied her request, because of a rule that forbids numerals like “7” in legal names. But in Sḵwx̱wú7mesh sníchim orthography, 7 is not a numeral — it’s a letter that corresponds to the glottal stop [ʔ], a contrastive phoneme in the language (see Chapter 4). Ta7talíya Nahanee is currently fighting for the right to her name. In an interview with the Toronto Star, she argued: “If all of us are able to share with the world every time we show our ID, it just opens up that normalizing Indigenous language, normalizing Indigenous teachings and normalizing Indigenous ways. So please make policy that works for us.” (Keung, 2021)

Trans folks also know how powerful names are for expressing identity. If you’ve gone through a gender transition you might have experienced a sense of liberation when others call you by a name of your choice that matches your gender. And maybe you’ve also experienced the pain of being deadnamed, when someone uses your old name either accidentally or deliberately.

Deadnaming, forcible renaming, and mispronouncing names are all ways that people use language, specifically names, to enforce social structures of power. In the early 1900s when travelling by train was a luxurious experience for middle class white people in Canada, most of the train porters were Black, and all of them were called George. As historian Dr. Dorothy Williams says:

“Using Black men at that period, just 10, 20, 30 years from the end of slavery was a signal or a signpost to whites that these men should still be servants to them. […] So they didn’t have to have an identity. Just like in slavery, they didn’t have to have an identity as these Black men were now going to be called George, because that was the easiest reference most whites could make to get attention. Just call him George.” (Bowen & Johnson, 2022)

And it’s not just in the olden days that Canadians expressed white supremacy through names. During the 2021 federal election, at least one person on Twitter repeatedly referred to NDP leader Jagmeet Singh as Juggy. Of the three leaders of the main federal parties, Singh was the only person of colour. Calling him Juggy, with English spelling and that diminutive affix -y, not only erased his Punjabi-Canadian identity but also infantilized him.

These examples all illustrate what Mary Bucholtz (2016) calls indexical bleaching. Replacing someone’s name with an English one, or mispronouncing it so it sounds more English, are ways of “bleaching” that person’s identity: it strips away their connection to family, community, and language, and in its place calls them by a name that sounds more English, that is, more white. In other words, it’s a way of reinforcing existing structures of power and privilege.

[self-test questions coming soon]


Bowen, L.-S., & Johnson, F. (Hosts). (2022). Why were all porters called “George”? [Audio podcast episode]. In Secret Life of Canada. CBC Podcasts.

Bucholtz, M. (2016). On being called out of one’s name: Indexical bleaching as a technique of deracialization. In H. S. Alim, J. R. Rickford, & A. F. Ball (Eds.), Raciolinguistics (pp. 273–289). Oxford University Press.

Hogan, B., & Berry, B. (2011). Racial and Ethnic Biases in Rental Housing: An Audit Study of Online Apartment Listings. City & Community, 10(4), 351–372.

Keung, N. (2021, August 28). Yes, her name is Ta7talíya, but you won’t see it on her passport. Toronto Star.

Oreopoulos, P. (2011). Why Do Skilled Immigrants Struggle in the Labor Market? A Field Experiment with Thirteen Thousand Resumes. American Economic Journal: Economic Policy, 3(4), 148–171.

Truth and Reconciliation Commission of Canada. (2015). Truth and Reconciliation Commission of Canada: Calls to Action.

2.5 Pronouns, Language Change, and the Grammar Police




Analogously to names, we also use pronouns to express things about our own identity and make guesses about other people’s identities. We’ll learn more about pronouns in Chapter 6, but for now here’s a simple explanation. In standardized varieties of English, first-person pronouns (I, me, we, us) refer to the person who is speaking, signing, or writing, and second-person pronouns (you) are for the person being addressed. (Many languages have more subtle distinctions than these in their pronoun systems, but all languages encode at least a three-way difference between first-, second-, and third-person pronouns.) Third-person pronouns refer to someone else, and can often replace a noun phrase in a sentence. Here are some examples of English third-person pronouns.

inanimate singular (it):
Samnang really enjoyed the latest book by Ivan Coyote.
Samnang really enjoyed it.

animate singular masculine (he, him):
Samnang invited Steve to a movie.
Samnang invited him to a movie.

animate singular feminine (she, her):
Samnang thinks the woman who lives next door is a good gardener.
Samnang thinks she is a good gardener.

animate singular ungendered (they, them):
The passenger in Seat 3A forgot their coat.
They forgot their coat.

In the sentence, “Samnang really enjoyed the latest book by Ivan Coyote”, we can replace that noun phrase, the latest book by Ivan Coyote with it. “Samnang invited Steve to a movie.” We can replace Steve with him:  “Samnang invited him to a movie.” In the next sentence, “Samnang thinks the woman who lives next door is a good gardener”, we can replace that phrase with she: “Samnang thinks she is a good gardener”. In, “The passenger in Seat 3A forgot their coat”, we can replace that noun phrase with, “They forgot their coat.”

Notice that third-person singular pronouns give some vague clues about their referent: we assume that it refers to a thing, he to a boy or a man, and she to a woman or girl. Those three categories — thing, human male, human female — are very broad, and yet, they can still be used to do harm and exclude people. In many cultures there’s a general expectation that we use appropriately-gendered pronouns when we’re referring to people. Even when we meet a tiny baby who can’t possibly be offended, we’re still careful to ask “boy or girl?” and to use the relevant pronoun. After infancy, getting misgendered with the wrong pronouns can range from embarrassing to outright dangerous. Furthermore, a two-way distinction between masculine and feminine is too simple to describe the rich variation among human genders. A person who’s neither male nor female (for example, non-binary, genderqueer, or gender-fluid) can experience both he/him and she/her as misgendering. Here’s where the pronouns they and them are useful.

The pronoun they doesn’t offer many clues: it doesn’t specify whether the referents are animate or inanimate, masculine or feminine. Here are some examples of plural they:

plural ungendered, animacy unspecified (they, them):
The pistachio cupcakes are delicious.
They are delicious.
The prof told the students that class was cancelled.
The prof told them that class was cancelled.


In fact, they doesn’t always even specify whether it’s singular or plural. Here are some more examples.

number unspecified:
I don’t know who was in here but they left a big mess.

singular, gender unspecified:
One of my students told me they needed an extension.

In “I don’t know who was in here but they left a big mess”, we don’t know how many people left the big mess – it could be one, two, or twenty, and the pronoun they doesn’t give us any clues. In this next one, “One of my students told me they needed an extension”, it’s clearly only one student who asked for an extension, and either we don’t know their identity or it just isn’t relevant to the story, so they also does the job. This singular use of they has been common in English for about 600 years. These days, English is changing to include the use of they to refer to a single person whose identity we do know, as in, “Samnang told me they needed an extension.”

In many ways, this shift from unspecified-singular-they to specific-singular-they feels like a tiny change to the grammar of English. But since this change is related to a change in patriarchal gender norms, people who benefit from those norms tend to get prescriptive, insisting that singular they is always ungrammatical in every circumstance. The Chicago Manual of Style tells people “it is still considered ungrammatical”, and the AP Stylebook tells you it’s “acceptable in limited cases” but they’d really prefer if you didn’t use it. And then there are the extremely crabby folks like Jen Doll, who complains, “The singular they is ear-hurting, eye-burning, soul-ravaging, mind-numbing syntactic folly. Stop the singular they. Stop it now.” (Doll 2013). But no matter how much the prescriptivists complain, specific-singular-they is getting used more and more widely. In 2015 the American Dialect Society voted it the Word of the Year and the Merriam-Webster Dictionary did the same in 2019.

The funny thing is, the English pronoun system went through a very similar change hundreds of years ago. In the 16th century, English used to have both a singular and a plural second-person pronoun. If you were talking to a group of people, you’d say you just like we do now. But if you were talking to just one person, you’d address them as thou or thee, like, “What classes art thou taking this term?” or “Can I buy thee a drink?”. By the 17th century, thou and thee had all but disappeared, reserved only for conversations with people you were very close to. So the pronoun you became both singular and plural. In modern English, we don’t have thou or thee at all unless we’re trying to be funny or old-fashioned. But it can be pretty useful to have a way of distinguishing between singular and plural, so some varieties of spoken English have other plural forms, like y’all or you guys or youse. Maybe your variety of English has one of these.

Linguists are conducting systematic research on how the change to English they is unfolding. Bjorkman (2017) found that English speakers with a conservative grammar didn’t use they in this way, but those with an “innovative” grammar did. Ackerman (2019) has proposed that the more trans and non-binary friends you have, the likelier your grammar is to have specific-singular-they. Conrod (2019) showed in their dissertation that older people were less likely to use it and younger people were more likely, and Konnelly & Cowper (2020) tracked the three stages of grammatical change that are in progress.

No one can stop language from changing. But language users can speed up language change. Misgendering people does real harm. One way to make it less likely that non-binary people will be misgendered is for English to make this small change to include specific-singular-they. And the way that language changes is for people to change how they use it. If you already have specific-singular-they in your grammar, use it as much as you can! And if you’d like to change your own mental grammar, Kirby Conrod (2017) gives some good advice — slow down, listen to people who use it in their own language, and practice! The more you use it, the more natural it will feel.

[self-test questions coming soon]


2015 Word of the Year is singular “they.” (2016, January 9). American Dialect Society.

Ackerman, L. (2019). Syntactic and cognitive issues in investigating gendered coreference. Glossa: A Journal of General Linguistics, 4(1).

Bjorkman, B. M. (2017). Singular they and the syntactic representation of gender in English. Glossa: A Journal of General Linguistics, 2(1), 80.

Conrod, K. (2017, December 4). How to do the absolute minimum (with pronouns). Medium.

Conrod, K. (2019). Pronouns Raising and Emerging [PhD Thesis]. University of Washington.

Doll, J. (2013, January 17). The Singular “They” Must Be Stopped. The Atlantic.

Konnelly, L., & Cowper, E. (2020). Gender diversity and morphosyntax: An account of singular they. Glossa: A Journal of General Linguistics, 5(1).

Merriam-Webster’s Words of the Year 2019. (2019). Retrieved April 28, 2022.

2.6 Linguistic Law Enforcement



In many cultures there’s a general sense that it’s rude to criticize or call attention to various kinds of social difference. In Canada, most kids learn in school that it’s impolite to stare at a person who has a visible disability, to make jokes about fat bodies, or to comment on someone’s gender-nonconforming appearance. Or at least, we learn not to express these opinions in public.

In contrast, it’s not only socially acceptable but even expected and encouraged to criticize language use that deviates from the privileged standard, calling it improper, ungrammatical, or worse. In this unit we’ll look at some of the domains where prescriptive standards of grammar get wielded like law enforcement, to keep social order.

Policing Voices

We saw in the previous unit that people who object to using they/them pronouns for non-binary people often phrase their objections not in terms of gender norms but in terms of grammar, insisting that they can’t possibly be singular because that would be ungrammatical! Bradley’s (2019) work has shown that people with prescriptive views of grammar also tend to have conservative views about the gender binary — in other words, it’s not just about grammar.

Another way that people police language use to enforce gender norms is by criticizing women’s voices. When I was young, the older generation complained about uptalk? When your pitch rises? At the end of a sentence? Beginning sometime in the 2010s, the moral panic started to center on vocal fry. Chapter 3 will give us a chance to explore more about how humans make speech sounds in the vocal tract. For now, you should know that vocal fry is a way of producing speech with very low frequency vibrations of the vocal folds, so that it sounds creaky. Creak is actually one of the technical linguistic terms for this voice quality, and creak is a systematic part of the phonetics, phonology, and prosody of many spoken languages around the world (Davidson 2020).

In addition to the jobs that vocal fry does in the grammar, it also provides social cues that listeners interpret. Davidson’s (2020) review article mentions studies that found that speakers who use vocal fry are perceived as more bored, more relaxed, less intelligent, and less confident, among other attributes. But even though men and women speaking English are about equally likely to creak, for some reason listeners, or at least listeners older than 40, find it wildly more irritating when women do it. Ira Glass, host of the podcast This American Life and frequent vocal fryer himself, reports that he’s received dozens of emails complaining about his female colleagues’ vocal fry, “some of the angriest emails we ever get. They call these women’s voices unbearable, excruciating, annoyingly adolescent, beyond annoying” (Glass, 2015), but no emails complaining about his voice or those of his male colleagues. Confirming Glass’s anecdotal report, Anderson et al. (2014) (no relation to the Anderson of this textbook!) found that “The negative perceptions of vocal fry are stronger for female voices relative to male voices” and they recommend that “young American females should avoid using vocal fry speech in order to maximize labor market opportunities.” Does that sound familiar? Just like the résumé study we learned about in the previous unit, this is another instance of job candidates being judged not for their qualifications and experience, but for the social cues being indexed by their voice. It’s not too likely that the pitch of your speaking voice is related to your job performance, so rather than telling job candidates to change their name or change how they use language to conform to the biases of the hiring manager, how about we train hiring committees to overcome these biases?

Policing Accents

Besides voice, another part of language use that is subject to linguistic law enforcement is accent. Everybody has an accent, but we tend to notice only the accents that are different from our own. In an earlier unit, we learned about the common belief that a standardized variety is the best or most correct way of using language. That logic extends to accents as well: a non-standard accent is often stigmatized. The accent itself is neither bad nor good, but the stigma means that people have negative attitudes and expectations about it. Where English is the majority language, people who learned English later in life often encounter that stigma. And there are also L1 varieties whose speakers experience stigma, such as Black English, the varieties spoken in the southeastern United States, and Newfoundland English.

Chapters 11 and 12 deal with how children and adults learn language in much more detail. Here, we’ll use the term first language or L1 to refer to the language(s) that you learned from birth from the people around you, and L2 for any language you learned after you already had an L1, even if it’s actually your third or fourth language.

Why do L2 users have different accents from L1 users? The short answer is that, when you learn an L2, your mental grammar for that L2 is influenced by the experience you have in your L1. (The longer answer comes in a later chapter!) So your accent in your L2 is shaped by the phonology of your L1. What this means is that if your L1 is English and you learn Japanese as an L2, your accent in Japanese is likely to be different from that of your classmate whose L1 is Korean.

For people whose accents are different from the mainstream, there can be many negative consequences. You’re less likely to get a job interview (Oreopoulos, 2011), and your boss might not recognize your skills (Russo et al., 2017). It’s harder to find a landlord who’s willing to rent you an apartment (Purnell et al., 1999; Hogan & Berry, 2011). If you have to go to court, what you say won’t be taken as seriously (Grant, 2019), and the court reporter is likelier to make mistakes in transcribing your testimony (Jones et al., 2019). Kids whose accents aren’t mainstream are disproportionately labelled with learning disabilities and streamed out of academic classrooms into special ed (Adjei, 2018; Cooc & Kiru, 2018). And probably Alexa, Siri, and Google won’t understand your requests (Koenecke et al., 2020)!

Why do these things happen? Well, in the case of Alexa, it’s because the training data doesn’t include enough variation in dialects and accents. But the rest of these situations arise from people’s expectations, and their expectations come from their experiences and their attitudes. Two linguists at the University of British Columbia conducted a matched-guise study with UBC students as listeners (Babel & Russell, 2015). They recorded the voices of several people who had grown up in Canada and had English as their L1. When they played these recordings to the listeners, they presented them either as audio-only, with a picture of the face of a white Canadian person, or with a picture of a Chinese Canadian person. For any given voice, the listeners rated the talker as having a stronger accent when they saw a Chinese Canadian face than when they saw a white Canadian face, and they were also less accurate at writing down the sentences the talker said. Apparently the faces influenced how well the listeners understood the talkers.

The researchers interpret their results as a mismatch of expectations. In Richmond, BC, where they conducted their study, more than 40% of the population speaks either Cantonese or Mandarin. If you live in Richmond, you have a greater chance of encountering L1 Chinese speakers in your daily life than L1 English speakers. So when you see a face that appears Chinese, you have an expectation, based on your daily experience, that that person’s English is going to be Chinese-accented. If the person’s accent turns out to be that of an L1 English speaker, the mismatch with your expectations makes it harder to understand what they say.

So we’ve seen that people’s expectations, their experiences and their attitudes can lead to stigma for language users with accents that are different from the mainstream. And that stigma can have serious, real-life consequences on people’s employment and housing and education. In addition to the consequences for the person producing an unfamiliar accent, there can also be consequences for the person trying to understand an unfamiliar accent. Those consequences can be pretty serious if you’re finding it difficult to understand the person giving you medical advice (Lambert et al., 2010), or teaching you differential equations (Ramjattan, 2020; Rubin, 1992). Accent “neutralization” is big business and L2 English speakers experience a lot of pressure to “reduce” their accents (Aneesh, 2015). As we’ll see in more detail in Chapter 12, it’s hard to change your accent after childhood, because your L2 grammar is shaped by your L1 experience. And your accent is part of who you are — it’s part of your story and your community. As linguists, let’s resist the narrative that pressures everyone to conform to some arbitrary standard accent. Luckily enough, psycholinguistic research shows us that it’s much easier to change your comprehension of unfamiliar accents than it is to change your L2 production.

Just as our experience and our expectations can lead to stigma, our experience also influences our perception. The more experience we have paying attention to someone, the better we understand them: this is called perceptual adaptation. Perceptual adaptation was first shown for a single talker: the longer people listened to an unfamiliar talker, the more they understood of what the talker said (Nygaard et al., 1994). Extensions of that research have also shown that experience listening to several speakers with a particular accent makes it easier to understand a new speaker with that same accent (Bradlow & Bent, 2008). And it turns out that listening to a variety of unfamiliar accents then makes it easier to understand a new talker with a completely different accent (Baese-Berk et al., 2013). In short, the more experience we have paying attention to someone, the more familiarity we have with the way they produce language, and the more familiarity we have, the better we’ll understand what they’re saying.

So if you want to better understand someone whose accent is different from yours, the best way to accomplish that is to pay attention to them for longer. Likewise, if someone thinks your accent is hard to understand, you can just tell them to pay attention!

[self-test questions coming soon]


Adjei, P. B. (2018). The (em)bodiment of blackness in a visceral anti-black racism and ableism context. Race Ethnicity and Education, 21(3), 275–287.

Anderson, R. C., Klofstad, C. A., Mayew, W. J., & Venkatachalam, M. (2014). Vocal Fry May Undermine the Success of Young Women in the Labor Market. PLOS ONE, 9(5), e97506.

Aneesh, A. (2015). Neutral accent: How language, labor, and life become global. Duke University Press.

Babel, M., & Russell, J. (2015). Expectations and speech intelligibility. The Journal of the Acoustical Society of America, 137(April), 2823–2833.

Baese-Berk, M. M., Bradlow, A. R., & Wright, B. A. (2013). Accent-independent adaptation to foreign accented speech. The Journal of the Acoustical Society of America, 133(3), EL174–EL180.

Bradley, E. D. (2019). Personality, prescriptivism, and pronouns. English Today, 1–12.

Bradlow, A. R., & Bent, T. (2008). Perceptual adaptation to non-native speech. Cognition, 106(2), 707–729.

Cooc, N., & Kiru, E. W. (2018). Disproportionality in Special Education: A Synthesis of International Research and Trends. The Journal of Special Education, 52(3), 163–173.

Davidson, L. (2020). The versatility of creaky phonation: Segmental, prosodic, and sociolinguistic uses in the world’s languages. WIREs Cognitive Science, e1547.

Gillon, C., & Figueroa, M. (Hosts.) (2017). Uppity Women [Audio Podcast Episode]. In The Vocal Fries. 

Glass, I. (Host). (2015). Freedom Fries | If You Don’t Have Anything Nice to Say, SAY IT IN ALL CAPS [Audio podcast episode.] In This American Life. WBEZ Chicago. 

Kayaalp, D. (2016a). Living with an accent: A sociological analysis of linguistic strategies of immigrant youth in Canada. Journal of Youth Studies, 19(2), 133–148.

Koenecke, A., Nam, A., Lake, E., Nudell, J., Quartey, M., Mengesha, Z., Toups, C., Rickford, J. R., Jurafsky, D., & Goel, S. (2020). Racial disparities in automated speech recognition. Proceedings of the National Academy of Sciences, 117(14), 7684–7689.

Lambert, B. L., Dickey, L. W., Fisher, W. M., Gibbons, R. D., Lin, S.-J., Luce, P. A., McLennan, C. T., Senders, J. W., & Yu, C. T. (2010). Listen carefully: The risk of error in spoken medication orders. Social Science & Medicine, 70(10), 1599–1608.

Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech Perception as a Talker-Contingent Process. Psychological Science, 5(1), 42–46.

Oreopoulos, P. (2011). Why Do Skilled Immigrants Struggle in the Labor Market? A Field Experiment with Thirteen Thousand Resumes. American Economic Journal: Economic Policy, 3(4), 148–171.

Purnell, T., Idsardi, W., & Baugh, J. (1999). Perceptual and Phonetic Experiments on American English Dialect Identification. Journal of Language and Social Psychology, 18(1), 10–30.

Ramjattan, V. A. (2020). Engineered accents: International teaching assistants and their microaggression learning in engineering departments. Teaching in Higher Education, 1–16.

Rubin, D. L. (1992). Nonlanguage factors affecting undergraduates’ judgments of nonnative English-speaking teaching assistants. Research in Higher Education, 33(4), 511–531.

Russo, M., Islam, G., & Koyuncu, B. (2017). Non-native accents and stigma: How self-fulfilling prophesies can affect career outcomes. Human Resource Management Review, 27(3), 507–520.

2.8 Legally Enshrined Harms


In previous sections we saw how people police each other’s language as a means of asserting power. But what about language policies imposed by actual governments in power? Governments and institutions use language to create unity in some cases and division in others. One way that governments wield their power is through language policies, which can be used to erase or reinforce social identities. They can be used to encourage or force people to speak or not speak particular languages, to require proof of competency in a language, or to affect the physical landscape of our communities by regulating the language that appears on signs. Language policies can be implemented for positive or negative motivations; sometimes they are well-intentioned but short-sighted.

Canada, of course, has two official languages, French and English, but as we will see in Chapter 15, there are over 80 languages Indigenous to Canada from 9 different language families! Why should French and English, both imported languages, be the official languages?

The Official Languages Act was instituted in 1969 by Prime Minister Pierre Trudeau in order to maintain national unity between English and French Canada, in response to increasing francophone nationalism in the province of Quebec. At the time, the anglophone minority dominated the industrial, commercial, and financial sectors in Quebec. The Official Languages Act also led to Canada’s policy of multiculturalism.

The Official Languages Act had some positive effects across Canada; it improved education and employment opportunities for francophones outside of Quebec. New Brunswick became officially bilingual. The Supreme Court of Canada overturned a law that had been in place since 1890 that made Manitoba officially monolingual, despite the fact that, when Manitoba joined Confederation in 1870, it had approximately equal numbers of francophones (often Métis) and anglophones. However, the Official Languages Act has a major shortcoming. What about Canada’s Indigenous peoples? It does not offer protection or even recognition of the importance of these languages.

In Canadian history, language policies have been used as a part of the oppression of Indigenous peoples. Sections 1.4 and 2.4 in this book introduce the harms done to Indigenous people and communities by the residential school policy in Canada, which forced Indigenous children’s attendance at government- and church-run facilities. The Government of Canada policed Indigenous people’s language even prior to the creation of residential schools in the 1880s, with policies pushing toward assimilation and the loss of Indigenous languages and cultures. The government’s goal was the assimilation of Indigenous children, to train them for menial jobs and weaken their claims to their land. Official policy dictated that English and French be the only languages of instruction at residential schools. Schools forbade children from using their home languages, and enforced the ban with cruel punishments. These policies were examples of linguistic imperialism or linguistic colonialism, wherein the suppression of language is part of a more general oppression of Indigenous cultures by settler-colonial powers (see Griffith, 2017). They further constituted attempted linguicide (the killing of a language), because children were prevented from practicing their first languages and came to associate their use with punishment and feelings of shame. These children also felt isolated from their home cultures as the ability to communicate in their home languages was lost (see e.g., Fontaine, 2017). The harms to people and communities due to the loss of languages at the hands of residential schools are lasting and ongoing. The parent-to-child transmission of language has been broken in a majority of Canada’s Indigenous communities. Some residential school survivors still find it difficult to speak their native tongue because it is associated with trauma from their time at school. New legal policy may, however, be a positive part of the process of reclamation of Indigenous languages.
For example, in a 2017 article, Fontaine calls for legal policy entitling children to education in their ancestral language, beyond the right to education in English and French (see the Canadian Charter of Rights and Freedoms, 1982, Section 23). The calls to action of the Truth and Reconciliation Commission of Canada included calls for the protection of Indigenous languages (see The Truth and Reconciliation Commission 2015), leading to the establishment of the Indigenous Languages Act in 2019. This act provides legal protection to Indigenous languages including funding for reclamation and revitalization.

The consequences of Canada’s official focus on English-French bilingualism are still evident today. When Mary Simon was appointed Governor General in 2021, she was criticized for her lack of French proficiency, even though she is bilingual in English and Inuktitut. Simon promised to learn French and also related her language experience to educational policy in Canada, stating that “Based on my experience growing up in Quebec, I was denied the chance to learn French during my time in the federal government day schools” (as reported by CTV news, 2021).

The province of Quebec has its own language laws, with the goal of protecting French from assimilation into the anglophone majority in Canada. Quebec’s language laws limit who is allowed to attend an anglophone school and require the French on signs to come first and to be twice as large as other languages. Unfortunately, though, these laws do not apply only to English, but to all languages, which negatively affects the Indigenous peoples of Quebec. The majority of Cree and Mohawk speakers in Quebec, for example, have English as their second language, and so these laws increase their difficulty in accessing education and other provincial services.

The Quebec Cree passed their own language act in 2019, the first law passed since they achieved self-governance in 2017. In contrast to Quebec’s laws, however, the Cree are not enforcing compliance for now. Instead, local governments, businesses, and other organizations will need a Cree language plan for increasing the use of Cree in their organizations.


Behiels, Michael D. and R. Hudon. 2013. Bill 101 (Charte de la langue française). The Canadian Encyclopedia.

Bell, Susan and Christopher Herodier. Sep 24, 2019. Quebec Cree pass language act as its 1st-ever legislation. CBC News.

Fontaine, Lorena Sekwan. 2017. Redress for linguicide: Residential schools and assimilation in Canada / Réparations pour linguisicide: Les pensionnats et l’assimilation au Canada. British Journal of Canadian Studies 30(2), 183-204.

Griffith, Jane. 2017. Of linguicide and resistance: children and English instruction in nineteenth-century Indian boarding schools in Canada. Paedagogica Historica, 53:6, 763-782.

Haque, Eve  and Donna Patrick. 2015. Indigenous languages and the racial hierarchisation of language policy in Canada. Journal of Multilingual and Multicultural Development 36(1), 27-41.

‘Honoured, humbled and ready’: Mary Simon’s first speech as incoming Governor General. 2021, July 6. CTV News. Retrieved May 30, 2022 from

Laing, G. and Celine Cooper. 2019. Royal Commission on Bilingualism and Biculturalism. The Canadian Encyclopedia.

Powless, Ben. Jun 5, 2021. Critics say Quebec legislation to defend French could harm Indigenous languages. Nation News.

Truth and Reconciliation Commission of Canada. 2015. Canada’s Residential Schools: the Legacy : The Final Report of the Truth and Reconciliation Commission of Canada, Volume 5, Ch. 3, “I lost my talk”: The erosion of language and culture. Montreal: McGill-Queen’s University Press.

Truth and Reconciliation Commission of Canada. 2015. Truth and Reconciliation Commission of Canada: Calls to Action. Winnipeg:

Verrette, Michel. 2006. Manitoba Schools Question. The Canadian Encyclopedia.

Wood, Nancy. 2021. Next governor general’s inability to speak French leaves francophone communities conflicted. CBC News. Retrieved May 30, 2022, from

2.9 Exercise your linguistics skills


Exercise 1. Many linguists avoid the term standard language and instead refer to these dialects as standardized. Use your morphology skills to think about the following words: prioritized, finalized, Americanized, revitalized.

  1. What does the -ize morpheme add to the meaning of these words?
  2. What difference in meaning is conveyed by the difference between the terms standardized language and standard language?
  3. Why might you choose one term over the other?

Exercise 2. If you speak an L2, how do you feel about your accent in that language? What factors do you think influence your feelings about your accent? Are any of those factors related to the language itself, or are they about prestige or power?

Exercise 3.  Do you have different names or nicknames that you use with different people in your life? What factors influence what name you use with different groups? What happens if someone uses your “other” name?

Exercise 4.  There are some social contexts where using swear words is offensive, and others where swearing is acceptable. Give an example of at least one of each kind of context, and for each context, describe the role that swearing plays in terms of face. In each context, is positive or negative face threatened or enhanced?

Chapter 3: Phonetics


A key aspect of any language is its physical reality in the world: how we transmit linguistic signals from one person to another. This chapter explores this physical reality by looking at the body parts used for language, how they move to create a linguistic signal, and how linguists categorize, describe, and notate these physical properties so they can record and access information about a language.

When you’ve completed this chapter, you’ll be able to:

  • Identify the locations and functions of parts of the human anatomy relevant to the articulation of spoken and signed languages,
  • Provide articulatory descriptions of given examples of phones and signs, and
  • Identify the meanings of many common symbols from the International Phonetic Alphabet.


3.1 Modality


The major components of communication

An act of communication between two people typically begins with one person constructing some intended message in their mind (step ❶ in Figure 3.1). This person can then give that message physical reality through various movements and configurations of their body parts, called articulation (step ❷). The physical linguistic signal (step ❸) can come in various forms, such as sound waves (for spoken languages) or light waves (for signed languages). The linguistic signal is then received, sensed, and processed by another person’s perception (step ❹), allowing them to reconstruct the intended message (step ❺). The entire chain of physical reality, from articulation to perception, is called the modality of the language.

Two stick figures. The figure on the left has a thought bubble saying “Hello, fellow human”, which is labelled as step 1. The left figure itself has wiggly lines near its mouth and arms to indicate motion. These are labelled as step 2, articulation. Pointing away from the left figure and towards the right figure is a large outlined arrow, labelled as step 3, linguistic signal. Inside the arrow is a row of sound waves and a row of rainbow light waves. The figure on the right has orange circles on its eyes and ears, with orange wavy lines leading from the circles to the brain, labelled as step 4, perception. The right figure has a thought bubble saying “Ah, that was ‘Hello, fellow human’!”, labelled as step 5. An overall branching structure at the top connects from the word modality down to the articulation figure on the left, the linguistic signal in the middle, and the perception figure on the right.
Figure 3.1. Steps in the transmission of a linguistic signal from one person to another.

Spoken and signed languages

The modality of spoken languages, such as English and Cantonese, is vocal, because they are articulated with the vocal tract; acoustic, because they are transmitted by sound waves; and auditory, because they are received and processed by the auditory system. This modality is often shortened to vocal-auditory, leaving the acoustic nature of the signal implied, since that is the ordinary input to the auditory system.

Signed languages, such as American Sign Language and Chinese Sign Language, also have a modality: they are manual, because they are articulated by the hands and arms (though most of the rest of the body can be used, too, so this component of modality might best be called corporeal); photic, because they are transmitted by light waves; and visual, because they are received and processed by the visual system. This modality is often shortened to manual-visual.

Other modalities are also possible, but full discussion is beyond the scope of this textbook. One notable example is the manual-somatic modality of tactile signing, in which linguistic signals are articulated primarily by the hands and are perceived by the somatosensory system, which is responsible for sensing various physical phenomena on the skin, such as pressure and movement. This modality can be used by deafblind people to communicate, often by adapting aspects of an existing signed language in such a way that the signs are felt rather than seen. Some examples of such languages are tactile Italian Sign Language (Checchetto et al. 2018) and a tactile version of American Sign Language called Protactile (Edwards and Brentari 2020).

Finally, it is important to note that actual instances of communication are often multimodal, with language users making use of the resources of more than one modality at a time (Perniss 2018, Holler and Levinson 2019). For example, spoken language is often accompanied by various kinds of co-speech behaviours, such as shrugging, facial expressions, and hand gestures, which are used for many meaningful functions in the linguistic signal: emphasis, emotion, attitude, shifting topics, taking turns in a conversation, etc. (Hinnell 2020; also see Sections 8.7 and 10.4 for discussion of some related issues and examples). A full analysis of how language works must ultimately take into account its multimodal nature and the complexity and flexibility of how humans do language.

Terminological note: Signed languages are sometimes called sign languages. Both terms are generally acceptable, so you may encounter either one in linguistics writing. Sign languages has long been the more common term, but signed languages has recently been gaining popularity among deaf scholars.

Another piece of relevant terminology that is in flux is the long-standing distinction in capitalization between uppercase Deaf (a sociocultural identity) and lowercase deaf (a physiological status). However, this distinction has been argued to contribute to elitist gatekeeping within deaf communities, so many deaf people have pushed to eliminate this distinction (Kusters et al. 2017, Pudans-Smith et al. 2019).

In this textbook, we follow these prevailing modern trends by using signed languages and by not using the Deaf/deaf distinction. However, the alternatives are still widespread in linguistics writing, so you may still encounter them.

For these issues, it is important to proceed with caution and follow the lead of anyone more knowledgeable than you, especially if they are deaf. If you are uncertain what usage is appropriate in a given situation with a given deaf person, ask what they prefer.

The study of modality

Because spoken languages have long been the default object of study in linguistics, and because the vocal-auditory modality is centred on sound, the study of linguistic modality is called phonetics, a term derived from the Ancient Greek root φωνή (phōnḗ) ‘sound, voice’. However, all languages have many underlying similarities, so linguists have long used many of the same terms to describe properties of different modalities, even when the etymology is specific to spoken languages. This includes the term phonetics, which is now commonly used to refer to the study of linguistic modality in general, not just the vocal-auditory modality.

This is an important reminder that the etymology of a word may give you hints to its meaning, but it does not determine its meaning. Instead, the meaning of a word is determined by how people actually use that word (for more discussion of meaning, check out Chapter 7 on semantics). This usage-based meaning can diverge and even contradict historical etymology, especially in scientific fields where our knowledge of the world is constantly evolving.

An example of such a divergence between etymology and current usage for a scientific term can be seen with the English word atom, which comes from the Ancient Greek ἄτομος (átomos) ‘indivisible’. This term was used by Ancient Greek philosophers to represent their belief that atoms were the smallest building blocks of matter. However, more than 2000 years later, we discovered that atoms are in fact divisible, being made up of protons, neutrons, and electrons. Rather than rename atoms, we just kept the old name and accepted that its etymology was no longer an accurate representation of our current scientific knowledge. The same is true for the term phonetics.

However, be aware that many linguists still hold biased views about language and linguistics, and they often forget to include signed languages and other modalities when talking about phonetics or language in general. Some may even think signed languages cannot have phonetics at all. As linguists have become more knowledgeable about linguistic diversity and more sensitive to challenges faced by marginalized groups, there has been an ongoing shift towards increased inclusivity in how we talk about language. As with any such shift, some people will remain in the past, while others will be proactively part of the inevitable future.

In this chapter, we focus on articulatory phonetics, which is the study of how the body creates a linguistic signal. The other two major components of modality also have dedicated subfields of phonetics. Perceptual phonetics is the study of how the human body perceives and processes linguistic signals. We can also study the physical properties of the linguistic signal itself. For spoken languages, this is the field of acoustic phonetics, which studies linguistic sound waves. However, there is currently no comparable subfield of phonetics for signed languages, because the physical properties of light waves are not normally studied by linguists. Perceptual and acoustic phonetics are beyond the scope of this textbook.

Check your understanding

An interactive H5P element has been excluded from this version of the text. You can view it online here:



3.2 Speech articulators


Overview of the vocal tract

Spoken language is articulated by manipulating parts of the body inside the vocal tract, such as the lips, tongue, and other parts of the mouth and throat. The vocal tract is often depicted in a midsagittal diagram, a special kind of diagram that represents the inside of the head as if it were split down the middle between the eyes. Midsagittal diagrams are conventionally oriented as in Figure 3.2, with the nostrils and lips on the left and the back of the head on the right, so that we are viewing the inside of the human head from its left side. The main regions and individual articulators of the vocal tract labelled in Figure 3.2 are defined and described in more detail in the rest of this section and the following sections.

Midsagittal view of the vocal tract, facing left, with various body parts labelled.
Figure 3.2. Midsagittal diagram of the human vocal tract.

Open spaces in the vocal tract

There are three main open regions of the vocal tract. The oral cavity is the main interior of the mouth, taking up space horizontally from the lips backward. The pharynx is behind the oral cavity and tongue, forming the upper part of what we normally think of as the throat. Finally, the nasal cavity is the open interior of the head above the oral cavity and pharynx, from the nostrils backward and down to the pharynx.

The bottom of the pharynx splits into two tubes: the trachea (also known as the windpipe), which leads down to the lungs, and the esophagus, which leads down to the stomach. The esophagus is not normally relevant for phonetics, but the trachea is important, since the vast majority of our phones are articulated with air coming from the lungs, and as discussed later in Section 3.3, there are ways we can manipulate that airflow when it passes from the trachea to the pharynx.

Phones as a basic unit of speech

The pieces of the vocal tract can be articulated in various ways to create and manipulate a wide range of sounds. In the phonetics of spoken languages, we are primarily interested in studying units of speech called phones or speech sounds. It is difficult to provide a precise definition of what a phone is, either in general or for a specific spoken language, but roughly speaking, a phone in a spoken language is a linguistically significant sound, which means that it can be used as part of an ordinary word in that language. For example, the ordinary English words spill, slip, lisp, and lips each contain four phones; in fact, these words have the same four phones, just in different orders (with some slight variation in how they are pronounced; see Chapter 4 for more information).
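The claim that spill, slip, lisp, and lips contain the same four phones in different orders can be checked with a short sketch in Python. The transcriptions below are simplified stand-ins for the phones, not full IPA (see Chapter 4 for proper transcription):

```python
# Simplified phone transcriptions (illustrative only; note that the
# double <ll> of "spill" spells a single phone).
words = {
    "spill": ["s", "p", "ɪ", "l"],
    "slip":  ["s", "l", "ɪ", "p"],
    "lisp":  ["l", "ɪ", "s", "p"],
    "lips":  ["l", "ɪ", "p", "s"],
}

# Each word contains exactly four phones...
assert all(len(phones) == 4 for phones in words.values())

# ...and all four words draw on the same set of phones, just reordered.
assert len({frozenset(phones) for phones in words.values()}) == 1
```

The point of the sketch is simply that the words differ in the order of their phones, not in which phones they contain.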

There are many other sounds we can produce with the vocal tract or even with other body parts, such as burps, snorts, and finger snaps. However, these are not typically studied in phonetics, because they are not known to be phones in any spoken language. Even though they do not occur in ordinary words, they may still be used to express non-linguistic meaning. For example, in some cultures, snapping fingers can indicate quickness or a desire for attention.

Note that spoken languages may differ in how they use phones and whether they even use the same phones at all. For example, English speakers may use clicking sounds to express disapproval (the soft teeth-sucking tsk-tsk click) or to urge a horse to go faster (the loud popping giddyup click), but they are not phones in English, because they are not used within ordinary words. However, these same sounds do occur as phones in some other languages, such as Hadza (a language isolate spoken in Tanzania; Sands et al. 1996) and isiZulu (a.k.a. Zulu, a Southern Bantu language of the Niger-Congo family, spoken in southern Africa; Poulos and Msimang 1998).

We have to be careful about what kinds of words we look at to determine the phones of a language, because there are some marginal word-like expressions that can be used while speaking, but which may contain sounds that are not phones. For example, the English word ugh is often pronounced with a rough gravelly sound that is otherwise not used in English, and we can say things like Kaoru noticed their car was making a glzzk-glzzk-glzzk sound, where glzzk is some impromptu sound produced to mimic the noise made by a vehicle in desperate need of repair.

One of the most fundamental distinctions between phones is whether they are consonants or vowels. The next three sections address how consonants and vowels are articulated and how they are described and categorized in meaningful ways by linguists.

Check your understanding

An interactive H5P element has been excluded from this version of the text. You can view it online here:


Poulos, George, and Christian T. Msimang. 1998. A linguistic analysis of Zulu. Pretoria: Via Afrika.

Sands, Bonny, Ian Maddieson, and Peter Ladefoged. 1996. The phonetic structures of Hadza. Studies in African Linguistics 25(2): 171–204.

3.3 Describing consonants: Place and phonation


Consonants as constrictions

Consonants are phones that are created with relatively narrow constrictions somewhere in the vocal tract. These constrictions are usually made by moving at least one part of the vocal tract towards another, so that they are touching or very close together. The moving part is called the active or lower articulator, and its target is called the passive or upper articulator. Vowels have wider openings than consonants, so they are not usually described with the terms used here; more appropriate terminology for vowel articulation is discussed in Section 3.5.

Active articulators

The active articulators we find in phones across the world’s spoken languages are listed below, in order from front to back. They are also labelled in the midsagittal diagram in Figure 3.3.

  1. the lower lip, which is used for the consonants at the beginning of the English words pin and fin
  2. the tongue tip (the frontmost part of the tongue; also called the apex), which is used for the consonants at the beginning of the English words tin and sin
  3. the tongue blade (the region just behind the tongue tip; also called the lamina), which is used for the consonants at the beginning of the English words thin and chin
  4. the tongue front (the tip and blade together as a unit; also called the corona). It is useful to have a unified term for the tip and blade together, since they are so small and so close, and languages (and even individual speakers of the same language) may vary in which articulator is used for similar phones. For example, while many English speakers use the tongue tip for the consonant at the beginning of the word tin, other speakers may use the tongue blade or even the entire tongue front. While there may be variation in some languages, the distinction between the tip and blade is crucial in others, such as Basque (a language isolate spoken in Spain and France), which distinguishes the words su ‘fire’ and zu ‘you’, both of which sound roughly like the English word sue, with the tongue tip used for su and the tongue blade used for zu (although this distinction has been lost for some speakers under influence from Spanish; Hualde 2010)
  5. the tongue back (the upper portion of the tongue, excluding the front; also called the dorsum), which is used for the consonants at the beginning of the English words kin and gone
  6. the tongue root (the lower portion of the tongue in the pharynx; also called the radix), which is not used for consonants in English but is used for consonants in some languages, such as Nuu-chah-nulth (an endangered language of the Wakashan family, spoken in British Columbia)
  7. the epiglottis (the large flap at the bottom of the pharynx that can cover the trachea to block food from entering the lungs, forcing it to go into the esophagus instead), which is not used for consonants in English but is used for consonants in some languages, such as Archi (a Lezgic language of the Northeast Caucasian family, spoken in Russia)
Midsagittal view of the active articulators, labelled from left going clockwise: lower lip, tongue tip, tongue blade, tongue front, tongue back, tongue root, and epiglottis.
Figure 3.3. Midsagittal view of the active articulators of the vocal tract.

Note that while the lower teeth could theoretically be an active articulator (we can move them towards the upper lip, for example), it turns out that no known spoken language uses them for this purpose, so we do not include them here.

Each of the active articulators has a corresponding adjective to describe phones with that active articulator. These adjectives are given in the list below, again from front to back:

  1. labial (articulated with the lower lip)
  2. apical (articulated with the tongue tip)
  3. laminal (articulated with the tongue blade)
  4. coronal (articulated with the tongue front)
  5. dorsal (articulated with the tongue back)
  6. radical (articulated with the tongue root)
  7. epiglottal (articulated with the epiglottis)

Thus, we could say that the English words pin and fin begin with labial consonants, while thin and chin begin with laminal consonants. Note that all apical and laminal consonants are also coronal, so thin and chin can also be said to begin with coronal consonants.

Passive articulators

The passive articulators we find in phones across the world’s spoken languages are listed below, in order from front to back. They are also labelled in the midsagittal diagram in Figure 3.4.

  1. the upper lip, which is used for the consonants at the beginning of the English words pin and bin
  2. the upper teeth, which are used for the consonants at the beginning of the English words fin and thin
  3. the alveolar ridge (the firm part of the gums that extends just behind the upper teeth, recognizable as the part of the mouth that often gets burned from eating hot food), which is used for the consonants at the beginning of the English words tin and sin (though some speakers may use the upper teeth instead or in addition)
  4. the postalveolar region (the back wall of the alveolar ridge), which is used for the consonants at the beginning of the English words shin and chin
  5. the hard palate (the hard part of the roof of the mouth; sometimes called the palate for short), which is used for the consonant at the beginning of the English word yawn
  6. the velum (the softer part of the roof of the mouth; also called the soft palate), which is used for the consonants at the beginning of the English words kin and gone
  7. the uvula (the fleshy blob that hangs down from the velum), which is not used for consonants in English but is used for consonants in some languages, such as Q’anjob’al (a language of the Mayan family, spoken in Guatemala)
  8. the pharyngeal wall (the back wall of the pharynx), which is not used for consonants in English but is used in languages that have consonants with the tongue root or epiglottis as an active articulator (such as Nuu-chah-nulth and Archi mentioned earlier)
Midsagittal view of the passive articulators, labelled from left going clockwise: upper lip, upper teeth, alveolar ridge, postalveolar region, hard palate, velum, uvula, and pharyngeal wall.
Figure 3.4. Midsagittal diagram of the passive articulators in the vocal tract.

Each of the passive articulators has a corresponding adjective to describe phones with that passive articulator. These adjectives are given in the list below, again from front to back:

  1. labial (articulated at the upper lip)
  2. dental (articulated at the upper teeth)
  3. alveolar (articulated at the alveolar ridge)
  4. postalveolar (articulated at the back wall of the alveolar ridge)
  5. palatal (articulated at the palate)
  6. velar (articulated at the velum)
  7. uvular (articulated at the uvula)
  8. pharyngeal (articulated at the pharyngeal wall)

Thus, we could say that the English words tin and sin begin with alveolar consonants, while kin and gone begin with velar consonants.

Since all consonants have two articulators, they could be described by either relevant adjective. For example, the consonant at the beginning of the English word shin could be described as a laminal consonant (because of its active articulator) as well as a postalveolar consonant (because of its passive articulator).

Note that the term labial is ambiguous in whether it refers to the lower or upper lip. In general, this ambiguity is not a problem, so labial consonants include those with the lower lip as an active articulator as well as those with the upper lip as a passive articulator.

Place of articulation

The overall combination of an active articulator and a passive articulator is called a consonant’s place of articulation, or simply place for short. Places of articulation are usually described with a compound adjective that refers to both articulators, with the adjective for the active articulator first (without the -al ending), then a linking -o-, followed by the adjective for the passive articulator.

For example, the consonant at the beginning of the English word fin is a labiodental consonant, because it is articulated with the lower lip (labi-) at the upper teeth (-dental), as circled in the midsagittal diagram in Figure 3.5a. Similarly, the consonant at the beginning of the English word kin is dorsovelar, because it is articulated with the tongue back (dors-) at the velum (-velar), as circled in the midsagittal diagram in Figure 3.5b.

Midsagittal diagram of the lower lip approaching the upper teeth.
Figure 3.5a. Midsagittal diagram of a labiodental place of articulation.
Midsagittal diagram of tongue back making contact with the velum.
Figure 3.5b. Midsagittal diagram of a dorsovelar place of articulation.
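The naming convention just described (drop the active adjective's -al ending, add a linking -o-, then append the passive adjective) is regular enough to express as a tiny function. The following Python sketch is purely illustrative, and the function name is our own:

```python
def place_adjective(active: str, passive: str) -> str:
    """Build a compound place-of-articulation adjective: drop the
    active adjective's -al ending, add a linking -o-, then append
    the adjective for the passive articulator."""
    stem = active[:-2] if active.endswith("al") else active
    return stem + "o" + passive

# The consonant at the start of "fin": lower lip (labi-) at the
# upper teeth (-dental).
assert place_adjective("labial", "dental") == "labiodental"

# The consonant at the start of "kin": tongue back (dors-) at the
# velum (-velar).
assert place_adjective("dorsal", "velar") == "dorsovelar"
```

The same rule yields the longer compounds discussed below, such as apicoalveolar and laminoalveolar.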

Not all combinations of lower and upper articulators are used in the world’s spoken languages. Some are simply impossible. For example, ordinary humans cannot stretch their lower lip to reach all the way back to the pharyngeal wall, so there is no such thing as a labiopharyngeal consonant.

Other combinations are physically possible, but no spoken language is known to use them. For example, most people have no difficulty touching their tongue tip to the velum, but that articulation is awkward enough that no language seems to have apicovelar consonants. Of course, our knowledge of language is constantly expanding, so we could theoretically come across apicovelar consonants in some language someday, but it is unlikely.

Many of the compound adjectives for places of articulation are used frequently enough that they are normally replaced with a shorter adjective, often highlighting just the passive articulator. For example, apicoalveolar (tongue tip and alveolar ridge) is often shortened to just alveolar, because the tongue tip is more commonly used than the tongue blade when the alveolar ridge is the passive articulator. The full adjective can be used as needed to distinguish apicoalveolar from laminoalveolar (tongue blade and alveolar ridge), such as when a language has both kinds of alveolar consonants, as with Basque mentioned earlier.

Other shortened adjectives are used because one of the articulators is predictable from the other. For example, dorsovelar is often shortened to just velar, because no other active articulator is ever used with the velum as a passive articulator.

Other common shortened adjectives for places of articulation are listed below:

Note that palatal phones move the entire upper part of the tongue, both the front and the back, so they technically count as both coronal and dorsal. However, the tongue back is often considered the more important active articulator, and it is often the only active articulator listed for palatals.

Some other alternative adjectives are used in certain special cases. For example, consonants that use both lips as articulators, such as the consonants at the beginning of the English words pin and bin, are called bilabial rather than the inelegant term labiolabial. Note that for bilabial phones, both lips are involved roughly equally, with each actively moving toward the other as mutual targets. For the bilabial place of articulation, it is thus technically more accurate to say that the lips are the lower and upper articulators, but it is common to refer to the lower lip as the active articulator and the upper lip as the passive articulator.

Another special case is that of dental consonants, of which there are two main types: the default type, in which the tongue blade is on or near the back of the upper teeth (as circled in Figure 3.6a), and a second type, in which the tongue protrudes between the two sets of teeth, with the tongue blade below the bottom edge of the upper teeth (as circled in Figure 3.6b). As the default, the first type is called simply dental (a shortening of laminodental), while the second type is called interdental. The consonants at the beginning of the English words thin and then are commonly articulated as interdental.

Midsagittal diagram of tongue blade making contact with the back of the upper teeth
Figure 3.6a. Midsagittal diagram of a dental (laminodental) articulation.
Midsagittal diagram of tongue front protruding between the upper and lower teeth, with the tongue blade making contact with the upper teeth.
Figure 3.6b. Midsagittal diagram of an interdental articulation.

Finally, there are also two main types of postalveolars: the default case in which the tongue blade is on or near the alveolar ridge (as circled in Figure 3.7a), and a special case in which the tongue tip curls backward, so that the tip points towards the hard palate and the underside of the tongue tip is at or near the back wall of the alveolar ridge (as circled in Figure 3.7b). As the default, the first case is called postalveolar, while the second case is called retroflex. The consonant at the beginning of the English word run is articulated as retroflex by some speakers, but there is much variation.

Midsagittal diagram of the tongue blade making contact with the back of the alveolar ridge.
Figure 3.7a. Midsagittal diagram of a postalveolar articulation.
Midsagittal diagram of the front of the tongue curled back, with the underside of the tongue tip making contact with the back of the alveolar ridge.
Figure 3.7b. Midsagittal diagram of a retroflex articulation.

Glottal articulation

There is one important place of articulation that is somewhat different from the ones discussed so far. At the top of the trachea is the larynx (or voice box). Inside the larynx are the vocal folds (or vocal cords), which are two membranes that stretch from front to back. The vocal folds are separated by an empty space, the glottis. These structures are shown from the side in the midsagittal diagram in Figure 3.8a and in more detail as viewed looking down the throat in Figure 3.8b.

Midsagittal diagram of the larynx, showing the vocal folds stretched from front to back.
Figure 3.8a. Midsagittal diagram of the larynx.
Top view of the larynx showing the epiglottis, front and back of the larynx, and the left and right vocal folds.
Figure 3.8b. Top view of the larynx.

Some consonant phones seem to consist only of articulation of the vocal folds, with no significant movement of any of the active articulators discussed previously. We see this with the first consonant in the English words he and who. Such phones are said to have a glottal or laryngeal place of articulation. Unlike other consonants, they have no clear active or passive articulator (since both vocal folds move equally) and no clear lower or upper articulator (since the vocal folds are at the same height).

Additionally, across the world’s spoken languages, we find that glottal consonants behave somewhat oddly, often acting more like vowels than consonants. Because of this, glottal consonants are sometimes said to have no inherent place of articulation of their own. Instead, they seem to pick up aspects of the articulation of the phones around them, especially vowels. You should be able to observe this in the first consonant of the English words he and who, especially in the configuration of the lips, which should be spread into a smile throughout he but rounded throughout who. See Section 3.5 for more information about lip configuration and other vowel properties.

Regardless of the problematic nature of the glottal place of articulation, it is still typically counted among the possible places of articulation for consonants, and we follow that practice here. A full table of all places of articulation discussed in this chapter is given in Table 3.1, showing the usual shortened adjective and the active and passive articulators (with none given for glottal, since neither vocal fold has a privileged status over the other).

Table 3.1. Places of articulation
place of articulation active articulator passive articulator
bilabial lower lip upper lip
labiodental lower lip upper teeth
dental / interdental tongue blade upper teeth
alveolar tongue tip alveolar ridge
postalveolar tongue blade postalveolar region
retroflex underside of the tongue tip postalveolar region
palatal tongue front and back hard palate
velar tongue back velum
uvular tongue back uvula
pharyngeal tongue root pharyngeal wall
epiglottal epiglottis pharyngeal wall
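For readers who like to work with such inventories computationally, Table 3.1 can be restated as a simple lookup table. This Python sketch is our own restatement of the table, with the glottal place mapped to None since neither vocal fold has a privileged status:

```python
# Places of articulation from Table 3.1, mapping each place to its
# (active articulator, passive articulator) pair.
PLACES = {
    "bilabial":     ("lower lip", "upper lip"),
    "labiodental":  ("lower lip", "upper teeth"),
    "dental":       ("tongue blade", "upper teeth"),
    "alveolar":     ("tongue tip", "alveolar ridge"),
    "postalveolar": ("tongue blade", "postalveolar region"),
    "retroflex":    ("underside of the tongue tip", "postalveolar region"),
    "palatal":      ("tongue front and back", "hard palate"),
    "velar":        ("tongue back", "velum"),
    "uvular":       ("tongue back", "uvula"),
    "pharyngeal":   ("tongue root", "pharyngeal wall"),
    "epiglottal":   ("epiglottis", "pharyngeal wall"),
    "glottal":      None,  # no privileged active/passive articulator
}

# Looking up the place of the consonant at the start of "kin":
active, passive = PLACES["velar"]
assert active == "tongue back" and passive == "velum"
```

A structure like this makes it easy to check, for instance, that several places share the pharyngeal wall as their passive articulator.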

In addition to acting as a place of articulation for some consonants in some languages, the vocal folds are also used to regulate airflow through the vocal tract for most consonants and vowels in all spoken languages. In particular, when the vocal folds are configured in the right way, airflow through the glottis will cause the vocal folds to vibrate.

You can feel this vibration by placing your fingers on the front of your throat where the larynx is, while making the sound of a bee buzzing, like the sound of the consonant at the end of the English word buzz. If instead you make the sound of a snake hissing, like the sound of the consonant at the end of the English word bus, you should feel that there is no vocal fold vibration. Switch between buzzing and hissing to feel the change in the presence versus absence of vibration: zzzzz-sssss-zzzzz-sssss-zzzzz-sssss.

Vocal fold vibration is often called voicing, and a phone with vocal fold vibration is called voiced, while a phone without it is called voiceless or unvoiced. There are many other ways the vocal folds can play a role in shaping airflow. This larger category of manipulating airflow with the vocal folds in different ways is called phonation; you may sometimes see the term voicing refer to phonation generally, but this should be avoided, so that voicing can refer specifically to phonation in which the vocal folds vibrate. In this textbook, we will only discuss voiced and voiceless phonation.

Check your understanding

An interactive H5P element has been excluded from this version of the text. You can view it online here:


Hualde, J. Ignacio. 2010. Neutralización de sibilantes vascas y seseo en castellano. Oihenart 25: 89–116.

3.4 Describing consonants: Manner


Manners of articulation

Consonant phones can also be categorized by their manner of articulation (or manner for short), which is how air flows through the vocal tract, based on the size and shape of the constriction between the articulators.


The most basic manner of articulation is stop, in which the active articulator presses firmly against the passive articulator to make a complete closure, blocking all airflow at that point. There are many kinds of stops across the world’s languages. One important distinction for stops is based on the position of the velum. If a stop is articulated with a lowered velum to allow airflow into the nasal cavity, as shown in Figure 3.9a, then the resulting stop is called a nasal stop, or sometimes just a nasal for short. The English words met and net both begin with nasal stops. If instead the velum is raised against the upper pharynx to block off access to the nasal cavity, as shown in Figure 3.9b, then any airflow must go through the oral cavity only, and the resulting stop is called an oral stop. The English words pet and get both begin and end with oral stops.

Figure 3.9a. Midsagittal diagram of a nasal stop (lowered velum).
Figure 3.9b. Midsagittal diagram of an oral stop (raised velum).

The most common type of oral stop is the plosive, in which airflow from the lungs is trapped behind the stop closure until it is released in an explosive burst. Most spoken languages have plosives as their only oral stops, so the terms plosive and oral stop are often used interchangeably, but in more careful work, they are distinguished, because plosives are only one kind of oral stop. Other oral stops include ejectives (in which air is pushed up by raising the vocal folds rather than from the lungs), implosives (in which air is sucked in by lowering the vocal folds), and clicks (in which air is sucked in by quickly lowering the tongue).


If the active and passive articulators are very close but not touching, creating a very narrow constriction, airflow through this constriction becomes very turbulent, resulting in highly random noisy airflow called frication, which sounds like hissing or buzzing. A phone articulated this way is called a fricative. The English words set and vet begin with fricatives.


If the active and passive articulators are not touching and are spaced far enough apart to create little or no frication in the airflow, then the resulting phone is called an approximant. Most approximants have relatively unrestricted airflow through the middle of the oral cavity; these are called central approximants. However, during the articulation of an approximant, part of the tongue may instead make full contact with an upper articulator, causing the airflow to be diverted along one or both sides of the tongue, but still without frication. Such an approximant is called a lateral approximant. The English words yet and wet begin with central approximants, while the English word let begins with a lateral approximant.

These four manners of articulation are schematized in the diagrams in Figure 3.10, in which the flat bars across the top of each diagram represent the midsagittal view of some arbitrary passive articulator, such as the alveolar ridge or hard palate, while the rounded shapes represent some active articulator, such as the tongue tip or tongue back, and the arrows represent the nature of the airflow during the consonant.

Figure 3.10. Manners of articulation, from left to right: stop, fricative, central approximant, and lateral approximant.

The stop (far left in Figure 3.10) has a complete closure between the two articulators, preventing airflow from getting past. The fricative (second from the left) has a narrow opening with tightly constrained, fricated airflow, indicated in Figure 3.10 with a wavy line in the airflow arrow. The central approximant (third from the left) has a wider aperture with relatively unobstructed airflow, indicated by the gently curved arrows. Finally, the lateral approximant (far right) also has a wide aperture, but with a small central obstruction that forces airflow to be diverted around the sides of the obstruction.


Normally, the stop closure of a plosive is released relatively quickly, allowing the air to begin flowing almost immediately. However, it is also possible to release the closure slowly, so that a very brief fricative-like sound is created, causing the release of the plosive to be fricated. A plosive with such a fricated release is often referred to as an affricate, a fifth manner of articulation. The English word jet begins with an affricate.

It can be difficult to tell the difference between a true affricate and a sequence of a true plosive followed by a true fricative, because the two are very similar, so careful study of a language is usually required to determine what is really going on when you encounter some sort of plosive followed by frication. In this textbook, you will be told whether you are dealing with an affricate or not. In some languages, the difference between an affricate and a plosive-fricative sequence can change the meaning, as in English with ratchet (with an affricate) versus rat shit (with a plosive followed by a fricative). The distinction is marginal in English, but it is robust in other languages, such as Polish (a West Lechitic language of the Indo-European family, spoken in Poland): the Polish word czy ‘if, whether’ begins with an affricate, while trzy ‘three’ begins with a plosive followed by a fricative. The pronunciation of these two Polish words can be heard in the following sound file, first czy with an affricate, then trzy with a plosive plus a fricative.

One or more interactive elements has been excluded from this version of the text. You can view them online here:

Although most affricates are articulated as plosives with a fricated release, it is possible for other kinds of oral stops to have fricated releases, so ejective affricates, implosive affricates, and click affricates also exist in some of the world’s spoken languages.

Other manners of articulation

There are many other manners of articulation that are beyond the scope of this textbook. The two most notable ones are taps (also called flaps) and trills. Taps are like stops, except that the closure is so short that airflow is barely interrupted. The consonant in the middle of the English word atom is articulated as a tap for most North American speakers.

Trills are like repeated taps, in which one articulator vibrates quickly against the other, usually 2–3 times. Most dialects of English do not have trills, though some speakers of Scottish English may have a trill for the first consonant of run. Some languages have both a tap and a trill, such as Spanish (a Western Romance language of the Indo-European family, spoken in Spain and its former colonies), which has a tap in the middle of the word pero ‘but’ and a trill in the middle of the word perro ‘dog’. The pronunciation of these two Spanish words can be heard in the following sound file, first pero with a tap, then perro with a trill.

One or more interactive elements has been excluded from this version of the text. You can view them online here:

Other classes of consonants

A few larger groupings of these manners of articulation are also often useful to talk about because of their common patterns in the world’s spoken languages (see Chapter 4 for more information). Oral stops, fricatives, and affricates together form the class of obstruents, which are defined by having an overall significant obstruction to free airflow in the vocal tract. Consonants with the remaining manners of articulation (nasal stops, approximants, taps, and trills) form the class of sonorants, which have fairly unrestricted airflow, either through the nasal cavity (for nasal stops) or through the oral cavity (for approximants, taps, and trills).

Fricatives and approximants, because of their continuous airflow through the oral cavity, can also be referred to collectively as the class of continuants (sometimes trills are grouped with the continuants as well).

Note that the terms sonorant and continuant are typically used to refer only to consonants, but it is sometimes useful to define these classes to include vowels as well.

Putting it all together!

We now have three different ways to talk about how a consonant phone is articulated: its place of articulation, its phonation, and its manner of articulation. We can put these three together to give a complete description of the most common consonant phones. There are many consonants that go beyond this three-part description and require a bit more information to be fully specified, but for the purposes of this textbook, these three categories will be sufficient.

Consider the consonant phone at the beginning of the English word met, which has the articulation shown in the midsagittal diagram in Figure 3.10, with crucial aspects of the articulation circled.

Midsagittal diagram showing the upper and lower lips touching, the velum lowered, and the vocal folds vibrating.
Figure 3.10. Midsagittal diagram of a voiced bilabial nasal stop.

This consonant involves articulation of both lips, so it has a bilabial place of articulation. While saying this consonant, our vocal folds vibrate, so it has a voiced phonation. Finally, the two articulators are pressed firmly together, allowing no airflow through the oral cavity, but the velum is lowered to allow airflow through the nasal cavity, so this consonant has a nasal stop manner of articulation.

Conventionally, these three components in the description of a consonant phone are put in the order phonation – place – manner, so the consonant at the beginning of the English word met would be fully described as a voiced bilabial nasal stop. Because it is a nasal stop, we can also further classify this consonant as a sonorant.
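To make the three-part description concrete, here is a minimal sketch in Python. This is our own illustration, not part of the textbook's material, and the function name is invented for the example; it simply treats a consonant as a (phonation, place, manner) triple joined in the conventional order.

```python
# Sketch: a consonant description as a (phonation, place, manner) triple,
# joined in the conventional order phonation - place - manner.
def describe_consonant(phonation, place, manner):
    """Return the conventional full description of a consonant phone."""
    return f"{phonation} {place} {manner}"

# The consonant at the beginning of English "met":
print(describe_consonant("voiced", "bilabial", "nasal stop"))
# prints: voiced bilabial nasal stop
```

The same function gives, for example, "voiceless alveolar fricative" for the consonant at the beginning of set.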

For another example, consider the consonant phone at the beginning of the English word set, which has the articulation shown in the midsagittal diagram in Figure 3.11, with crucial aspects of the articulation circled.

Midsagittal diagram showing the front of the tongue near the alveolar ridge, the velum raised, and the vocal folds not vibrating.
Figure 3.11. Midsagittal diagram of a voiceless alveolar fricative.

This consonant involves active articulation of the front of the tongue; most speakers use the tongue tip, but some may use the tongue blade, either instead of the tip or in addition to it. The passive articulator of this phone can be hard to determine, since the front of the tongue is not touching it, but rather, is separated slightly from it. However, you can sometimes feel what the passive articulator is by breathing in instead of blowing out, because this can cause the passive articulator to become slightly cooler. If you do that with this consonant, you should feel the alveolar ridge getting cooler. Thus, this consonant has an alveolar place of articulation; whether the articulation is the default apical one or a laminal one depends on the individual speaker.

For this consonant, the vocal folds do not vibrate, so it has a voiceless phonation. Finally, the two articulators are separated very slightly, creating loud, very turbulent airflow, so this consonant has a fricative manner of articulation. Putting it all together, this consonant phone is a voiceless alveolar fricative, or, if we want to be overly precise, a voiceless apicoalveolar fricative (or, for some speakers, a voiceless laminoalveolar fricative). Because it is a fricative, we can also further classify this phone as both an obstruent and a continuant.

Also note that for this consonant, the velum is raised and backed to block airflow from entering the nasal cavity. Fricatives are nearly always oral consonants, because having any airflow leak into the nasal cavity makes it difficult to produce high enough air pressure to force airflow through the narrow fricative opening.

We will look more closely at how to describe some more consonants at the end of this chapter, but for now, focus on understanding the definitions of the various terms that are used to fully describe a consonant’s phonation, place, and manner.

Check your understanding

An interactive H5P element has been excluded from this version of the text. You can view it online here:

3.5 Describing vowels


Vowel quality

Vowel phones can be categorized by the configuration of the tongue and lips during their articulation, which determines the vowel’s overall vowel quality. Vowel quality is often much more of a continuum than consonant categories like place and manner. A slight change in articulation makes little difference in what a vowel sounds like, but it can have a drastic effect on a consonant. For example, moving an active articulator away from a passive articulator by just a tiny bit, less than 1 mm, is enough to turn a stop into a fricative, but that same distance for a vowel will have no noticeable effect. However, we can still identify several broad categories of vowels based on dividing up this continuum into a few major regions.


Vowels are articulated with a larger opening in the oral cavity than approximants are, requiring the tongue to move much farther down. This is typically facilitated by also moving the jaw down to allow the tongue to move even lower. The height of the tongue during the articulation of a vowel is called vowel height, or simply height for short.

A vowel with a very high tongue position, as in the English word beat, is called a high vowel (some linguists instead call this a close vowel, but we will not use that terminology). High vowels have an opening just slightly larger than for approximants. Indeed, high vowels and approximants are often related in many languages, with one turning into the other in certain positions. Compare the different pronunciations of the final vowel of uni– in the English words unique (with a high vowel) and union (with an approximant).

A vowel with a very low tongue position, as in the English word bat, is called a low vowel (again, some linguists have a different term we will not use, calling these vowels open). Low vowels have the largest opening of any phone.

A vowel with an intermediate tongue position between high and low, as in the English word bet, is called a mid vowel. The differences in vertical tongue position for these three categories of vowel height are shown in Figure 3.12.

Three midsagittal diagrams, showing a high tongue position, a mid tongue position, and a low tongue position.
Figure 3.12. Three categories of vowel height: high, mid, and low.


The horizontal position of the tongue, known as its backness, also affects vowel quality. Backness could equally be called frontness, and sometimes this term is used, but backness is more standard and preferred. If the tongue is positioned in the front of the oral cavity, so that the highest point of the tongue is under the front of the hard palate, as for the vowel in the English word beat, the vowel is called a front vowel.

If the tongue is positioned farther back in the oral cavity, so that the highest point of the tongue is under the back part of the hard palate or under the velum, as in the English word boot, the vowel is called a back vowel.

If the tongue is positioned in the centre of the oral cavity, so that the highest point of the tongue is roughly under the centre of the hard palate, in between the positions for a front and a back vowel, as for the English word but, the vowel is called a central vowel. Be careful not to confuse the technical terms central and mid. Central refers to an intermediate position in backness, while mid refers to an intermediate position in height. These two terms are not interchangeable! The differences in horizontal tongue position for these three categories of vowel backness are shown in Figure 3.13.

Three midsagittal diagrams, showing a front tongue position, a central tongue position, and a back tongue position.
Figure 3.13. Three categories of vowel backness: front, central, and back.

Note that what counts as front for a vowel depends on its vowel height, because of how the jaw moves. Humans have a hinged jaw, which means that as the jaw moves down to allow for a lower tongue position, the jaw also swings backward, carrying the tongue with it. As the tongue moves backward due to this hinged movement, its centre position also moves backward, and it becomes more difficult for this lowered tongue to move as far forward as for a higher vowel.

In fact, the frontest position for a low vowel (as in the English word bat) typically has an actual overall backness a bit farther back than for a front high vowel (as in the English word beat). Thus, backness must be defined relative to the possible range of horizontal positions at a given height, rather than being defined in absolute terms with respect to the roof of the mouth. This results in a skewed shape of the possible combinations of vowel height and backness, with more room for differences in backness for high vowels than for low vowels.

This is often graphically represented as in Figure 3.14, with the total vowel space drawn as an asymmetric quadrangle, like a rectangle with the bottom left corner cut off. This missing corner represents the space where we cannot produce a vowel, because of how the range of possible horizontal positions for backness differs based on vowel height, with higher vowels able to have fronter absolute positions than lower vowels can. A few example words of English are listed in Figure 3.14 as rough indications for what tongue position many speakers use for the vowels in these words.

Quadrangle with lower left corner cut off, divided into nine cells, labelled front, central, and back across the top, and high, mid, and low down the right side. Inside the cells are the English words: beat in the front high cell, bait in the front mid cell, bat in the front low cell, but in the central mid cell, boot in the back high cell, boat in the back mid cell, and bot in the back low cell.
Figure 3.14. Standard vowel quadrangle with example English words.

The cells in this quadrangle represent possible positions of the tongue within the oral cavity. For example, beat is shown in the high front cell, which indicates that it is pronounced with a high front tongue position. Note that there is much variation in English vowels across speakers, so the positions in Figure 3.14 are only meant to be suggestive. The positions of the tongue for the vowels in these words may be somewhat different for you or other speakers. For example, some speakers may have a low or back vowel for but, and some may have a more central vowel for bot or boat.


Vowel quality also depends on the shape of the lips, generally referred to as the vowel’s rounding. If the corners of the mouth are pulled together so that the lips are compressed and protruded to form a circular shape, as for the vowel in the English word boot in many dialects, the lips are said to be rounded and the corresponding vowel is called a round or rounded vowel.

If the corners of the mouth are pulled apart and upward so that the lips are thinly stretched into a shape like a smile, as for the vowel in the English word beat, the lips are said to be spread.

The lips may also be in an intermediate configuration, neither rounded nor spread, as for the vowel in the English word but, in which case, the lips are said to be neutral. Spread and neutral vowels are collectively referred to as unrounded or non-rounded vowels, because the distinction between spread and neutral lips seems never to be needed in any spoken language, whereas the distinction between rounded and unrounded frequently is. The differences in lip shape for these three categories of vowel rounding are shown in Figure 3.15.

Three sets of lips in different configurations: round, neutral, and spread. Neutral and spread are also both labelled as unrounded.
Figure 3.15. Three categories of rounding.


The position of the tongue root may also play a role in vowel quality. If the tongue root is advanced forward away from the pharyngeal wall, as for the vowel in the English word beat, it pushes into the rest of the tongue and causes the tongue to be somewhat denser and firmer overall, so a vowel with an advanced tongue root is sometimes called a tense vowel. If the tongue root is instead in a more retracted position closer to the pharyngeal wall, as for the vowel in the English word bit, it keeps the tongue somewhat more relaxed, so a vowel with a retracted tongue root is sometimes called a lax vowel. The property of whether a vowel is tense or lax is called tenseness. The different positions of the tongue root for tense and lax vowels are shown in Figure 3.16.

Two midsagittal diagrams, the left with an advanced tongue root and the right with a retracted tongue root.
Figure 3.16. Midsagittal diagrams showing an advanced tongue root for tense vowels (left) and a retracted tongue root for lax vowels (right).

For many spoken languages, vowel tenseness is not a relevant property. Languages like Taba (a.k.a. East Makian, a Central-Eastern Malayo-Polynesian language of the Austronesian family, spoken in Indonesia) have only five vowels that are spread quite far apart. There is only one high front vowel, one mid front vowel, etc. These vowels can vary in how tense or lax they might be, so there is no need to use the terminology tense and lax to describe them.

However, other languages have more complex vowel systems, with vowel pairs articulated in roughly the same way, except for tenseness. For example, most dialects of English have multiple pairs of vowels that are distinguished primarily by tenseness, such as the vowels in beat and bit. Both of them are high front and unrounded, but the beat vowel is tense, while the bit vowel is lax. Similarly, the vowels of the words bait and bet are both front mid and unrounded, but the bait vowel is tense, while the bet vowel is lax. For languages like English, the tense/lax terminology is often necessary.

That said, low vowels are very rarely tense in any language, because tongue lowering and tongue root advancing make almost contradictory demands on the tongue. However, tense low vowels are not physically impossible, so there are still some languages that have them, such as Akan (a Kwa language of the Niger-Congo family, spoken in Ghana), which has both a tense and a lax low vowel.


In Section 3.4, we talked about how the velum can move to make a distinction between oral and nasal stops based on whether or not air can flow into the nasal cavity. Moving the velum can make the same distinction for vowels. If a vowel is articulated with a raised velum to block airflow into the nasal cavity, the vowel is called oral. If instead the velum is lowered, allowing airflow into the nasal cavity, the vowel is called nasal or nasalized. The property of whether a vowel is oral or nasal is called its nasality. The different positions of the velum for oral and nasal vowels are shown in Figure 3.17, with arrows indicating the direction of airflow. Note that for nasal vowels, there is airflow through both the oral and nasal cavities.

Two midsagittal diagrams, showing oral only airflow due to a raised velum (left) and simultaneous oral and nasal airflow due to a lowered velum (right).
Figure 3.17. Midsagittal diagrams showing a raised velum for oral vowels (left) and a lowered velum for nasal vowels (right).


In addition to differences in vowel quality and nasality, vowels may also differ from each other in length, which is a way of categorizing them based on their duration. In most languages where vowel length matters, there is just a two-way distinction between long vowels and short vowels, with long vowels having a longer duration than their short counterparts. For example, in Japanese (a Japonic language spoken in Japan), the word いい ii ‘good’ has a long vowel, while the word 胃 i ‘stomach’ has a short vowel, although they both have the same vowel quality: they are both high front unrounded vowels. The pronunciation of these two Japanese words can be heard in the following sound file, first いい ii with a long vowel, then 胃 i with a short vowel.

One or more interactive elements has been excluded from this version of the text. You can view them online here:

In most dialects of English, vowel length is not used to distinguish words with completely different meanings like it is in Japanese. However, English vowels can still differ in vowel length in some circumstances. For example, English vowels are often pronounced a bit longer before voiced consonants than before voiceless consonants. Thus, the vowel in the English word bead is usually pronounced longer than the vowel in the word beat, although they both have the same vowel quality: high front unrounded. The tense vowels of English also tend to inherently be a bit longer than their lax counterparts. For example, the tense vowel in the English word beat is longer than the lax vowel in bit.

Consonants may also differ from each other in length. Long consonants are often called geminates, while short consonants are called singletons. English does not really make regular use of consonant length, though there are some marginal examples for some speakers, such as unnamed (with a geminate alveolar nasal stop) versus unaimed (with a singleton alveolar nasal stop). However, many other languages have widespread distinctions based on consonant length.

For example, geminates and singletons are contrasted in Hindi (a Central Indo-Aryan language of the Indo-European family, spoken in India). Hindi has word pairs like सम्मान sammān ‘honour’ (with a geminate bilabial nasal stop in the middle of the word) versus समान samān ‘equal’ (with a singleton bilabial nasal stop in the middle of the word). The pronunciation of these two Hindi words can be heard in the following sound file, first सम्मान sammān with a geminate consonant, then समान samān with a singleton consonant.

One or more interactive elements has been excluded from this version of the text. You can view them online here:

Multiple vowel qualities in sequence

Many vowels of the world’s spoken languages have a relatively stable pronunciation from beginning to end. These kinds of stable vowel phones are called monophthongs. However, just as there are dynamic consonant phones (affricates), vowel phones may also change their articulation from beginning to end. Most of these are diphthongs, which begin with one specific articulation and shift quickly into another, as with the vowel in the English word toy, which begins with a mid back round quality but ends high front and unrounded. As with affricates, it can be difficult to determine whether a given change in vowel quality is best treated as a true diphthong or instead as a sequence of two separate vowels.

Some languages can even have triphthongs, which are vowel phones that change from one vowel quality to another and then to a third, as in rượu ‘alcohol’ in Vietnamese (a Viet-Muong language of the Austroasiatic family, spoken in Vietnam and China). The word rượu has a vowel phone that begins with a high central unrounded quality, then lowers to a mid position, and then finally ends in a high back position with rounding. The pronunciation of this Vietnamese word can be heard in the following sound file.

One or more interactive elements has been excluded from this version of the text. You can view them online here:

Putting it all together!

There is not as much consistency in the order of descriptions for vowels as for consonants. Perhaps the most common order is height – backness – rounding, but rounding is sometimes given first instead, and though height is usually given immediately before backness, these can also be switched. Thus, the vowel in the English word bat might be described as a low front unrounded vowel, as an unrounded low front vowel, as a front low unrounded vowel, or as an unrounded front low vowel. All of these would be considered correct, and other combinations may be used.

When descriptions of nasality are needed, they are almost always placed after the description of vowel quality. Thus, the vowel in the English word ban might be described as a low front unrounded nasal vowel, as an unrounded low front nasal vowel, as a front low unrounded nasal vowel, or as an unrounded front low nasal vowel. Other combinations are also possible.

If descriptions of tenseness and/or length are needed, these are often placed before the other descriptions, though either or both may instead be placed after the vowel quality, usually still before the description of nasality. Thus, the vowel in the English word bean could be described as a long tense high front unrounded nasal vowel, as a tense high front unrounded long nasal vowel, as an unrounded high front long tense nasal vowel, or many other combinations!
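One common ordering of all these terms can be sketched in Python. This is our own illustration (the function name and parameters are invented for the example); it places length and tenseness first, then the quality terms in the common height – backness – rounding order, with nasality last.

```python
def describe_vowel(height, backness, rounding,
                   tenseness=None, length=None, nasality=None):
    """Assemble a vowel description in one common order:
    length/tenseness first, then height - backness - rounding,
    with nasality at the end."""
    parts = [p for p in (length, tenseness, height, backness, rounding) if p]
    if nasality:
        parts.append(nasality)
    return " ".join(parts) + " vowel"

# The vowel in English "bat":
print(describe_vowel("low", "front", "unrounded"))
# prints: low front unrounded vowel

# The vowel in English "bean":
print(describe_vowel("high", "front", "unrounded",
                     tenseness="tense", length="long", nasality="nasal"))
# prints: long tense high front unrounded nasal vowel
```

Remember that this is only one of several acceptable orders; as noted above, linguists vary in how they sequence these terms.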

Check your understanding

An interactive H5P element has been excluded from this version of the text. You can view it online here:

3.6 The International Phonetic Alphabet



Note that we have been talking about phones as if it is obvious what they are, but this is not always the case. It is sometimes easy to find a clear separation between the phones in a given word, that is, to segment the word into its component phones, but sometimes, it can be very difficult. We can see this by looking at waveforms, which are special pictures that graphically represent the air vibrations of sound waves. The two waveforms in Figure 3.18 show a notable difference in how easy it is to segment the English words nab and wool.

Two waveforms. Left waveform for the word nab is segmented into three distinct regions, labelled n, a, and b. The right waveform for the word wool has no clear segmentation between w, oo, and l.
Figure 3.18. Waveforms for the English words nab and wool.

The waveform for nab contains abrupt transitions between three very different regions corresponding to three phones, while the waveform for wool has smooth transitions from beginning to end, with no obvious divisions between phones.


When we can identify the individual phones in a word, we want to have a suitable way to notate them that can be easily and consistently understood, so that the relevant information about the pronunciation can be conveyed in an unambiguous way to other linguists. Such notation is called a transcription, which may be very broad (giving only the minimal information needed to contrast one word with another), very narrow (giving a large amount of fine-grained phonetic detail), or somewhere in between. Whether broad, narrow, or in between, phonetic transcription is conventionally given in square brackets [ ], so that, for example, the consonant at the beginning of the English word nab could be transcribed as [n], with the understanding that the symbol [n] is intended to represent a voiced alveolar nasal stop.

As linguists, we are interested in studying and describing as many languages as we can, so we want to use a transcription system that can be used for all possible phones in any spoken language. This means we cannot simply use any one existing language’s writing system, because it would be optimized for representing only the phones of that language, so it would not have easy ways to represent phones from other languages.

In addition, many writing systems are filled with inconsistencies and irregularities that make them unsuitable for any kind of rigorous and unambiguous transcription. For example, the letter <a> in the English writing system is used to represent different phones in the words nab, father, halo, and diva, while the phone represented by the letter <i> in the word diva is represented with different letters or letter combinations in other words: <ee> in meet, <ea> in meat, <e> in me, and <y> in mummy.

Note that symbols from a writing system are represented here with surrounding angle brackets < >. This is a common notational convention in linguistics that helps visually distinguish symbols in a writing system from symbols used for the transcription of phones, which are enclosed in square brackets.

Furthermore, even if English spelling were perfectly regular, many specific English words can be pronounced in different ways, such as either and route, which have different equally valid pronunciations. This kind of variation is particularly common between different dialects.

For example, the word mop has a vowel that is typically pronounced differently by speakers from Los Angeles (with the tongue low and back in the mouth), London (similar to the Los Angeles vowel, but with some lip rounding), and Chicago (more central in the mouth, making mop sound nearly like map to other speakers). If we tried to describe in writing how to pronounce a vowel from another language by saying that it was pronounced the same as the vowel in the English word mop, we could not guarantee that the reader would know whether the vowel is back and unrounded, back and round, or central and unrounded.

The International Phonetic Alphabet

To avoid these problems, linguists have devised more suitable transcription systems, each with their own strengths and weaknesses. In this textbook, we will use a widespread standard transcription system called the International Phonetic Alphabet (IPA). The IPA was created by the International Phonetic Association (unhelpfully also abbreviated IPA), which was founded in 1886. The first version of the IPA transcription system was published shortly after, and it has undergone many revisions since then as our knowledge and understanding of the world’s spoken languages have evolved. The most recent symbol was added in 2005: [ⱱ] for the labiodental tap, a phone found in many languages of central Africa, such as Mono, a Central Banda language of the Ubangian family, spoken in the Democratic Republic of the Congo.

For reference, the full chart for the IPA is given in Figure 3.19. This chart is available under a Creative Commons Attribution-Sharealike 3.0 Unported License, copyright © 2020 by the International Phonetic Association. It is also available online at the IPA’s homepage, and there are also some online versions that are accessible for screenreaders, such as this one.

Full International Phonetic Alphabet chart.
Figure 3.19. Full chart of all symbols in the International Phonetic Alphabet.

Learning the IPA takes a lot of time, practice, and guidance. But learning the IPA is not just about memorizing symbols. The underlying structure and principles behind the organization of the chart are what matter. The IPA is like the periodic table of elements in this way. It is helpful to know that Na is the chemical symbol for sodium, or that [m] is the IPA symbol for a voiced bilabial nasal stop, but it is much more important to know what these concepts are. What is sodium? What does it mean for a phone to be voiced? How is the vocal tract configured for a bilabial nasal stop?
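The periodic-table analogy can be made concrete: an IPA symbol is essentially a key into a table of articulatory descriptions. The toy dictionary below is our own illustration, containing just a handful of standard symbol and description pairs.

```python
# Toy lookup: each IPA symbol stands in for a full articulatory description.
ipa_descriptions = {
    "m": "voiced bilabial nasal stop",
    "n": "voiced alveolar nasal stop",
    "s": "voiceless alveolar fricative",
    "z": "voiced alveolar fricative",
}

print(ipa_descriptions["m"])
# prints: voiced bilabial nasal stop
```

Knowing the symbol is the easy part; understanding the description it stands for is the real goal.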

This is why this chapter has focused on defining concepts, so that you can build a solid foundation in understanding how phones are articulated. The notation is secondary to that.

Using the IPA

A full discussion of how to use the IPA is beyond the scope of an introductory textbook like this one. Here, we discuss a few guidelines and some concrete examples from English. For any transcription, it is important to keep in mind who your audience is and what the purpose of the transcription is. Most of the time, we only need a fairly broad transcription to get across a basic idea of the most important aspects of the articulation.

One important guiding recommendation from the IPA for broad transcription is to use the typographically simplest transcription that still conveys the most crucial information. That is, when possible, choose ordinary upright Roman symbols like [a] and [r] rather than their inverted counterparts [ɐ] and [ɹ]. Ordinary symbols are easier to type, easier to read, and more reliable in how they are displayed in different fonts.

Another aspect of typographic simplicity is to avoid diacritics, which are special marks like [  ̪] and [ʰ] that are placed above, below, through, or next to a symbol to give it a slightly different meaning. These are often necessary for certain contexts, but sometimes, they are superfluous.

Typographic simplicity is good practice when dealing with a lot of variation that is not relevant to the main point. For example, the English consonant typically spelled by the letter <r> is pronounced in many different ways by different speakers. Many North Americans have some sort of central approximant, but it varies from alveolar [ɹ], to postalveolar [ɹ̱], to retroflex [ɻ]. Some speakers may also have a pharyngeal constriction, indicated in the IPA with a superscript [ˁ] diacritic after the symbol. Some speakers may also have lip rounding, indicated in the IPA with a superscript [ʷ] diacritic after the symbol. Some speakers may have both pharyngealization and rounding!

That’s at least twelve different possible articulations, each with its own transcription in the IPA, depending on the place of articulation, and whether or not there is pharyngealization and/or rounding. The IPA symbols for these twelve possibilities are given in the list below; for each, the symbols are in order by place of articulation: alveolar, postalveolar, and retroflex.

plain: [ɹ], [ɹ̱], [ɻ]
pharyngealized: [ɹˁ], [ɹ̱ˁ], [ɻˁ]
rounded: [ɹʷ], [ɹ̱ʷ], [ɻʷ]
pharyngealized and rounded: [ɹˁʷ], [ɹ̱ˁʷ], [ɻˁʷ]

Furthermore, looking at English more broadly, there are many other pronunciations beyond those in North American varieties, such as an alveolar tap [ɾ] or trill [r] in Scotland, a voiced uvular fricative [ʁ] in Northumbria, and a labiodental approximant [ʋ] in London.

Thus, when transcribing English, there is no one single symbol that accurately represents the pronunciation of this consonant, so [r] is a reasonable choice because of its typographic simplicity. Of course, when transcribing a specific articulation from a specific speaker, it may make sense to use a more precise symbol, especially if the details of the articulation are important. But generally speaking, a plain [r] is normally fine for English, though some linguists may prefer to use [ɹ] or [ɻ] for North American English, even though there are at least a dozen equally valid North American pronunciations. If you are taking a course in linguistics, be sure to follow the standards and conventions set by your instructor.

Why is there so much variation in the pronunciation of English <r>? These phones belong to an unusual class called rhotics, named after the Greek letter rho ρ, which itself represents a rhotic phone. Across the world’s languages, we find a lot of variation in rhotics. Many languages only have one rhotic, but which particular rhotic they have can be very different from related or neighbouring languages. The pronunciation of rhotics in a language can also shift over time, especially if the language only has one, as English does. There seems to be no single overarching phonetic similarity in the various rhotics, and linguists are still trying to figure out what makes this class of consonants so special.

However, even when the pronunciation of a given phone is fairly consistent across speakers, many linguists still choose a typographically simpler transcription. Consider the consonant at the beginning of the English word chin, which is a voiceless postalveolar affricate. Affricates in the IPA are normally transcribed by writing the corresponding plosive symbol for the stop closure, followed by the corresponding fricative symbol for the fricated release, both united under a tie-bar [  ͡  ].

The symbol for a voiceless postalveolar fricative is [ʃ]. We can find this in the IPA chart by looking in the section devoted to consonants. Places of articulation are listed across the top, while manners of articulation are listed down the left. Within a given cell, if there are two symbols, the one on the left is voiceless, and the one on the right is voiced. So looking in the postalveolar column and the fricative row, we find the symbols [ʃ] and [ʒ], and since we are interested in the voiceless fricative, we pick the symbol [ʃ].
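The row-and-column lookup just described can be sketched as a small data structure. This is only an illustrative toy fragment, not the full IPA chart; the cell contents are limited to the handful of symbols discussed in this section, and the function name `lookup` is our own invention:

```python
# Toy fragment of the IPA consonant chart: each (manner, place) cell holds
# a (voiceless, voiced) pair, mirroring the left/right convention of the chart.
IPA_CELLS = {
    ("fricative", "postalveolar"): ("\u0283", "\u0292"),  # ʃ, ʒ
    ("fricative", "alveolar"): ("s", "z"),
    ("plosive", "alveolar"): ("t", "d"),
}

def lookup(manner: str, place: str, voiced: bool) -> str:
    """Pick the voiceless (left) or voiced (right) symbol from a chart cell."""
    voiceless_sym, voiced_sym = IPA_CELLS[(manner, place)]
    return voiced_sym if voiced else voiceless_sym

print(lookup("fricative", "postalveolar", voiced=False))  # prints ʃ
```

A blank cell in the real chart (such as a postalveolar plosive) would simply have no entry here, which is why the next step in the text builds such symbols out of a base symbol plus diacritics.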

However, there is no basic symbol for a voiceless postalveolar plosive in the IPA. That part of the chart is blank, so we have to create our own symbol by using the base symbol for a similar phone and adding one or more diacritics. In this case, we can use alveolar [t] and put a retraction diacritic [  ̱] under it to indicate that its place of articulation is slightly farther back, as we did for the postalveolar central approximant [ɹ̱] above. Thus, we get [ṯ] as the symbol for a voiceless postalveolar plosive.
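In Unicode, these diacritics are combining characters that attach to the preceding base symbol, so composite transcriptions like [ṯ] and [t͡ʃ] can be built by simple string concatenation. A minimal sketch, with the caveat that the official IPA retracted diacritic is U+0320 COMBINING MINUS SIGN BELOW, while some texts use the visually similar U+0331 COMBINING MACRON BELOW; the helper names here are our own:

```python
import unicodedata

RETRACTED = "\u0320"  # COMBINING MINUS SIGN BELOW (IPA "retracted" diacritic)
TIE_BAR = "\u0361"    # COMBINING DOUBLE INVERTED BREVE (ties affricates together)

def retract(base: str) -> str:
    """Mark a symbol as retracted, e.g. alveolar [t] -> postalveolar [t̠]."""
    return base + RETRACTED

def affricate(stop: str, fricative: str) -> str:
    """Join a stop and a fricative under a tie bar, e.g. [t] + [ʃ] -> [t͡ʃ]."""
    return stop + TIE_BAR + fricative

print(retract("t"))                 # postalveolar plosive
print(affricate("t", "\u0283"))     # t͡ʃ
print(unicodedata.name(RETRACTED))  # COMBINING MINUS SIGN BELOW
```

Note that the composed symbol is two code points but displays as one character, which is worth remembering when counting "symbols" in a transcription programmatically.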

In addition, most English speakers also pronounce this affricate with some amount of lip rounding, so a fully accurate transcription would be something like [ṯ͡ʃʷ]. But hardly any linguist transcribes this affricate with that much phonetic detail. It is almost never relevant that it is round, and the postalveolar location of the stop closure is implied by the fact that it has a postalveolar release; you cannot release a stop closure in a position different from where it is made. So the affricate is more commonly transcribed as [t͡ʃ]. As with [r] for the English rhotic, [t͡ʃ] is not technically accurate for most speakers, but it is typographically simpler and conveys all the crucial information needed to understand the transcription. The tie-bar on the affricate may also sometimes be left off in transcriptions, so [tʃ] is also a common transcription for this affricate.

Even without these issues, there is still usually no such thing as “the” correct transcription. Two pronunciations of the same word will always have some differences, because we live in a physical world where we cannot avoid slight imperfections and fluctuations, and we cannot capture all of those differences with the IPA. It is simply not designed for that level of phonetic detail. When such detail is important, it needs to be conveyed in other ways, such as with diagrams and numerical measurements (loudness in decibels, duration in milliseconds, etc.).

Transcribing English with the IPA

Despite all these pitfalls, it is still important to develop some basic skill in transcription, and since this textbook is presented in English, English is a good starting point to give you something concrete in which to ground your understanding of transcription. However, there is much dialectal variation, so the transcriptions offered here are very general and may differ from English you are familiar with. We begin with consonants, where there is less variation across dialects.

Table 3.2 lists some plosives and affricates of English, with their IPA symbols (keeping in mind the principle of simplicity) and words containing each phone in various positions, where possible. A full phonetic description is also given. The portion of the spelling that corresponds to the phone is in bold.

Table 3.2. English plosives and affricates.
symbol examples description
[p] pan rapid lap voiceless bilabial plosive
[b] ban rabid lab voiced bilabial plosive
[t] tan atop let voiceless alveolar plosive
[d] den adopt led voiced alveolar plosive
[t͡ʃ] chin batches rich voiceless postalveolar affricate
[d͡ʒ] gin badges ridge voiced postalveolar affricate
[k] can bicker lack voiceless velar plosive
[ɡ] gain bigger lag voiced velar plosive
[ʔ] uh-oh voiceless glottal plosive

Most of these are straightforward. As discussed in Section 3.3, the alveolar consonants are normally apicoalveolar, but some speakers may pronounce them with the blade of the tongue. If that detail is necessary, they can be transcribed as [t̻] and [d̻], using the laminal diacritic. Alternatively, some speakers may pronounce them on the back of the teeth, in which case they would be transcribed as [t̪] and [d̪], using the dental diacritic.

The glottal plosive (also frequently called a glottal stop) is only a marginal consonant in English, showing up as the catch in the throat in the middle of the interjection uh-oh. Some speakers also have it elsewhere, such as in the middle of the British English pronunciation of the word bottle. It is articulated by making a full stop closure with the vocal folds, blocking all airflow through the glottis.

Table 3.3 lists some fricatives of English.

Table 3.3. English fricatives.
symbol examples description
[f] fan wafer leaf voiceless labiodental fricative
[v] van waver leave voiced labiodental fricative
[θ] thin ether truth voiceless interdental fricative
[ð] than either smooth voiced interdental fricative
[s] sin muscle bus voiceless alveolar fricative
[z] zone muzzle buzz voiced alveolar fricative
[ʃ] shin Haitian rush voiceless postalveolar fricative
[ʒ] Asian rouge voiced postalveolar fricative
[h] hen ahead voiceless glottal fricative

The most notable variation here is that some speakers do not have [θ] and [ð], and instead use [t] and [d] or [f] and [v], depending on the dialect and the position in the word. As with the postalveolar affricates mentioned before, the postalveolar fricatives are also usually somewhat rounded, so they could be more narrowly transcribed as [ʃʷ] and [ʒʷ]. The voiced postalveolar fricative [ʒ] is also one of the rarest consonants in English, and many speakers pronounce it as an affricate in some positions instead of a fricative.

Table 3.4 lists some sonorants of English. Note that the sonorants of English are generally voiced, so that is not listed here. Across the world’s spoken languages, sonorants tend to be voiced by default, because their high degree of airflow causes the vocal folds to spontaneously vibrate if extra effort is not put in to keep them still.

Table 3.4. English sonorants.
symbol examples description
[m] man simmer ram bilabial nasal stop
[n] nun sinner ran alveolar nasal stop
[ŋ] singer rang velar nasal stop
[l] lane folly ball alveolar lateral approximant
[r] run sorry bar (various! see earlier discussion)
[j] yawn reuse palatal central approximant
[w] won awake labial-velar central approximant

A few of these sonorants warrant special attention. The alveolar nasal stop [n] has much of the same variation as the alveolar plosives, with some speakers having a laminoalveolar articulation [n̻] and some having a dental articulation [n̪]. The velar nasal stop is often one of the most surprising phones of English to people who are new to phonetics, because it is not easily identifiable as its own phone. Many people are misled by the spelling and think they say words like singer with [ɡ], but in fact, most speakers have only a nasal there, so that singer differs from finger, with singer having only [ŋ] and finger having [ŋɡ]. However, there are speakers who do genuinely pronounce all words like these with a [ɡ] after the nasal stop, but even then, the nasal stop they have is still velar [ŋ], not alveolar [n].

A notable consonant here is [w], which is special among the consonants of English in being doubly articulated, which means that it has two equal places of articulation. It is both bilabial (with an approximant constriction between the two lips) and velar (with a second approximant constriction between the tongue back and the velum). Its place of articulation is usually called labial-velar. English used to have two labial-velar approximants, a voiced [w] and a voiceless [ʍ]. Very few speakers today have both of these, but those who do pronounce the words witch and which differently, with voiced [w] in witch and voiceless [ʍ] in which.

Next, we can move to the vowels. This is where much of the variation in pronunciation occurs across English dialects, and fully describing all the vowels across English could take up a textbook of its own. Table 3.5 lists some monophthongs of English, with a focus on the English vowels as they are broadly pronounced across North American dialects. However, there is still much variation just in North America, and this discussion should not be taken to represent any particular speaker or region, let alone any sort of idealized standard or target. This is simply a convenient abstraction that provides a useful baseline, though it is still only a very rough guide, and individual speakers can vary quite a lot from what is discussed here. The vowels of English are generally all voiced and oral, so that is not listed here. Example words are given that show the vowel in a stressed syllable, an unstressed syllable, and at the end of the word (see Sections 3.10 and 3.11 for more about syllables and stress).

Table 3.5. English monophthongs.
symbol examples description
[i] beater radius see high front unrounded tense
[ɪ] bitter high front unrounded lax
[e] baker say mid front unrounded tense
[ɛ] better mid front unrounded lax
[æ] batter low front unrounded
[ɑ] father saw low back unrounded
[ɒ] bonnet saw low back round
[ɔ] border saw mid back round lax
[o] boater sew mid back round tense
[ʊ] booker high back round lax
[u] boomer manual sue high back round tense
[ʌ] butter mid central unrounded lax
[ə] animal sofa mid central unrounded lax

As noted before, there is a lot of variation that cannot be adequately discussed here, so we only cover a few notable deviations. First, while many speakers pronounce the four tense vowels as monophthongs as transcribed here, many speakers pronounce some or all of them as diphthongs instead, perhaps even having an approximant at the end rather than a vowel. For example, high front unrounded tense [i] may be pronounced more like [ɪi] or [ij] by some speakers. It is especially common for the two tense mid vowels to be pronounced as diphthongs, something like [eɪ] and [oʊ] or perhaps [ej] and [ow].

Many of the back round vowels, especially [ʊ], are fronter and/or unrounded for some speakers in some dialects. The back vowels in bore and bought are pronounced similarly to each other by some North Americans, and so they are often represented with the same symbol [ɔ], though note there may still be some differences, with [ɔ] before a rhotic often pronounced somewhat higher, closer to [o]. However, many speakers in Canada and in the western United States have a very different vowel in bought from bore. Their bought vowel is much lower, and for some speakers, it is also unrounded. These speakers use the same low vowel in bought that they use in bot. For most North Americans, the low vowels in bot and balm are pronounced the same, either as back round [ɒ] or back unrounded [ɑ]; in some dialects, it may be central unrounded [a]. Others have two different vowels for these words, usually [ɒ] in bot and [ɑ] or [a] in balm. Needless to say, this part of the vowel system of English is particularly troublesome, and even many expert linguists get aspects of it wrong.

The two central vowels [ʌ] and [ə] are often treated as related pronunciations of the same vowel, based on whether or not they occur in a stressed syllable (again, see Sections 3.10 and 3.11 for more about syllables and stress). For now, just note that some vowels of English are pronounced louder and longer than others, which we call “stress”, while the other vowels are said to be unstressed. We can see the difference in stress in pairs like billow and below, which differ mostly in which syllable is stressed: the first syllable in billow and the second syllable in below. The two central vowels of English differ in stress: the first syllable of the name Bubba is stressed, and the second is unstressed, so we might transcribe this name as [bʌbə]. These two vowels sound very similar and could easily be notated with the same symbol [ə]. However, there is a long tradition of notating the unstressed central vowel of English with [ə] and the stressed central vowel with [ʌ], based on historical pronunciations in which the stressed vowel used to be pronounced much farther back (and still is, in some dialects).

Finally, we consider diphthongs and syllabic consonants, which are phones that have consonant-like constrictions, but which function more like vowels within English. Some diphthongs and syllabic consonants of English are given in Table 3.6.

Table 3.6. English diphthongs and syllabic consonants.
symbol examples description
[aɪ] biter sigh low central unrounded to high front unrounded diphthong
[aʊ] browner how low central unrounded to high back round diphthong
[ɔɪ] boiler soy mid back round to high front unrounded diphthong
[r̩] burning interval sir syllabic rhotic
[l̩] hazelnut saddle syllabic alveolar lateral approximant
[n̩] calendar sudden syllabic alveolar nasal stop
[m̩] bottomless seldom syllabic bilabial nasal stop

For the diphthongs, the symbols used here represent a rough average over where they typically start and end, but the actual pronunciation varies quite a lot from speaker to speaker and even for the same speaker. The low starting point for [aɪ] and [aʊ] may be closer to [ɑ] or [æ], the mid back starting point for [ɔɪ] may be closer to [o], the high front ending point for [aɪ] and [ɔɪ] may be closer to [i] or [j], and the high back ending point for [aʊ] may be closer to [u] or [w].

Syllabic consonants are transcribed by using the syllabic diacritic [ ̩ ] under the relevant consonant symbol. However, sometimes these are transcribed with a preceding [ə], so that hazelnut could be transcribed either as [hezl̩nʌt] or as [hezəlnʌt]. Syllabic rhotics (also called rhotacized vowels or r-coloured vowels) are so common that they have their own dedicated symbols: [ɝ] for stressed syllables and [ɚ] for unstressed syllables. Thus, burning could be transcribed as [br̩nɪŋ] or [bɝnɪŋ], while interval could be transcribed as [ɪntr̩vl̩] or [ɪntɚvl̩].
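The equivalence between the two transcription styles can be illustrated with a short sketch that rewrites a syllabic consonant as [ə] plus the plain consonant. This is a hypothetical helper of our own, assuming the syllabic diacritic is encoded as the combining character U+0329:

```python
SYLLABIC = "\u0329"  # COMBINING VERTICAL LINE BELOW (IPA syllabic diacritic)

def expand_syllabics(ipa: str) -> str:
    """Rewrite C + syllabic diacritic as ə + C, e.g. [hezl̩nʌt] -> [hezəlnʌt]."""
    out = []
    for ch in ipa:
        if ch == SYLLABIC and out:
            # Insert ə before the consonant the diacritic was attached to,
            # and drop the diacritic itself.
            out.insert(len(out) - 1, "\u0259")
        else:
            out.append(ch)
    return "".join(out)

print(expand_syllabics("hezl\u0329n\u028Ct"))  # hezəlnʌt
```

Transcriptions without syllabic diacritics pass through unchanged, which makes the two notations easy to normalize for comparison.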

With all of this variation, not just in pronunciation, but in transcription choices by individual linguists, it can be difficult to figure out what is really intended by a given transcription. This is why when exact phonetic details matter, it is a good idea not to rely just on the IPA, but to include prose descriptions, midsagittal diagrams, and other tools that can help clarify exactly what is meant.

Check your understanding

An interactive H5P element has been excluded from this version of the text.

3.7 Signed language articulators


The phonetic units of signed languages

As discussed in Section 3.1, articulatory phonetics is concerned with how the body produces a linguistic signal, regardless of modality. We do not normally want to describe the overall articulation of an entire word in spoken language, so we break it down into phones for easier discussion. So what is the comparable unit for signed languages?

In signed languages, the basic independent meaningful unit, the equivalent of a spoken language word, is generally an individual sign. But signs do not seem to have a direct equivalent of phones. Phones can generally be spoken on their own, separate from any other phones. For example, we can take any of the individual phones in the English word [bɛd] ‘bed’, and say each one separately. It may be awkward, especially for plosives like [b] and [d], but it is not impossible. The independence of phones from words and their ability to be recombined in different ways is a key property of spoken languages.

But the corresponding sign BED in ASL cannot quite be broken down in the same way. For example, no matter what we try to do with our hand, it always exists in some shape and some location. We cannot just blank out the fingers or put the hand in some mysterious null dimension. So if we want to shape the hand in some particular way, we must necessarily do so somewhere, and similarly, if we want to put the hand in some location, we must necessarily also configure the fingers somehow.

This is how the articulatory properties of phones work, too. We cannot make an articulation with the tongue tip and alveolar ridge without also deciding how far apart they are, and we cannot articulate a manner of articulation without choosing an active and passive articulator, and thus, without choosing a place of articulation. That is, properties of phones like place and manner are interdependent, in the same way that the shape of the hand and its location are interdependent. But this is not how phones behave, since phones have independent existence and can be articulated separately from other phones.

So, the units of articulation in signed language that we are concerned about are signs (whole independent words) and the individual articulatory properties of a sign (how various articulators are shaped and moved). There seems to be no intermediate signed language unit that corresponds to spoken language phones (however, see Section 3.10 for discussion of syllables as a kind of intermediate organizational unit that both spoken and signed languages seem to have).

Notation of signs: Linguistic units from spoken languages are often given in italics, with the gloss (meaning) in single quotes, while signs from signed languages are often given in all capitals or small capitals. So we would write bed when referring to the English word, lit ‘bed’ when referring to the equivalent French word, and BED (in all capitals or small capitals) when referring to the equivalent sign in ASL.

There is some variation in how to write about signs from languages not connected to the written language used to talk about them. The content of this textbook is presented in written English, so we use English to write signs in ASL, but what should we do for a signed language like langue des signes québécoise (LSQ, Quebec Sign Language), which has no connection to English?

One option is to write the sign in English, so we could write about the ASL sign BED and the LSQ sign BED. Another option is to write the sign using the ambient written language most connected to that signed language, and add a gloss in English. In this case, LSQ has a connection to French, so we could write about the ASL sign BED but the LSQ sign LIT ‘bed’, using the French word lit. Both options have advantages and drawbacks, and you will see both used in the linguistics literature, though the first option is perhaps the most common.

For signed languages, we have two main categories of articulators to analyze the properties of. The manual articulators are the arms, hands, and fingers, which are the primary articulators used for signing (and the source of the articulatory half of the name of the signed language modality, manual-visual). However, most of the rest of the body is also used in signed languages, especially the torso, head, and facial features. All of these other articulators are called the nonmanual articulators or sometimes just nonmanuals.

Manual articulators

The manual articulators move by means of joints, which are points in the body where two or more bones come together to allow for some kind of movement. There are six joint types in the manual articulators: shoulder, elbow, radioulnar joint (or simply radioulnar), wrist, base knuckles, and interphalangeal joints (or simply interphalangeal), arranged as shown in Figure 3.20.

Skeleton arms, with the shoulder, elbow, radioulnar, wrist, base knuckles, and interphalangeals labelled.
Figure 3.20. Joints in the manual articulators.

Shoulder articulation

The shoulder rotates inside the shoulder blade, allowing for a wide range of motion for the upper arm, as shown in Figure 3.21. The motion we use for jumping jacks, with the arms making up and down arcs out to the left and right of the torso, is called abduction (for the upward/outward direction) and adduction (for the downward/inward direction). The motion for raising and lowering the arm up in front of us is extension (for raising) and flexion (for lowering). Finally, the shoulder can keep the upper arm in a fixed position while changing the position of the forearm through rotation. Movement at the shoulder joint can be any combination of these three kinds of movement.

Figure 3.21. Shoulder movement.

Elbow articulation

The elbow is the joint between the upper arm and forearm, and it has a more restricted range of motion than the shoulder, allowing only flexion (bending to bring the forearm closer to the upper arm) and extension (bending the opposite way), as shown in Figure 3.22. Other kinds of movements at the elbow are heavily restricted or impossible. Note that unlike the shoulder, the elbow cannot typically extend backwards from a hanging position, only from a flexed position.

Figure 3.22. Elbow movement.

Radioulnar articulation

The forearm contains two large bones, the radius (which is on the thumb side of the arm) and the ulna (on the pinky side). The radius and ulna come together in three different places for three different kinds of movement: at the elbow, at the wrist, and in the middle of the forearm. All three of these points of movement are considered radioulnar joints biologically and have separate names (the superior radioulnar joint at the elbow, the inferior radioulnar joint at the wrist, and the medial radioulnar joint inside the forearm), but in the context of signed language phonetics, we normally only need to talk about one of them, since their movements are connected. By convention, the one we discuss is the medial radioulnar joint. At this radioulnar joint, the radius and ulna pivot around each other, allowing the forearm to rotate, as shown in Figure 3.23.

Figure 3.23. Radioulnar movement.

Wrist articulation

The wrist is the joint between the forearm and the hand, and it is almost as mobile as the shoulder, as shown in Figure 3.24, allowing for abduction (sideways towards the thumb), adduction (sideways towards the pinky), extension (bending backwards), and flexion (bending forwards), but no rotation. Note that what we might initially think of as rotation at the wrist is actually due to radioulnar articulation. Like the shoulder, the wrist can typically extend backwards.

Figure 3.24. Wrist movement.

Base knuckle articulation

The base knuckles are the joints where the fingers meet the palm of the hands. Like the wrist, these joints allow for abduction, adduction, extension, and flexion, but no rotation, and like the elbow, the base knuckles cannot typically extend very far from a straightened position. The main movements available for the base knuckles are shown in Figure 3.25. Each base knuckle can generally move independently of the others, though some movements are more difficult than others.

Figure 3.25. Base knuckle movement.

Interphalangeal articulation

The interphalangeal joints are the various joints between the individual bones of the fingers. The thumb has only one interphalangeal joint, while the other four fingers have two interphalangeal joints each. Most humans cannot easily articulate the two interphalangeal joints of the same finger separately, so they are usually analyzed together for the purpose of signed languages. Like the elbow, the interphalangeal joints can only extend and flex, as shown in Figure 3.26, and they typically cannot extend much from a straightened position.

Figure 3.26. Interphalangeal movement.

Describing manual movement

An important aspect of describing manual movement in a sign is being able to identify which joints are moving and what kind of movement they are using to move. This can be quite difficult, because many signed language articulations use multiple joints moving in different ways. In the following discussion, we explore a few example signs from ASL to determine what kind of movement is occurring.

First consider the ASL sign SORRY in the following video clip.

One or more interactive elements have been excluded from this version of the text.

First, note that the hand must be raised to the chest before the sign begins. This movement does not really count as part of the sign. It is similar to how phones are articulated. In order to make an alveolar plosive like [t], the tongue tip must make full contact with the alveolar ridge. The movement to the alveolar ridge is not part of [t] itself, but a necessary bit of incidental movement to get ready to articulate [t]. We can tell that this movement is not part of [t] because of the behaviour of words like ant [ænt]. Since the tongue tip is already on the alveolar ridge for [n], we do not need to move it away and then move it back for [t]. The same is true for the positioning movement for the beginning of SORRY, as well as the final movement to return the hands to the lower position. We only care about the core movement that happens during the sign itself, not the transitional movement into or out of the sign. Sometimes, it may be difficult to determine whether a movement is an incidental transitional movement or not, but most of the time, it should be clear what initial and final movements can be ignored.

Now consider the articulation of the hand. It is shaped into a rigid fist, which requires flexing the base knuckles and the interphalangeal joints (except for the thumb, which is extended). However, this is a fixed configuration, not a movement, so we can ignore those joints. The actual movement is the hand tracing a small circle on the chest. There seems to be no significant movement at the wrist or radioulnar joints either, since the entire forearm down through the hand and fingers acts as a single fixed unit. Thus, we can ignore these two joints as well.

That leaves the elbow and shoulder as our joints of interest for manual movement. Looking carefully at the elbow, we see that it flexes and extends slightly during the circle, causing a change in the angle between the upper arm and forearm. In addition, the elbow itself also changes position in space, moving a bit out to the signer’s right side and back again. This cannot be due to elbow movement, since joint movement cannot change the position of a joint itself, but rather, the position of the other body parts it is connected to. Thus, there must also be some other movement elsewhere, and the only joint we have left is the shoulder. The relevant shoulder movement appears to be a small amount of abduction and adduction, perhaps combined with very slight flexion and extension as well.

So, we would say that SORRY in ASL has elbow and shoulder movement, and if we need to be more precise, we would say that there is repeated flexion and extension of the elbow, and repeated abduction and adduction of the shoulder, and perhaps also some amount of repeated flexion and extension of the shoulder.

Next, consider the ASL sign APPLE in the following video clip.

One or more interactive elements have been excluded from this version of the text.

Again, we must ignore the transitional movements into and out of the sign, and as with SORRY, we see that the hand in APPLE is in a fixed shape, this time with the index finger extended at the base knuckle but flexed at the interphalangeal joints, while all of the other fingers are closed loosely together with flexed base knuckles and interphalangeal joints.

For APPLE, there is movement of the forearm in the form of rotation. We know that the wrist cannot rotate, so this must be due to radioulnar rotation. There is no other movement, so we can ignore the elbow and shoulder joints.

Thus, for APPLE in ASL, we would say that it has radioulnar movement only, and more precisely, that it has repeated radioulnar rotation.

Finally, consider the ASL sign CHOOSE in the following video clip.

One or more interactive elements have been excluded from this version of the text.

For CHOOSE, the initial transitional movement almost looks like it could be part of the sign, with the hand raising, then flicking backward, as all part of one motion. In this particular case, we can ignore this initial raising, but in general, it can be difficult to know whether to ignore such movement or not.

The core movements that we are concerned with here are the movements of the fingers and the backward wrist flick. For the finger movement, we see that the index and thumb come together in a pinching motion. This requires flexion of the base knuckles, and perhaps a very small amount of interphalangeal flexion. The backwards wrist flick is articulated by extending the wrist backward. There is no radioulnar twisting and no notable elbow or shoulder movement.

Thus, for CHOOSE in ASL, we would say that it has base knuckle and wrist movement, and perhaps some minor interphalangeal movement, and more precisely, that it has non-repeated base knuckle (and maybe interphalangeal) flexion and wrist extension.

Nonmanual articulators

The rest of the body, the nonmanual articulators, especially the torso and the parts of the face, have complex and varied movement, such as eye gaze direction, eyelid aperture, eyebrow raising or lowering, torso leaning or rotation, head tilting or rotation, cheek puffing, lip rounding or spreading, teeth baring, etc. Nearly any other body part can be a nonmanual articulator, including the feet and buttocks in some signed languages, such as Adamorobe Sign Language in Ghana (Nyst 2007) and Kata Kolok in Indonesia (Marsaja 2008).

If you look back at the ASL sign SORRY, you should notice some of these nonmanual articulations. The signer furrows his brow, pushes his lips together, slightly puffs his cheeks, and gives a slow head shake. All of these nonmanual movements are part of the sign. For any given sign, the nonmanual articulations may not all be necessary to understand the sign, but they are still part of its articulation.

Nonmanual articulation is beyond the scope of an introductory textbook like this, but it plays a crucial role in signed languages and cannot be ignored in a full analysis of signed languages. This is one of the drawbacks for tools like “signed language gloves”, such as those that regularly pop up in popular media (a typical example is presented in Chin 2020). Since these gloves only capture some aspects of manual articulation, but no nonmanual articulation at all, they cannot fully translate signed languages. See Hill 2020 for further discussion of this issue, in particular, the need for creators to involve deaf people when designing signed language technology, to ensure that the technology is actually useful to deaf people.

Check your understanding

Coming soon!


Chin, Matthew. 2020. Wearable-tech glove translates sign language into speech in real time. UCLA Newsroom.

Hill, Joseph. 2020. Do deaf communities actually want sign language gloves? Nature Electronics 3(9): 512–513.

Marsaja, I. Gede. 2008. Desa Kolok — A deaf village and its sign language in Bali, Indonesia. Nijmegen: Ishara Press.

Nyst, Victoria. 2007. A descriptive analysis of Adamorobe Sign Language (Ghana). University of Amsterdam PhD dissertation.

3.8 Describing signs


Signed language parameters

Many of the possible types of manual articulations occur so frequently in certain combinations that we can describe them in more efficient ways. We begin by dividing the various manual articulations into four main categories, often referred to collectively as parameters (also called primes):


The static configuration of the base knuckles and interphalangeal joints in a sign is called its handshape. A handshape can be identified by which fingers have their base knuckles and/or interphalangeal joints flexed and by how much, as well as whether there is any abduction or adduction of the base knuckles. For example, we could describe the handshapes in Figure 3.27 with the given descriptions.

Figure 3.27. Various handshapes and their detailed descriptions of base knuckles and interphalangeal articulation.

However, these kinds of prose descriptions are a bit cumbersome, just like constantly writing “voiceless postalveolar fricative” for the phone [ʃ]. When possible, handshapes are often graphically depicted with iconic pictures (like those on the left in Figure 3.27). However, these are not always easily available in certain media. There are many possible solutions.

Since many handshapes are used to represent numbers or letters from the writing system of the ambient spoken language, a common solution is to use numbers and letters as convenient shorthand labels, somewhat like the IPA. Thus, since the first handshape in Figure 3.27 is used to represent FIVE in ASL, we could call this handshape the 5 handshape. Similarly, the second handshape in Figure 3.27 is used to represent the English letter <S> in ASL, so it can be called the S handshape.

However, this system has to be used with caution. The S handshape is used in ASL to represent the English letter <S>, but in Swedish Sign Language (Svenskt teckenspråk, STS), the same handshape represents the Swedish letter <G>. In a discussion in English about STS, the term S handshape could be confusing if it has not been made clear which handshape it refers to. This is not an isolated problem. There are many other mismatches between these two languages, as shown in Figure 3.28, and of course, there are similar mismatches across signed languages generally.

Figure 3.28. Differences between ASL and STS in handshape meanings.

This issue is further complicated by the fact that writing systems do not all use the same characters. For example, it would not even make sense to talk about an “R handshape” for Russian Sign Language, because there is no letter <R> in the Russian alphabet. There is a Russian letter that represents a rhotic, but that letter is <Р>. This letter is represented in Russian Sign Language by the handshape in Figure 3.29, which is the handshape used for EIGHT in ASL! So when discussing many different signed languages, this way of describing handshapes can be confusing if not carefully defined.

Figure 3.29. The handshape for <Р> in Russian Sign Language and EIGHT in ASL.

However, even just for a single signed language, there are far more handshapes than are used for numbers and letters. For example, ASL makes frequent use of the first two handshapes in Figure 3.30, but these are not used to represent any number or English letter, although they are similar to the third handshape in Figure 3.30, which is used to represent the English letter <B>.

Figure 3.30. Similar handshapes.

For these situations, certain descriptors (flat, open, bent, closed, etc.) can be used to describe slight differences between similar handshapes, as in the examples from ASL in Figure 3.31.

Figure 3.31. Various descriptions for related handshapes.

Because of the problems with this notation and terminology, we will use images where necessary, but we will also use the names typically used for describing handshapes in ASL. Just remember that these names are not universal and would not normally be appropriate when describing handshapes in other languages.

Sometimes, a distinction is made between two types of handshapes. Some of these are unmarked handshapes (Battison 1978), which tend to be the most common handshapes, both across signed languages and within a single language, as well as the earliest ones acquired by children. They are typically the easiest handshapes to both articulate and visually distinguish from each other. They are often used as default or substitute handshapes in certain circumstances. More recent work argues that the set of unmarked handshapes is probably smaller than previously thought, perhaps containing only those in Figure 3.32 (Henner et al. 2013).

Figure 3.32. Unmarked handshapes.

Marked handshapes are all other possible configurations of the fingers. These are not universal, so a particular marked handshape may be used in some languages but not others, and when it is used, it may be common in some languages and rare in others.


The other four joints (wrist, radioulnar, elbow, and shoulder) can be used to change the orientation of the hand, so that it can face different directions while maintaining the same handshape. For example, Figure 3.33 shows four different orientations of the flat-B handshape for the right hand, as seen by the signer.

Figure 3.33. Different orientations of the hand.

The orientation of the hand is divided into two components: the palm orientation (which way the palm is facing) and the finger orientation (the direction the bones inside the hand are pointing, which is where the fingers would point when straightened). Note that if the fingers are bent, this may be a bit confusing, but you can determine orientation by straightening out the fingers to see where they point. Remember that handshape depends only on the base knuckles and interphalangeals, while orientation depends only on the other four joints. Bending at the base knuckles to change handshape will not affect orientation.

The orientation of either the palm or the fingers may be absolute (up, down), relative to the signer’s position (in, out, left, right), or relative to a particular body part (face, other hand, etc.). Compare the ASL signs in the following video clips, first YOUR, then THANK-YOU, and finally BED. All three of these signs use the same open-B handshape, but they have different orientations. For YOUR, the palm is oriented out from the signer, and the fingers are oriented up:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

For THANK-YOU, the palm is oriented in towards the signer, and the fingers are oriented diagonally up and to the signer’s left:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

And for BED, the palm is oriented in towards the signer’s left cheek, and the fingers are oriented up along the side of the signer’s face:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

Since the head is tilted in BED, the absolute orientation of the palm and fingers in overall space is diagonal, but in signs like this, what matters is the relative orientation with respect to a specific body part, since the absolute orientation would depend on just how much the head is tilted.


The location of a sign is where in space or on the body it is articulated. Signs can be articulated in a variety of locations. The default location is neutral signing space, the area just in front of the signer’s torso (as in ASL YOUR), but locations can be nearly anywhere on the body. They tend to be around some specific part of the head (like the chin in ASL THANK-YOU or the side of the face in ASL BED), but other body parts are also possible locations. For example, the chest is the location for the ASL sign MY in the following video clip:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

The side of the forehead is the location for the ASL sign KNOW in the following video clip:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

And the left hand is used as the location for the ASL sign WARN in the following video clip:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

It is very rare for signs to have a location below the waist or behind the body, but it is possible in some signed languages (Nyst 2012).


All signs have a handshape, an orientation, and a location, and many signs also have movement, which is divided into two types: path and local movement. Path movement involves articulation at the elbow and/or shoulder, as in THANK-YOU above, which starts off on the chin but moves outward into neutral signing space by extending the elbow. Local movement involves articulation at the radioulnar joint, wrist, base knuckles, and/or interphalangeal joints, as in WARN above, in which the signer’s right hand taps the back of the left hand twice by flexing and extending the wrist.

For one-handed signs, like most of those shown in this chapter so far, signers typically use their dominant hand, which is the hand used for most ordinary daily activities like writing and brushing teeth. The signer in these videos uses his right hand. The other hand is called the nondominant hand. In two-handed signs, both the dominant and nondominant hands are used. Usually, they are used equally (for example, both moving at the same time in the same way). However, in many two-handed signs, like ASL WARN, only the dominant hand moves, while the nondominant hand remains still, usually acting as the location for the sign.

Note that the base knuckles and interphalangeal joints are used for handshape, so any movement at those joints can also change the handshape of a sign. Similarly, any movement of the other four joints can change the sign’s orientation or location. Thus, movement is intertwined with the other three parameters, so it can sometimes be difficult to single out how to analyze a given movement.

For example, in ASL THANK-YOU, there is both outward movement due to elbow extension and a change in location from chin to neutral signing space. Normally, we describe this by giving the starting location as the actual location, and describe the ending location as part of the movement. Thus, we would say that the location of THANK-YOU is the chin, and that it has an outward path movement to neutral signing space.

Minimal pairs

Phones are often distinguished by just a single articulatory property. For example, [p] and [k] differ only in place, while [t] and [d] differ only in phonation. Similarly, two signs may be distinguished by just one parameter. Signs which differ in just one parameter are called minimal pairs. An example of a minimal pair in ASL is SORRY and PLEASE, as shown in the video clips below. Note how the two signs have roughly the same orientation, location, and movement, but different handshapes. In SORRY, the open-A handshape is used:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

While in PLEASE, the open-B handshape is used:


Similarly, minimal pairs also exist for differences in orientation only, such as PROOF versus STOP in ASL, as shown in the following video clips. Note that there is also a slight difference in movement due to the dominant hand bouncing back in PROOF, but otherwise, the two signs have the same handshape, location, and movement. In PROOF, the dominant hand is oriented with the palm facing up:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

But in STOP, the dominant palm is facing to the signer’s left:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

Location can also be a distinguishing factor in minimal pairs, such as APPLE versus ONION in ASL, as shown in the following video clips. These two signs have the same handshape, orientation, and movement, but they are articulated at different locations. APPLE is articulated on the cheek:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

While ONION is articulated on the side of the head near the eye:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

And finally, minimal pairs also exist for movement, such as THINK versus WONDER in ASL, as shown in the following video clips. These two signs have the same handshape, orientation, and location, but they differ in how the hand moves. For THINK, the hand moves in toward the head:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

But in WONDER, the hand instead traces a circle in the same location:

One or more interactive elements has been excluded from this version of the text. You can view them online here:
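The definition above (two signs form a minimal pair when they differ in exactly one of the four parameters) can be sketched in code. The following is a hypothetical illustration: the dictionary entries and parameter labels are informal simplifications, not a real transcription system.

```python
# Sketch: detecting minimal pairs among signs described by their
# four parameters. Parameter values are informal labels.

PARAMETERS = ("handshape", "orientation", "location", "movement")

def differing_parameters(sign_a, sign_b):
    """Return the list of parameters on which two signs differ."""
    return [p for p in PARAMETERS if sign_a[p] != sign_b[p]]

def is_minimal_pair(sign_a, sign_b):
    """Two signs form a minimal pair if they differ in exactly one parameter."""
    return len(differing_parameters(sign_a, sign_b)) == 1

# Simplified entries for the ASL examples in the text
THINK  = {"handshape": "1", "orientation": "palm-in",
          "location": "forehead", "movement": "toward-head"}
WONDER = {"handshape": "1", "orientation": "palm-in",
          "location": "forehead", "movement": "circular"}

print(differing_parameters(THINK, WONDER))  # ['movement']
print(is_minimal_pair(THINK, WONDER))       # True
```

The same comparison would pick out the handshape difference between SORRY and PLEASE, the orientation difference between PROOF and STOP, and the location difference between APPLE and ONION.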

Check your understanding

Coming soon!


Battison, Robbin. 1978. Lexical borrowing in American Sign Language. Silver Spring, MD: Linstok Press.

Henner, Jonathan, Leah Geer, and Diane Lillo-Martin. 2013. Calculating frequency of occurrence of ASL handshapes. LSA Annual Meeting Extended Abstracts 4(16): 1–4.

Nyst, Victoria. 2012. Shared sign languages. In Sign language: An international handbook, ed. Roland Pfau, Markus Steinbach, and Bencie Woll, 552–574. De Gruyter Mouton.

3.9 Signed language notation


There is no commonly accepted equivalent of the IPA for transcribing signs. The most historically significant sign notation system was developed by William Stokoe (1960, 1965), whose work is also notable for having demonstrated that signed languages have the same kinds of linguistic structures that spoken languages do, effectively kickstarting the entire field of signed language linguistics.

Stokoe’s system of symbols, popularly called Stokoe notation, divides signs based on the four parameters discussed in Section 3.8, though he originally considered orientation to be a subcomponent of handshape. Battison (1978) argued that orientation should be its own distinct parameter, and this has become the standard analysis of the internal structure of signs, with nonmanual articulators sometimes considered a fifth parameter.

The Stokoe notation for a basic one-handed sign has the structure LHOM where L is the symbol for the location, H is the symbol for the handshape, O is a subscripted symbol for the orientation, and M is a superscripted symbol for the movement. Various other marks and symbols can be used to indicate more complex signs (two-handed, changing handshapes, compounds, etc.). A partial list of Stokoe’s original symbols is given in Figure 3.34.

Figure 3.34. Sample symbols from Stokoe notation.

For example, the ASL sign THANK-YOU could be notated as in Figure 3.35, with the curved shape indicating the chin as the location, a B with a dot over it indicating the B handshape with the thumb extended, the downward tick mark indicating a palm orientation toward the signer, and an upward tick mark indicating movement away from the signer.

Figure 3.35. One possible use of Stokoe notation for ASL THANK-YOU.

Stokoe notation was designed for ASL, so it is not suitable for signed languages generally, but the concept of dividing signs into parameters that underlies this system has influenced all remotely successful subsequent notation systems, such as SignWriting (Sutton 1981, 1990) and Hamburg Notation System (HamNoSys) (Prillwitz and Schulmeister 1987, Prillwitz et al. 1987, 1989). These two systems have more iconic symbols than Stokoe notation (making them somewhat easier to understand), and they have no inherent ties to ASL (making them more universally applicable).

For example, the SignWriting and HamNoSys symbols in Figure 3.36 both represent the same handshape, also shown in Figure 3.36.

Figure 3.36. Notation from SignWriting (left) and HamNoSys (centre) for the U handshape (right).

Note how both the SignWriting and HamNoSys symbols iconically represent the extension of the middle and index fingers, as lines pointing out from the closed fist (represented as a square in SignWriting and an oval in HamNoSys). HamNoSys also shows the extra detail of the thumb crossing over the palm, while SignWriting shows the difference in length between the middle and index fingers.

This handshape is notated by U in Stokoe notation, because this handshape is used to represent the English letter <U> in ASL. This relationship would not transfer to other signed languages, such as Jordanian Sign Language, in which the same handshape is used for the Arabic letter <ت> (Hendriks 2008), which represents the phone [t]. This clearly has no relationship to the English letter <U>, so the more iconic SignWriting or HamNoSys symbols would be more meaningful notation than Stokoe notation for representing this handshape when describing languages like Jordanian Sign Language.

Other systems have been constructed (see Hochgesang 2014 for an overview), but no single consistent standard has emerged, and the systems that do exist can be difficult to work with (many require special symbols not found in Unicode, for example), making the study of signed language phonetics and phonology more difficult.

Check your understanding

Coming soon!


Battison, Robbin. 1978. Lexical borrowing in American Sign Language. Silver Spring, MD: Linstok Press.

Hendriks, Bernadet. 2008. Jordanian Sign Language: Aspects of grammar from a cross-linguistic perspective. Doctoral dissertation, University of Amsterdam, Amsterdam.

Hochgesang, Julie A. 2014. Using design principles to consider representation of the hand in some notation systems. Sign Language Studies 14(4): 488–542.

Prillwitz, Sigmund, Regina Leven, Heiko Zienert, Thomas Hanke, and Jan Henning. 1987. HamNoSys: Hamburg Notation System for sign languages: An introduction. Hamburg: Zentrum für Deutsche Gebärdensprache.

Prillwitz, Sigmund, Regina Leven, Heiko Zienert, Thomas Hanke, and Jan Henning. 1989. HamNoSys version 2.0: Hamburg Notation System for sign languages: An introductory guide, International Studies on Sign Language and Communication of the Deaf, vol. 5. Hamburg: Signum.

Prillwitz, Sigmund and Rolf Schulmeister. 1987. Entwicklung eines computergesteuerten Gebärdenlexikons mit bewegten Bildern. Das Zeichen 1(1): 52–57.

Stokoe, William C. 1960. Sign language structure: An outline of the visual communication systems of the American Deaf. No 8 in Studies in Linguistics, Occasional Papers. Buffalo, NY: University of Buffalo.

Stokoe, William C., Dorothy C. Casterline, and Carl G. Croneberg. 1965. A dictionary of American Sign Language on linguistic principles. Silver Spring, MD: Linstok Press.

Sutton, Valerie. 1981. Sign Writing for everyday use. Boston: Sutton Movement Writing Press.

Sutton, Valerie. 1990. Lessons in SignWriting. La Jolla, CA: SignWriting Press.

3.10 Syllables


Spoken language syllables

While spoken language words can be decomposed into phones, there seem to be higher layers of structure that are relevant to how spoken languages function. One such layer is made up of units called syllables. Thus, words can contain multiple syllables, and each syllable can contain multiple phones. Of course, some words may have only one syllable, such as the English words [bæt] bat and [prɪnts] prints, and some syllables may have only one phone, such as the English words [o] owe and [ɔ] awe.

As a unit of structure, syllables are often abbreviated with the Greek letter sigma σ, and within a transcription, the boundaries between syllables are notated with the IPA symbol [.], as in the transcription [kæ.nə.də] Canada. Note that [.] is only needed between syllables; nothing extra is needed to mark the beginning of the first syllable or the end of the last syllable.

The most prominent position within a syllable is called the nucleus (abbreviated here as Nuc), which is usually filled by a vowel in most languages. However, some languages allow syllabic consonants in the nucleus, as in English [br̩d] bird, [bɒ.tl̩] bottle, and [bɒ.tm̩] bottom. Some languages make more extensive use of syllabic consonants, such as Tashlhiyt Berber (a.k.a. Shilha, a Northern Berber language of the Afro-Asiatic family, spoken in Morocco), which allows syllabic sonorants (fairly typical in the world’s languages) as well as syllabic obstruents (quite rare in the world’s languages), as in the words [tʁ̩.fl̩] ‘she surprised’, [ts̩.kr̩] ‘she did’, [tb̩.dɡ̩] ‘it was wet’, and [tk̩.ti] ‘she remembered’ (Ridouane 2014).

The remaining phones in the syllable (if any) make up the margins: the onset (Ons) on the left of the nucleus and the coda (Cod) on the right. The margins of the syllable can each be empty, or they may contain one or more consonant phones. A margin with only one phone is called simple, and a margin with two or more phones is called complex.

Thus, in the English word [ə.prot͡ʃ] approach, the first syllable [ə] has no onset or coda, while the second syllable [prot͡ʃ] has a complex onset [pr] and a simple coda [t͡ʃ] (recall that an affricate counts as a single phone). Syllable structure is often shown graphically in a tree diagram, as in Figure 3.37, with each syllable having its own σ node, connected down to the next level of onsets, nuclei, and codas, which are in turn connected down to the level of the phones that they each contain. Sometimes, the word level is also shown explicitly above the syllables, abbreviated here as Wd.

Figure 3.37. Syllable structure for the English word approach.

The standard analysis of syllables is that every syllable must have a nucleus, which always contains at least one phone. Though affricates count as a single phone in margins, diphthongs usually count as two phones, but the details of how to treat such complex phones depend on the language and the assumptions underlying the analysis.

Note that while speakers often have consistent intuitions about how many syllables a word has and where the boundaries are, the physical reality of their speech does not always match these intuitions. For example, some English speakers claim that the word hire has one syllable [haɪr], while higher has two [haɪ.r̩], and yet, when these speakers hear recorded samples of their own pronunciation of these two words, they often cannot reliably distinguish one from the other. Many other speakers think both words have one syllable, or that both have two. There are many similar English words with this murky behaviour, mostly words with a diphthong followed by an approximant: [aʊr]/[paʊr] hour/power, [aʊl]/[taʊl] owl/towel, [vaɪl] vile/vial, etc.

Because of these and other issues, syllables have a somewhat questionable status. It seems that they are more abstract and conceptual rather than concrete and physical. They are primarily a way for speakers to organize phones into useful linguistic units, which may not necessarily have a consistent measurable impact on the actual pronunciation.

Syllable structure can be notated in plain text with CV-notation, with one C for each phone in the margins and one V for each phone in the nucleus (note that V is used in the nucleus even if it represents a syllabic consonant). Thus, the syllable structure of [ə.prot͡ʃ] could be represented as V.CCVC rather than in a full tree diagram.

A syllable with no coda, such as a CV or V syllable, like English [si] see and [o] owe, is often referred to as an open syllable, while a syllable with a coda, such as CVC or VC, like English [hæt] hat and [it] eat, is a closed syllable. A syllable with no onset, such as V or VC, like English [o] owe and [it] eat, is simply called onsetless. There is no special term for a syllable with an onset.
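To make CV-notation and these syllable labels concrete, here is a minimal sketch in Python. It assumes a simplified representation of each syllable as an (onset, nucleus, coda) triple of phone lists; the representation is ours, invented purely for illustration.

```python
# Sketch: deriving CV-notation and syllable classification from
# syllables represented as (onset, nucleus, coda) lists of phones.

def cv_notation(syllables):
    """One C per margin phone, one V per nucleus phone; '.' between syllables."""
    return ".".join(
        "C" * len(ons) + "V" * len(nuc) + "C" * len(cod)
        for ons, nuc, cod in syllables
    )

def classify(syllable):
    """Label a syllable as open/closed, and as onsetless if it has no onset."""
    ons, nuc, cod = syllable
    labels = ["closed" if cod else "open"]
    if not ons:
        labels.append("onsetless")
    return labels

# English "approach" [ə.prot͡ʃ]: [ə] has no onset or coda;
# [prot͡ʃ] has a complex onset [pr] and a simple coda [t͡ʃ]
approach = [([], ["ə"], []), (["p", "r"], ["o"], ["t͡ʃ"])]

print(cv_notation(approach))   # V.CCVC
print(classify(approach[0]))   # ['open', 'onsetless']
print(classify(approach[1]))   # ['closed']
```

Note that the affricate [t͡ʃ] is stored as a single phone, so it contributes a single C, matching the analysis in the text.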

Crosslinguistic patterns in spoken language syllable types

Spoken languages generally prefer onsets and disprefer codas. This means that it is common for languages to require onsets, but it seems like there are no languages that require codas. Conversely, it is common for languages to prohibit codas, but there are no languages that prohibit onsets. These possibilities can be notated using parentheses to show what is allowed but not required. So we find languages whose syllables are all of the type CV(C), that is, they have a required onset and required nucleus, but an optional coda. However, there seem to be no reverse languages whose syllables can all be classified as (C)VC, with an optional onset, but a required nucleus and coda.

In addition, spoken languages generally prefer simple margins to complex margins. Thus, in languages that allow codas, some allow only simple codas and prohibit complex codas; if a language allows complex codas, it allows simple codas. Similarly for onsets: some languages prohibit complex onsets, and if a language allows complex onsets, it allows simple onsets.

Finally, there seem to be no strong relationships between complex onsets and complex codas: some languages allow complex onsets, some allow complex codas, and some allow both. All together, these trends give us a range of possible languages based on what kinds of syllable structures they allow and prohibit.
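The parenthesis notation for optional slots can be made concrete with a small sketch that expands a template like CV(C) into the set of syllable shapes it allows. The helper below is hypothetical, written just to illustrate how the notation works.

```python
# Sketch: expanding a syllable template with optional slots in
# parentheses, e.g. "CV(C)", into the set of shapes it allows.
from itertools import product

def allowed_shapes(template):
    """List every syllable shape permitted by a (C)V(C)-style template."""
    parts, i = [], 0
    while i < len(template):
        if template[i] == "(":
            j = template.index(")", i)
            parts.append(("", template[i + 1:j]))  # optional: absent or present
            i = j + 1
        else:
            parts.append((template[i],))           # required
            i += 1
    return sorted("".join(choice) for choice in product(*parts))

print(allowed_shapes("CV(C)"))     # ['CV', 'CVC']
print(allowed_shapes("(C)V(C)"))   # ['CV', 'CVC', 'V', 'VC']
```

The first template describes a language with a required onset and optional coda; the second, like English at this coarse level of description, allows all four basic shapes.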

Syllabification and sonority

The association of phones to appropriate positions in syllable structure is called syllabification. Syllabification is often based at least partially on the sonority of the phones, which is an abstract measure of their relative prominence that corresponds roughly (but not exactly) to loudness. A sonority hierarchy is an ordering of phones by their sonority. Vowels are at the top of the scale as the most sonorous phones, which is why they can occupy the privileged nucleus position in a syllable, while obstruents are at the bottom of the scale as the least sonorous, so they are typically relegated to the margins of a syllable.

There are some crosslinguistic patterns in sonority, but languages can differ in how they categorize some phones by sonority, so there is no one true universal sonority hierarchy. Some languages may distinguish plosives from fricatives by sonority, or voiceless from voiced obstruents, or nasals from liquids, and some languages may even have categories reversed from other languages.

Based on a language’s own sonority hierarchy, its syllables usually obey the sonority sequencing principle (SSP), which requires sonority to rise through the onset of a syllable, hit its peak in the nucleus, and then fall through the coda. Thus, the English syllable [plænt] plant is a well-formed syllable according to the SSP, because obstruents have the lowest sonority in English, followed by nasal stops, followed by other sonorants, followed by vowels at the top of the sonority hierarchy. Reversing the segments in the onset and coda, to create the attempted syllable *[lpætn], violates the SSP, because the onset has falling sonority rather than rising, and the coda has rising sonority rather than falling. The difference in sonority between these two words is graphed in Figure 3.38.

Figure 3.38. Representations of sonority patterns in the English word [plænt] and the attempted English word *[lpætn].

However, the SSP is not absolute. Many languages allow portions of a syllable to have a sonority plateau (when two adjacent segments have the same sonority, as in English [ækt] act, with two voiceless plosives in the coda), and some may have even looser syllable structure, allowing one or more sonority reversals, as in Georgian [ɡvphrt͡skhvni] გვფრცქვნი ‘you (singular) are peeling us’.
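The SSP check described above can be sketched with a toy sonority hierarchy. The four-level scale below (obstruents < nasal stops < other sonorants < vowels) follows the English hierarchy given in the text, but the phone sets are illustrative simplifications.

```python
# Sketch: checking the sonority sequencing principle (SSP) with a
# toy four-level sonority hierarchy for English. Phone sets are
# simplified for illustration.

SONORITY = {}
SONORITY.update({p: 1 for p in "ptkbdgfvszʃ"})   # obstruents
SONORITY.update({p: 2 for p in "mnŋ"})           # nasal stops
SONORITY.update({p: 3 for p in "lrwj"})          # other sonorants
SONORITY.update({p: 4 for p in "aeiouæɒə"})      # vowels

def obeys_ssp(onset, nucleus, coda):
    """True if sonority strictly rises through the onset and falls through the coda."""
    profile = [SONORITY[p] for p in onset + nucleus + coda]
    peak = len(onset)  # index of the nucleus, right after the onset
    rising = all(a < b for a, b in zip(profile[:peak + 1], profile[1:peak + 1]))
    falling = all(a > b for a, b in zip(profile[peak:], profile[peak + 1:]))
    return rising and falling

print(obeys_ssp(list("pl"), ["æ"], list("nt")))  # True:  [plænt]
print(obeys_ssp(list("lp"), ["æ"], list("tn")))  # False: *[lpætn]
```

A strict checker like this also rejects sonority plateaus, so English [ækt] fails it; that is exactly the sense in which the SSP is not absolute, and a real model of English would need to relax the strict inequality in the coda.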

Signed language syllables

As discussed at the beginning of Section 3.8, signs do not seem to have comparable units to spoken language phones. However, many researchers have proposed that signs can be decomposed into syllable-like structures. It is important to note that the actual structure of signs and signed languages is not derived from spoken languages. Thus, whatever parallels or analogies we might find between the two modalities are incidental, or perhaps derived from some deeper, more abstract cognitive principles of linguistic organization. Crucially, we cannot just directly import the theories and structures of spoken languages into the analysis of signed languages. We have to take into account the differences in modality.

A common analysis of the internal structure of signs is to treat them as sequences of two types of units: static states (sometimes called holds, positions, or postures, roughly equivalent to a combination of the location and orientation parameters) and dynamic states (essentially the movement parameter) (Liddell 1984, Liddell and Johnson 1986, 1989, Johnson and Liddell 2010, Sandler 1986, 1989, 1993, Perlmutter 1992, van der Hulst 1993), with handshape often being a relatively stable property over an entire syllable (Mandel 1981). The exact nature and composition of these units varies from model to model, but they generally share the same basic division between some type of static unit and some type of dynamic unit.

Many linguists additionally argue that the dynamic units are more sonorous than the static units (Brentari 1990, Corina 1990, Perlmutter 1992, Sandler 1993). In this view, the less sonorous static units are like syllable margins (and thus, comparable to consonants), while the more sonorous dynamic units are like syllable nuclei (and thus, comparable to vowels).

However, there is a lot of disagreement about what kind of syllabic model (if any) is appropriate for the analysis of signed languages. Linguists might just be trying too hard to make signed languages fit their understanding of spoken languages, or there could be something underlying that does result in syllables as a natural organizational unit in both modalities. This is still a rich and open area of study in linguistics.

Check your understanding

Coming soon!


Brentari, Diane. 1990. Theoretical foundations of American Sign Language phonology. Doctoral dissertation, University of Chicago, Chicago.

Corina, David P. 1990. Reassessing the role of sonority in syllable structure: Evidence from visual gestural language. In CLS 26-II: Papers from the parasession on the syllable in phonetics and phonology, ed. Michael Ziolkowski, Manuela Noske, and Karen Deaton, 33–43. Chicago: Chicago Linguistic Society.

Johnson, Robert E. and Scott K. Liddell. 2010. Toward a phonetic representation of signs: Sequentiality and contrast. Sign Language Studies 11(2): 241–274.

Liddell, Scott K. 1984. THINK and BELIEVE: Sequentiality in American Sign Language. Language 60(2): 372–399.

Liddell, Scott K. and Robert E. Johnson. 1986. American Sign Language compound formation processes, lexicalization, and phonological remnants. Natural Language & Linguistic Theory 4(4): 445—513.

Liddell, Scott K. and Robert E. Johnson. 1989. American Sign Language: The phonological base. Sign Language Studies 64: 195–278.

Mandel, Mark. 1981. Phonotactics and morphophonology in American Sign Language. Doctoral dissertation, University of California, Berkeley.

Perlmutter, David M. 1992. Sonority and syllable structure in American Sign Language. Linguistic Inquiry 23(3): 407–422.

Ridouane, Rachid. 2014. Tashlhiyt Berber. Journal of the International Phonetic Association 44(2): 207–221.

Sandler, Wendy. 1986. The spreading hand autosegment of American Sign Language. Sign Language Studies 50(1): 1–28.

Sandler, Wendy. 1989. Phonological representation of the sign: Linearity and nonlinearity in American Sign Language. No 32 in Publications in Language Sciences. Dordrecht: Foris.

Sandler, Wendy. 1993. A sonority cycle in American Sign Language. Phonology 10(2): 243–279.

van der Hulst, Harry. 1993. Units in the analysis of signs. Phonology 10(2): 209–241.

3.11 Stress


The phonetics of stressed syllables

As an organizational unit, syllables play a role in the overall rhythm and flow of language, especially by having some syllables be stressed, which gives them more prominence in the linguistic signal. In spoken languages, stressed syllables are often articulated with some combination of increased loudness, longer duration, and/or higher pitch. To the extent that signed languages have syllables, they also seem to have stressed syllables, which are typically articulated with greater muscular tension, quicker movements, and/or longer holds (Supalla and Newport 1978, Klima and Bellugi 1979). However, languages can vary quite a lot in exactly which phonetic properties are used for stressed syllables.

Spoken languages are sometimes classified based on whether they are “stress-timed” or “syllable-timed”, which means that roughly the same amount of time passes between stresses or between syllables, respectively. Despite widespread belief in this classification among non-linguists, it does not in fact appear to have any phonetic validity, so it is best avoided.

Degrees of stress

Stress in signed languages is still under-researched, but since signs typically only have one or two syllables, there is not much room for complex stress patterns in signed languages. However, in spoken languages, words can easily have many syllables, such as the English word internationalization, which has eight syllables, or the German word Kraftfahrzeughaftpflichtversicherung ‘motor vehicle indemnity insurance’, which clocks in at nine syllables. Even with just three or four syllables, there is room for multiple degrees of stress within a single word. In most spoken languages, usually only one syllable per word has the highest degree of stress, which is called primary stress and is marked in the IPA with a preceding upper tick mark [ˈ].

All other stressed syllables can be said to have secondary stress, which is marked in the IPA with a preceding lower tick mark [ˌ] (note that this is distinct from the syllabic diacritic [ˌ], which always goes under a symbol, whereas the secondary stress mark [ˌ] goes before a symbol). The remaining syllables are unstressed, for which there is no dedicated IPA symbol.

We can see all three levels of stress in the word [ˈbʌ.niˌhʌɡ] bunny hug, which is used in Saskatchewan to refer to a hoodie. Note that the stress marks are used at syllable boundaries, so no [.] is needed to mark a syllable boundary in a position where [ˈ] or [ˌ] are used.

Stress is commonly marked instead with non-IPA diacritics, with accent marks over the nucleus (or over σ when discussing stress patterns across syllables generally): acute [ˊ] for primary and grave [ˋ] for secondary, and sometimes also breve [ ̆] for unstressed, if it needs to be explicitly marked. Using this system, bunny hug could be transcribed as [bʌ́nihʌ̀ɡ] or [bʌ́nĭhʌ̀ɡ]. However, since these diacritics have other uses in the IPA (see Section 3.12), they must be used carefully to avoid ambiguity.

Lexical versus predictable stress

Many spoken languages have lexical stress, which means that the placement of stress is mostly unpredictable and must be memorized for each word. This can create minimal pairs, such as [ˈtɑɾu] ‘fast runner’ versus [tɑˈɾu] ‘batter’ and [ˈbɛɫu] ‘basket’ versus [bɛˈɫu] ‘flute’ in Khowar, a Dardic language of the Indo-European family, spoken in Pakistan (Liljegren and Khan 2017).

In other spoken languages, stress is fully predictable based on the structure of the syllables in a word, so that two words with the same syllable structures but different phones would also have the same stress pattern. In such languages, the rules governing stress assignment can be quite complicated, and a full analysis is beyond the scope of this textbook. However, there are a few broad patterns for spoken languages with predictable stress.

First, although most words usually have one and only one primary stress, short function words like prepositions or conjunctions might normally only be unstressed, while compound words might have multiple syllables with roughly equal stress.

Second, primary stress is nearly always on one of the first two or the last two syllables in a word. Stress on the first syllable is called initial stress, stress on the second syllable is called peninitial stress, stress on the final syllable is called ultimate stress, and stress on the second syllable from the end is called penultimate stress. In some languages, primary stress may even be antepenultimate, on the third syllable from the end, but interestingly, we do not find the equivalent of third stress from the beginning, suggesting that there is something special about the behaviour of the end of the word versus the beginning with respect to whatever role stress plays.

Finally, secondary stress in longer words often occurs in a regular rhythm, skipping every other syllable, so that stressed syllables and unstressed generally alternate with each other. However, there is a great deal of complexity across the world’s spoken languages in how secondary stress is assigned.

But despite all the complexity, there are still some consistent generalizations. We do not seem to find spoken languages that have primary stress on, say, the middle syllable or the fifth syllable of the word, or languages that consistently alternate two stressed syllables with two unstressed syllables throughout every word. This suggests that there are deeper underlying principles that govern how stress is assigned, perhaps relating to the purpose of stress.

For example, stress might help with processing of the linguistic signal, so it should be relatively regular (to be more easily recognizable) and anchored to the boundaries of words (which are otherwise hard to determine in running conversation). In other words, not all computationally imaginable stress patterns are possible. Instead, there seems to be a small set of very specific restrictions on how stress works.
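To make the typology above concrete, the attested edge-anchored positions for primary stress can be sketched as offsets from the word edges. This small Python sketch uses our own illustrative function name; it is not part of any phonological formalism:

```python
def primary_stress_index(pattern, num_syllables):
    """Return the 0-based syllable index where primary stress falls,
    for the attested edge-anchored patterns described above."""
    positions = {
        "initial": 0,                          # first syllable
        "peninitial": 1,                       # second syllable
        "ultimate": num_syllables - 1,         # last syllable
        "penultimate": num_syllables - 2,      # second from the end
        "antepenultimate": num_syllables - 3,  # third from the end
    }
    return positions[pattern]

# In a five-syllable word, penultimate stress falls on the fourth
# syllable (index 3). Note that there is no attested "third from
# the beginning" pattern to add to this table.
print(primary_stress_index("penultimate", 5))  # 3
print(primary_stress_index("ultimate", 5))     # 4
```

The asymmetry of the table itself reflects the typological observation in the text: the set of attested patterns counts up to three syllables from the end of the word, but only two from the beginning.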

Check your understanding

Coming soon!


Klima, Edward S. and Ursula Bellugi. 1979. The signs of language. Cambridge, MA: Harvard University Press.

Liljegren, Henrik and Afsar Ali Khan. 2017. Khowar. Journal of the International Phonetic Association 47(2): 219–229.

Supalla, Ted and Elissa Newport. 1978. How many seats in a chair? The derivation of nouns and verbs in American Sign Language. In Understanding language through sign language research, ed. Patricia Siple, 91–133. Perspectives in Neurolinguistics and Psycholinguistics. New York: Academic Press.

3.12 Tone and intonation



During voicing, the rate of vocal fold vibration can be manipulated. This property is normally called the fundamental frequency (typically abbreviated F0) when talking specifically about the actual physical vibration rate and pitch when talking about the auditory perception of that vibration. For the purposes of this discussion, we will use pitch, since we are usually more concerned with the more abstract, cognitive categorization rather than the actual physical implementation, which can vary quite a bit from speaker to speaker.

Pitch is often intertwined with duration and intensity in stress systems, but it can also be manipulated separately as part of its own system. Roughly speaking, if pitch is manipulated at the level of the word or syllable to make completely different meanings, it is called tone, whereas if it is manipulated above the level of the word (phrases and sentences) to make different kinds of sentences (statements versus questions, for example), it is called intonation. There are some problematic cases that are not easily classified this way, but this is a useful basic distinction.

Tone notation

Many languages distinguish just two tones, normally identified as a high tone (H) and a low tone (L). The IPA has two different systems for notating tone: tone diacritics placed on the relevant phone and separate tone letters placed after the entire syllable.

For languages with simple tone systems, the tone diacritics are normally used, with the acute [ˊ] representing a high tone and the grave [ˋ] representing a low tone. Tone letters iconically represent the height of the tone with a horizontal line connected to a vertical base, with [˥] representing a high tone and [˩] representing a low tone.

In addition, non-IPA superscript numbers on a 1–5 scale are sometimes used instead, with the highest number [⁵] representing a high tone and the lowest number [¹] representing a low tone.

All three of these notation systems are shown in Table 3.2 for the words [lúk] ‘vomit’ and [lùk] ‘weave’ from Bemba, a southern Bantoid language of the Niger-Congo family, spoken in Zambia and nearby areas (Hamann and Kula 2015).

Table 3.2. Tone patterns in one-syllable Bemba words.
tone example with IPA tone diacritics example with IPA tone letters example with non-IPA tone numbers gloss
H [lúk] [luk˥] [luk⁵] ‘vomit’
L [lùk] [luk˩] [luk¹] ‘weave’
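The correspondence among the three notations in Table 3.2 can be made concrete with a small Python sketch. The function name and data layout here are our own illustrative choices, not part of any standard tool:

```python
# Tone marks for a simple H/L system, in the three notations described
# above. The diacritics are Unicode combining characters that attach to
# the vowel; tone letters and numbers follow the whole syllable.
TONE_NOTATIONS = {
    "H": {"diacritic": "\u0301", "letter": "˥", "number": "⁵"},  # high
    "L": {"diacritic": "\u0300", "letter": "˩", "number": "¹"},  # low
}

def transcribe(onset, vowel, coda, tone, system):
    """Build a transcription of a CVC syllable in the chosen system."""
    marks = TONE_NOTATIONS[tone]
    if system == "diacritic":
        return onset + vowel + marks["diacritic"] + coda
    return onset + vowel + coda + marks[system]

# Bemba [lúk] 'vomit' versus [lùk] 'weave' in each notation:
print(transcribe("l", "u", "k", "H", "diacritic"))  # lúk
print(transcribe("l", "u", "k", "H", "letter"))     # luk˥
print(transcribe("l", "u", "k", "L", "number"))     # luk¹
```

The sketch also illustrates a practical difference among the systems: only the diacritic notation changes the vowel symbol itself, which is one reason font support matters when choosing a notation.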

The choice of notation depends on a combination of factors, including legibility, the complexity of the language’s tone system, the intended purpose of the transcription, and historical tradition. Recall from Section 3.11 that [ˊ] and [ˋ] are also sometimes used to represent primary and secondary stress, so it is important to be clear exactly what is intended when using these diacritics.

The tone numbers can also be problematic, since there are many traditional tone numbering systems that differ from the system presented here. For example, the high tone in Mandarin is traditionally called “tone 1”, and this numbering is used in some romanizations of Chinese, such as the Wade-Giles system, in which 媽/妈 [ma⁵] ‘mother’ is written ma¹ or ma1.

Diacritics can also be problematic for similar reasons, since [má] ‘mother’ is written mā in pinyin, with a different diacritic.

Tone letters are often more reliably unambiguous in meaning, since they are not normally used with any other meaning, but they have their own issues, such as lack of widespread font support.

Tone as a phonemic property

In many tone languages, each syllable can in principle have its own independent tone, as in the various tone patterns seen in the Bemba words in Table 3.3.

Table 3.3. Tone patterns in longer Bemba words.
tone pattern example with IPA tone diacritics example with IPA tone letters example with non-IPA tone numbers gloss
LH [kùːlá] [kuː˩la˥] [kuː¹la⁵] ‘build’
HH [βúːlá] [βuː˥la˥] [βuː⁵la⁵] ‘take’
HL [péːlà] [peː˥la˩] [peː⁵la¹] ‘give’
LHL [ùkúwà] [u˩ku˥wa˩] [u¹ku⁵wa¹] ‘fall’
LLH [ìnùmá] [i˩nu˩ma˥] [i¹nu¹ma⁵] ‘back’
HLH [íŋòmá] [i˥ŋo˩ma˥] [i⁵ŋo¹ma⁵] ‘drum’
HHL [íːnt͡ʃítò] [iː˥nt͡ʃi˥to˩] [iː⁵nt͡ʃi⁵to¹] ‘work’

Here, we see that the first syllable of a word could have either a high tone, as in [βúːlá] ‘take’, or a low tone, as in [ùkúwà] ‘fall’. Then, regardless of what tone the first syllable has, the second syllable could also have a high tone, as in [βúːlá] ‘take’ and [ùkúwà] ‘fall’, or a low tone, as in [péːlà] ‘give’ and [ìnùmá] ‘back’, and so on. While not all languages with tone behave this way, in general, they often allow for a wide range of possible tone combinations.

More tones

One of the ways that tones are more complex than some other phonetic properties is that they are often not simply binary high versus low. Many languages have an intermediate mid tone (M) between high and low, such as Igala, a Yoruboid language of the Niger-Congo family, spoken in Nigeria, which has minimal triplets like those in Table 3.4, which all have a low tone on the first syllable but then one of three different tones on the second (Welmers 1973). Mid tones are represented with the IPA diacritic macron [ˉ], the IPA tone letter [˧], or an intermediate superscript number (usually [³]).

Table 3.4. Tone patterns in Igala.
tone pattern example with IPA tone diacritics example with IPA tone letters example with non-IPA tone numbers gloss
LH [àwó] [a˩wo˥] [a¹wo⁵] ‘slap’
LM [àwō] [a˩wo˧] [a¹wo³] ‘comb’
LL [àwò] [a˩wo˩] [a¹wo¹] ‘star’

Other intermediate tones are also possible, especially when describing more fine-grained details in how a given language’s tone system works.

Contour tones

So far, we have only looked at level tones (high, mid, low), which are relatively stable from beginning to end. However, many languages also have contour tones, which change in pitch during the course of the syllable. For example, Awa (a Kainantu-Goroka language of the Trans-New Guinea family, spoken in Papua New Guinea) has two level tones (H and L) and two contour tones, a falling tone (F) that starts high and ends low, and a rising tone (R) that starts low and ends high (Loving 1966), as shown in the data in Table 3.5.

Falling tones are represented with the IPA diacritic caret [ˆ], a sequence of a high IPA tone letter followed by a low tone letter (usually [˥˩]), or a sequence of superscript numbers that starts high and goes low (usually [⁵¹]). Similarly, rising tones are represented with the IPA diacritic haček [ˇ], a sequence of a low IPA tone letter followed by a high tone letter (usually [˩˥]), or a sequence of superscript numbers that starts low and goes high (usually [¹⁵]). More complicated tones are possible, including using more intermediate tones and more than two component tones in a contour, but they are beyond the scope of this textbook.

Table 3.5. Tone patterns in Awa.
tone pattern example with IPA tone diacritics example with IPA tone letters example with non-IPA tone numbers gloss
H [ná] [na˥] [na⁵] ‘breast’
L [nà] [na˩] [na¹] ‘house’
F [nâ] [na˥˩] [na⁵¹] ‘taro’
R [pǎ] [pa˩˥] [pa¹⁵] ‘fish’

Tone letters for contour tones are sometimes displayed as a single combined character rather than a sequence of separate tone letters, as shown in Figure 3.39. However, this requires a font with the combined characters properly encoded, and this is not always available.

Figure 3.39. Contour tones as sequences of separate tone letters and as combined characters.


Finally, we can also see changes in pitch over entire sentences as intonation, with the purpose of conveying syntactic or pragmatic information rather than morphological information. For example, the English sentence this is vegetarian chili has a different intonation depending on whether it is a declarative statement or a question, and whether there is emphasis on a particular word, as in the following examples.

  1. (What are you eating?) This is vegetarian chili.
  2. THIS is vegetarian chili (and THAT is shrimp étouffée).
  3. This is VEGETARIAN chili (not BEEF chili).
  4. This is vegetarian CHILI (not vegetarian STEW).
  5. This is vegetarian chili? (I didn’t hear exactly what you said.)
  6. THIS is vegetarian chili? (It tastes like shrimp étouffée!)
  7. This is VEGETARIAN chili? (I’m sure I tasted meat in it!)
  8. This is vegetarian CHILI? (It seems more like a stew.)

Intonation is very complex, as it depends on the syntactic structure of the utterance, as well as its role in the larger discourse. It can also interact with word-level stress or tone in complex, interesting ways. Intonation lies at the intersection of many different aspects of language, and a proper analysis requires a solid understanding of phonetics, phonology, syntax, and semantics.

Check your understanding

Coming soon!


Hamann, Silke and Nancy C. Kula. 2015. Bemba. Journal of the International Phonetic Association 45(1): 61–68.

Loving, Richard E. 1966. Awa phonemes, tonemes, and tonally differentiated allomorphs. Papers in New Guinea Linguistics A-7: 23–32.

Welmers, William E. 1973. African language structures. Berkeley and Los Angeles: University of California Press.

Chapter 4: Phonology


The phonetic properties of language are not entirely random. There are many repeated patterns and categories that give more abstract structure to the physical reality of the linguistic signal, both within a particular language and across languages. This chapter explores this abstract structure by looking at patterns in how the physical units of language can be combined, how they affect each other in patterned ways when they are combined, and the methods linguists can use to discover these patterns.

When you’ve completed this chapter, you’ll be able to:

  • Analyze linguistic data to determine the distributions of phones in a spoken language and phonological processes in signed languages,
  • Categorize phones into phonemes based on their distributions, and
  • Write phonological rules that show how to map phonemes and underlying representations to allophones and surface representations.


4.1 Phonemes and allophones


The essence of phonology

As discussed in Chapter 3, a linguistic signal is composed of smaller physical units: phones, handshapes, movements, etc. These are not combined in purely random ways. For example, the three phones [m], [i], and [k] can be combined to form the English word [mik] ‘meek’, but the other five possible combinations are not words of English. Four of these are normally unpronounceable by English speakers: [imk], [ikm], [mki], and [kmi]. However, the fifth, [kim], could easily be integrated into English as a new word. It is just an accident of the history of English that we do not yet have this as an actual word.

Additionally, when some of these physical units are pronounced near each other, they may affect each other’s articulation. For example, in American Sign Language, the two signs FOOD and BED can be compounded to form the sign HOME, but not as a strict sequence of FOOD followed by BED. Instead, the two signs are merged into a single sign that contains properties of both of its components.

The following video clip shows the ASL sign for FOOD, with repeated tapping at the mouth with a flat-O handshape:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

The following video clip shows the ASL sign for BED, with a single articulation of the open-B handshape on the side of the face, with a nonmanual head tilt:

One or more interactive elements has been excluded from this version of the text. You can view them online here:

The following video clip shows two variants of the ASL sign for HOME (note how the signer numbers each variant before signing it, by pointing to an extended finger on his left hand):

One or more interactive elements has been excluded from this version of the text. You can view them online here:

Both variants of HOME blend different parameters of FOOD and BED. For example, both variants use only the flat-O handshape from FOOD, eliminating the open-B handshape from BED. However, both variants use the location at the side of the face from BED, either as a second location after movement in the first variant or as the only location in the second variant, with the mouth location of FOOD being lost. Additionally, the repetition from FOOD is reduced, resulting in only two total touches in both variants, fewer than what is used in FOOD. Finally, the nonmanual head tilt for BED is not used in HOME.

There are underlying patterns in all languages that determine which combinations of physical units are valid or invalid, as well as what kinds of articulatory changes occur when these physical units are combined. The study of these patterns is called phonology.

The phonological units of spoken language

In spoken language, one important pattern is how certain phones are pronounced differently, yet are treated as the same conceptual object by speakers. For example, consider the English words atom and atomic. In most varieties of North American English, the consonant phone in the middle of atom is pronounced as an alveolar flap; recall from Section 3.4 that the alveolar flap is symbolized in the IPA by [ɾ]. But in the word atomic, the corresponding phone is a voiceless alveolar plosive followed by a notable puff of air, symbolized in the IPA as [tʰ], where the superscript [ʰ] represents the puff of air (called aspiration). However, these two words are clearly related: atomic is built from the word atom, both in pronunciation and in meaning (see Chapter 5 for more on the topic of word-building). Because of this, it is convenient to think of these two phones as being the same object on some abstract conceptual level, despite being physically different.

This object is called a phoneme, and its various physical realities as phones are called its allophones. We can think of a phoneme as a set of allophones, with each one connected to certain specific positions. So in this case, we might say that the set {[ɾ], [tʰ]} is a phoneme, with [ɾ] and [tʰ] each being allophones of that phoneme, used in different situations, called environments.

The most common types of environments require one or more specific phonetic properties immediately to the left, one or more specific phonetic properties immediately to the right, or a combination of both. As with most aspects of linguistics, the environments for allophones can be more complex than what is presented in the simpler cases discussed in this textbook.

By convention, phonemes are often notated with just a single symbol in slashes / /, because the number of allophones can get quite large, and it would be too cumbersome to continue listing out all of the allophones as a set. The choice of symbol depends on certain assumptions, but for now, we can represent this phoneme with /t/.

Both of these allophones of /t/ occur between two vowels or syllabic consonants, but the flap [ɾ] is followed by an unstressed vowel or syllabic consonant, while the aspirated [tʰ] is followed by a stressed vowel or syllabic consonant (recall from Section 3.11 that stressed syllables are typically louder, longer, and/or higher pitched than unstressed syllables). So we might conjecture that stress is at least partially responsible for determining which allophone to use for /t/.

We can test that conjecture by looking at other words where this phoneme occurs (fortunately, it is often spelled with the letter <t> in English) and seeing which allophone is used. In [ˈmɛɾl̩] metal and [məˈtʰælək] metallic, we see the same pattern as in atom and atomic, so our conjecture holds. There are other pairs of related words that show the same pattern: [ˈbæɾl̩] battle and [bəˈtʰæljn̩] battalion, [ˈkrɪɾək] critic and [kraɪˈtʰiriə] criteria, etc.

If we look beyond related words, we see the same pattern. English words with /t/ between two vowels or syllabic consonants tend to have the flap [ɾ] if the second is unstressed but aspirated [tʰ] if the second is stressed. That is, words like data, writer, and Ottawa have [ɾ], while words like attack, return, and Saskatoon have [tʰ].
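The conditioning pattern just described can be sketched as a simple decision rule. The Python sketch below is our own deliberately simplified illustration of North American English flapping and aspiration, not a complete analysis (it ignores, for example, word-initial and word-final positions):

```python
def realize_t(prev_is_syllabic, next_is_syllabic, next_is_stressed):
    """Choose an allophone of /t/ based on its environment: between
    two vowels or syllabic consonants, flap before an unstressed
    syllable and aspirate before a stressed one (a simplified sketch
    of the pattern described in the text)."""
    if prev_is_syllabic and next_is_syllabic:
        return "tʰ" if next_is_stressed else "ɾ"
    return "t"  # other environments are not covered by this sketch

# atom: /t/ between vowels, second vowel unstressed -> flap
print(realize_t(True, True, False))  # ɾ
# atomic: /t/ between vowels, second vowel stressed -> aspirated
print(realize_t(True, True, True))   # tʰ
```

Writing the rule out this way highlights what the analysis claims: the choice of allophone is fully determined by the environment, so a speaker never has to memorize which words take [ɾ] and which take [tʰ].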

The aim of phonology

We as linguists do not have immediate access to phonemes. They are abstractions, not concrete reality that can be directly measured in the linguistic signal. We have to look at patterns in where we find various phones and figure out whether or not they belong together as allophones of the same phoneme. This is an important part of phonology: determining what the phonemes of a language are, what each phoneme’s allophones are, and which allophones are used in which environments.

Phonologists are not concerned with just the phonology of one particular language. We also want to uncover any general universal phonological principles that might underlie all of human language. However, this is difficult. Most importantly, modality matters a great deal in phonology, because the basic units and patterns are fundamentally different across modalities. The parts of the vocal tract used for spoken languages behave differently than the manual and nonmanual articulators used for signed languages.

Thus, whatever universal phonological principles there may be, they must be quite abstract and independent of specific modalities. Yet, we do find some common principles specific to each modality, so it is useful to consider spoken language phonology separately from signed language phonology, as is done in this textbook.

Finally, note that different linguists may come to different conclusions about the phonology of a language, because phonemes and other phonological units are abstract theoretical constructs, which means they are sensitive to the starting assumptions we make and the theoretical framework we are using. The examples given to you here have straightforward analyses with very few assumptions, but these are not the only possible analyses, especially in more advanced theories of phonology.

Check your understanding

Coming soon!

4.2 Phonotactics and natural classes



While physical units may change their pronunciation in some environments, it is also possible that certain physical units cannot be used in some environments at all. Each language has its own set of phonotactics, which are language-specific restrictions on what combinations of physical units are allowed in which environments. For example, English has phonotactic restrictions that ban [tl] and [dl] in onsets, but this is not a universal restriction. Plenty of languages allow onsets with [tl] and [dl], such as Ngizim, which has words like [tlà] ‘cow’ (Schuh 1977), and Hebrew, which has words like [dli] ‘bucket’ (Klein 2020).

Some phonotactic restrictions may be somewhat looser than others. English generally does not have onsets containing [pw] or [vl], yet English speakers generally have no trouble pronouncing loanwords like pueblo [pwɛblo] and proper names like Vladimir [vlædəmir].

In ASL, there is a general phonotactic restriction called the Symmetry Condition that affects signs that have movement in both hands. The Symmetry Condition requires such signs to have the same handshape and to move in the same way (Battison 1978; see Napoli and Wu 2003 for extensive discussion and elaboration of the Symmetry Condition). That is, the two moving hands cannot generally do completely different things, which is something you may have noticed for yourself in the popular childhood challenge of trying to rub your stomach while patting your head.

The Symmetry Condition is evident in the ASL sign SENTENCE in the following video clip, in which both hands have the same F handshape and are moving in the same way, with slight radioulnar wiggling and an overall path out to the sides away from the centre of the body.

One or more interactive elements has been excluded from this version of the text. You can view them online here:

Exceptions to the Symmetry Condition are rare, but possible, such as the sign OPPRESS in the following video clip, in which both hands are moving, but with different handshapes (a 5 handshape on the dominant hand and an S handshape on the non-dominant hand) and different orientations (dominant palm facing out, non-dominant palm facing to the signer’s right).

One or more interactive elements has been excluded from this version of the text. You can view them online here:

Distribution and natural classes

The overall pattern of environments where a given physical unit can occur is called its distribution, and one of the most fundamental skills in phonology is being able to determine what the distributions are for the physical units of a language.

This may seem like a daunting task, but we can use our understanding of phonology and typology to help narrow down the options. In spoken languages, phones share various phonetic properties that are often relevant to distributions. For example, the restriction on [tl] and [dl] in English onsets is not random; [t] and [d] are both alveolar plosives. They form what we call a natural class, which is a set of phones that share some phonetic properties (in this case, place and manner of articulation) and also share some phonological behaviour (in this case, being governed by the same phonotactic restriction).

Using natural classes, we can more easily describe some of the other patterns in English phonotactics. English allows up to three consonants in an onset, but when there are three, the first must always be [s], the second must be one of [p], [t], or [k], and the third must be one of [r], [l], [j], or [w]. Again, these are not random: [p], [t], and [k] are the natural class of voiceless plosives, while [r], [l], [j], and [w] are the natural class of approximants. It would be unusual if instead of this pattern, English consonant clusters could contain [s], followed by one of some set that is not a natural class (such as [f], [n], [k]), followed by one of some other set that is also not a natural class (such as [r], [t], [h], [m]).
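The three-consonant onset restriction can be stated as a membership check over these two natural classes. In this Python sketch, the function name and list-of-phones representation are our own illustrative choices:

```python
# Natural classes relevant to English three-consonant onsets,
# as described in the text.
VOICELESS_PLOSIVES = {"p", "t", "k"}
APPROXIMANTS = {"r", "l", "j", "w"}

def valid_ccc_onset(onset):
    """Check whether a three-consonant onset (a list of phones) fits
    the English s + voiceless plosive + approximant template."""
    if len(onset) != 3:
        return False
    first, second, third = onset
    return (first == "s"
            and second in VOICELESS_PLOSIVES
            and third in APPROXIMANTS)

print(valid_ccc_onset(["s", "p", "l"]))  # True, as in 'splash'
print(valid_ccc_onset(["s", "f", "r"]))  # False: [f] is not a voiceless plosive
```

Notice how compact the rule is when stated over natural classes: two set memberships, rather than a long list of individually permitted clusters.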

Note that the members of a natural class are language-specific, not universal. So while [p], [t], [k] form a natural class in English, they do not form a natural class in Kalaallisut (a.k.a. Greenlandic, an Inuit language of the Inuit-Yupik-Unangan family, spoken in Greenland). Kalaallisut has [p], [t], and [k], but it also has a voiceless uvular plosive [q], as in words like [iseʀaq] ‘goose’ (Schultz-Lorentzen 1945). Thus, the natural class of voiceless plosives in Kalaallisut would be [p], [t], [k], and [q], because natural classes are exhaustive, including every relevant phone in the language.
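Exhaustivity can be made concrete by computing a class directly from a language's inventory. The toy feature encoding in this Python sketch is our own illustration, not a standard feature system:

```python
# Toy phone inventories pairing each phone with (voicing, manner)
# features. These partial inventories are illustrative only.
ENGLISH = {"p": ("voiceless", "plosive"), "t": ("voiceless", "plosive"),
           "k": ("voiceless", "plosive"), "b": ("voiced", "plosive"),
           "s": ("voiceless", "fricative")}
KALAALLISUT = {"p": ("voiceless", "plosive"), "t": ("voiceless", "plosive"),
               "k": ("voiceless", "plosive"), "q": ("voiceless", "plosive"),
               "s": ("voiceless", "fricative")}

def natural_class(inventory, features):
    """Return every phone in the inventory with the given features:
    a natural class is exhaustive within its language."""
    return {phone for phone, feats in inventory.items() if feats == features}

# The "same" description picks out different sets in different languages.
print(sorted(natural_class(ENGLISH, ("voiceless", "plosive"))))      # ['k', 'p', 't']
print(sorted(natural_class(KALAALLISUT, ("voiceless", "plosive"))))  # ['k', 'p', 'q', 't']
```

Because the class is computed from the inventory rather than listed by hand, it automatically includes [q] for Kalaallisut, mirroring the exhaustivity requirement described above.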

Check your understanding

Coming soon!


Battison, Robbin. 1978. Lexical borrowing in American Sign Language. Silver Spring, MD: Linstok Press.

Klein, Stav. 2020. Notes on Modern Hebrew phonology and orthography. In Usage-based studies of Modern Hebrew: Background, morpho-lexicon, and syntax, edited by Ruth A. Berman. Studies in Language Companion Series 210. Amsterdam: John Benjamins. 131–143.

Napoli, Donna Jo, and Jeff Wu. 2003. Morpheme structure constraints on two-handed signs in American Sign Language. Sign Language & Linguistics 6(2): 123–205.

Schuh, Russell G. 1977. Bade/Ngizim determiner system. Monographic Journals of the Near East: Afroasiatic Linguistics 4(3). Malibu, CA: Undena Publications.

Schultz-Lorentzen, Christian Wilhelm. 1945. A grammar of the West Greenlandic language, Meddelelser om Grønland, vol. 129(3). Copenhagen: C. A. Reitzels.

4.3 Contrastive distribution and minimal pairs


Comparing distributions with minimal pairs

In addition to the individual distribution of a single phone, we are also often interested in the relative distribution of two phones. If they have overlapping distributions, such that there are at least some environments where they both can occur, the two phones are said to contrast with each other, and thus, they have contrastive distribution.

This relates to the concept of minimal pair from Section 3.8. Recall that for signed languages, a minimal pair is two signs that have the same articulation except for one parameter. These two signs can be said to contrast with each other for that parameter. We can adapt this concept to words in spoken languages.

For example, in English, the phones [p] and [k] occur in many of the same environments, creating pairs such as [pɪl] pill and [kɪl] kill, [lɪp] lip and [lɪk] lick, and [spɪl] spill and [skɪl] skill. Each of these pairs is a minimal pair: two words that have all the same phones in the same order, except in one position. So [pɪl] pill and [kɪl] kill share the frame [ɪl], with [p] in one word and [k] in the other filling the remaining position.

The existence of just one such minimal pair is all it takes to prove that two phones have contrastive distribution, so minimal pairs play an important role in figuring out the distribution of phones in a language and how they may be grouped into the same or different phonemes.
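Hunting for minimal pairs in a word list amounts to comparing transcriptions position by position. The helper below is an illustrative Python sketch, with transcriptions represented as lists of phones:

```python
def is_minimal_pair(word_a, word_b):
    """Two transcriptions (lists of phones) form a minimal pair if
    they have the same length and differ in exactly one position."""
    if len(word_a) != len(word_b):
        return False
    differences = sum(1 for a, b in zip(word_a, word_b) if a != b)
    return differences == 1

# pill vs. kill: identical except [p] vs. [k] in the first position.
print(is_minimal_pair(["p", "ɪ", "l"], ["k", "ɪ", "l"]))  # True
# pill vs. lick: differences in two positions, so not a minimal pair.
print(is_minimal_pair(["p", "ɪ", "l"], ["l", "ɪ", "k"]))  # False
```

Representing words as lists of phones rather than spelled strings matters here: English orthography would wrongly split or merge phones (e.g. <th> is one phone), so the comparison must be over transcriptions.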

However, in many cases, it may be difficult or even impossible to find minimal pairs. In English, the phone [ʒ] is the rarest consonant and has a limited distribution, occurring in words like [ruʒ] rouge, [ɡərɑʒ] garage, [vɪʒn̩] vision, and [mɛʒr̩] measure. It is almost never word-initial in English, except in some proper names (perhaps most famously, Hungarian-American actress Zsa Zsa Gabor) and in the neologism [ʒʊʒ] zhoozh ‘improve the appearance of someone or something with a small change’. This makes it difficult to find minimal pairs where [ʒ] is a crucial phone, especially when comparing it to another relatively rare phone like [ʃ], though there are a few examples of minimal pairs for [ʒ] and [ʃ] involving unusual or rare words, such as [əluʒn̩] allusion versus [əluʃn̩] Aleutian and [mɛʒr̩] measure versus [mɛʃr̩] mesher.

Near-minimal pairs and nonce words

But if no minimal pairs can be found, we usually have to rely on near-minimal pairs instead. A near-minimal pair looks almost like a minimal pair, except there are one or more additional differences elsewhere in the word besides the crucial position. For example, the English pair [plɛʒr̩] pleasure and [prɛʃr̩] pressure form a near-minimal pair for [ʒ] and [ʃ]. In the position of interest, we have [ʒ] versus [ʃ], which seem to be contrastive because nearly all of the rest of the phones are the same in both words, except for [l] versus [r], which prevents these words from being a true minimal pair.

While a single minimal pair is very powerful, a single near-minimal pair is not. We may have simply stumbled upon a weird example where the apparently meaningless difference is actually relevant to the distribution of the phones we are interested in. We cannot immediately determine whether or not a given near-minimal pair is useful, so it is important to find multiple examples. As we collect more near-minimal pairs, we can be more confident that the small differences are incidental rather than crucial to the distribution of the phones in question.

This is where speakers’ competence can also be useful: we can ask speakers to evaluate nonce words, which are words that we make up for one-time use, such as for linguistic experimentation. We can construct nonce words that fill in minimal pair gaps, and if speakers agree that the nonce word is a valid hypothetical word of the language, then we can be more sure that the phones in question do in fact contrast with each other.

For example, rather than looking for more near-minimal pairs for [ʒ] and [ʃ], we could instead take an existing word with [ʒ] in it, like [beʒ] beige, then create a nonce word that is the same, except replacing [ʒ] with [ʃ], giving us a pair like [beʒ]-[beʃ]. Then we could ask English speakers whether the nonce word [beʃ] could be used as a completely different word with a different meaning from [beʒ]. Most speakers would agree, so we would be reasonably sure that [ʒ] and [ʃ] do indeed contrast with each other, despite not having a true minimal pair of actual existing English words.

Depending on the structure of the language and what resources we have access to, we may use one or more of these three tools (minimal pairs, near-minimal pairs, nonce words) to determine whether two phones contrast with each other. We would also need to do this work for every pair of phones in the language, but in some cases, we may get lucky, and there may be minimal triplets, minimal quadruplets, or even larger minimal n-tuplets.

For many speakers, English beet, bit, bait, bet, bat, but, bot, bought, boat, and boot form a minimal 10-tuplet (a decuplet!), showing simultaneously that the ten vowels [i], [ɪ], [e], [ɛ], [æ], [ʌ], [ɒ], [ɔ], [o], and [u] all contrast with each other. This cuts down on the work needed to demonstrate patterns of contrast in the language. But in many languages, even minimal pairs can be hard to find, so finding near-minimal pairs and testing nonce words may be the only options.

Check your understanding

Coming soon!

4.4 Complementary distribution


Phones without contrastive distribution

Two phones may instead have complementary distribution, with environments that never overlap. This means there is one set of environments for one phone and a completely different set of environments for the other.

For example, the phones [h] and [ŋ] are in complementary distribution in English for many speakers. For these speakers, [h] can only appear at the beginning of a word, as in [həˈræs] harass, or at the beginning of a stressed syllable, as in [ˌkɒmprəˈhɛnd] comprehend and [ˈt͡ʃaɪldˌhʊd] childhood. We can even see [h] appear and disappear in related words that have different stress patterns: there is an [h] in the stressed syllable of [vəˈhɪkjulr̩] vehicular, but there is no [h] in the corresponding unstressed syllable in [ˈviəkl̩] vehicle.

Conversely, for the same speakers, [ŋ] can never appear in those positions. It can only appear exactly where [h] cannot, such as in a coda, as in [lɒŋ] long and [fɪŋ.ɡr̩] finger, or at the beginning of an unstressed syllable, as in [ˈsɪ.ŋr̩] singer.

Further, if we try to replace [h] or [ŋ] with each other in any word, the resulting nonce words would be judged ungrammatical: *[ŋəræs], *[kɒmprəŋɛnd], *[lɒh], *[fɪhɡr̩], etc. Thus, we can never find or create minimal pairs for [h] and [ŋ], so they appear not to contrast with each other.

And yet, [h] and [ŋ] still seem to function as fundamentally different consonants in English, because they seem to belong to different phonemes, despite being in complementary distribution. No one would confuse one for the other, and in a broad transcription, we would notate them with different symbols. Thus, while contrastive distribution is enough to determine that two phones are allophones of separate phonemes, it is not a requirement.

Now consider the vowels in most North American pronunciations of English [bid] bead and [bit] beat. In broad transcription, we would normally use the same symbol [i] for both vowels, but in a more narrow transcription, we might want to indicate that the vowel of bead is longer, with [biːd] versus [bit]. Long [iː] and short [i] are different phones in English, with [iː] consistently being about 1.2–1.5 times as long as [i], and if we swap them, pronouncing bead as [bid] and beat as [biːt], it sounds very odd.

Like [h] and [ŋ], [iː] and [i] are in complementary distribution. Long [iː] must be followed by a coda with only voiced consonants, as in [biːd] bead, [fliːz] fleas, and [biːrd] beard. Compare these to words where one or more of the consonants in the following coda is voiceless, where we instead find short [i]: [bit] beat, [flis] fleece, and [pirs] pierce.

So we have two pairs of phones, [h] and [ŋ] versus [iː] and [i]. In each pair, the two phones have complementary distribution, but the pairs behave differently. Despite the complementary distribution, we conceive of [h] and [ŋ] as somehow completely different consonants, needing to be represented differently even in broad transcription, just like any pair of contrasting phones: [p] and [b], [i] and [ɪ], etc. However, [iː] and [i] just seem to be variants of the same fundamental vowel phoneme.

That is, we want to treat [h] as belonging to a phoneme distinct from [ŋ], while treating [iː] and [i] as two allophones of the same phoneme. So, the phoneme corresponding to [h] would be notated as /h/, the phoneme corresponding to [ŋ] would be notated as /ŋ/, and the single phoneme corresponding to both [iː] and [i] would be notated as /i/.

Phonetic similarity of allophones

Why should we treat these two pairs differently? We often make the decision based on phonetic similarity, which is how much the relevant phones have in common in terms of their articulation. The phones [h] and [ŋ] are both consonants, but that is where their phonetic similarity ends: they differ in phonation, place of articulation, and manner of articulation, which are the main properties that define a consonant. This lack of phonetic similarity is a good reason to think that [h] and [ŋ] belong to different phonemes, despite being in complementary distribution.

In comparison, [iː] and [i] have a lot of phonetic similarity: they have the same vowel quality in all four respects (height, backness, rounding, and tenseness), and they differ only in vowel length. Complementary distribution and phonetic similarity together are strong evidence that [iː] and [i] are allophones of the same phoneme.
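One rough way to operationalize phonetic similarity is to count shared feature values. The sketch below is my own illustration using informal feature bundles (not a formal feature theory), just to make the contrast between the two pairs concrete:

```python
# Illustrative feature bundles for the four phones discussed in the text.
PHONES = {
    "h":  {"phonation": "voiceless", "place": "glottal", "manner": "fricative"},
    "ŋ":  {"phonation": "voiced",    "place": "velar",   "manner": "nasal"},
    "iː": {"height": "high", "backness": "front", "rounding": "unrounded",
           "tenseness": "tense", "length": "long"},
    "i":  {"height": "high", "backness": "front", "rounding": "unrounded",
           "tenseness": "tense", "length": "short"},
}

def similarity(a, b):
    """Fraction of shared feature values over the union of features."""
    fa, fb = PHONES[a], PHONES[b]
    keys = set(fa) | set(fb)
    shared = sum(1 for k in keys if fa.get(k) == fb.get(k))
    return shared / len(keys)

print(similarity("h", "ŋ"))   # 0.0: no feature values in common
print(similarity("iː", "i"))  # 0.8: identical except for length
```

The exact numbers depend entirely on the feature inventory chosen; the point is only that [h] and [ŋ] share nothing relevant, while [iː] and [i] differ in a single property.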

Of course, we have to be careful when looking at phonetic similarity. The two allophones [ɾ] and [tʰ] of the phoneme /t/ discussed in Section 4.1 are both alveolar consonants, but they are otherwise very different in phonation and manner of articulation.

A key result of phonology is that if two phones are in contrastive distribution, then they are allophones of different phonemes. But as we see here, if two phones are in complementary distribution, they could be allophones of different phonemes, as with [h] and [ŋ], or the same phoneme, as with [iː] and [i]. Knowing how to decide which is which is another fundamental skill in phonology.

Check your understanding

Coming soon!

4.5 Phonemic analysis



Phonemic analysis is the process of analyzing a spoken language to figure out what its phonemes are, what the allophones are of those phonemes, and what each allophone’s distribution is. The resulting overall analysis is called a phonemicization of the language.

Note that a given phonemicization represents only one of many possible analyses. Languages do not generally have one single unique phonemicization, because there are many possible ways of dividing up the phones of a language into phonemes.

In addition, since phonemes are theoretical, abstract concepts, we have no direct way to see if our analysis is correct. Indeed, some linguists reject the notion of phonemes completely, since it is possible to analyze the phonology of a language without them. However, there is experimental evidence that speakers do make use of something phoneme-like, and until we are able to open up the human brain and find exactly how language is represented, phonemes are a reasonable analysis (see Chapter 13 for more information).


Even though we cannot know whether a given phonemicization, or any phonemicization at all, is correct, we can still compare different analyses and see which one is a better fit for the data and our assumptions.

In particular, if we have two competing phonemicizations that both account for all of the available data, we will generally prefer the simpler analysis (if there is one). This is the principle of simplicity. However, there is no single objective measure of simplicity, and it is sometimes possible to come up with two competing analyses that are seemingly equal in simplicity. In such cases, we might rely on other factors, but ultimately, we would normally be left in an ambiguous state.

Fortunately, the data sets you typically see in an introductory linguistics course have been carefully selected to have one obvious optimal phonemicization. But out in the real world, when we are working with raw linguistic data, there are often no obvious optimal analyses, so we may be less confident in whatever analyses we do come up with.

An example of phonemic analysis: Georgian laterals

To demonstrate phonemic analysis, consider the following data from Georgian, a Karto-Zan language of the Kartvelian family, spoken in Georgia (data adapted from Kenstowicz and Kisseberth 1979).

[vxlet͡ʃh] ‘I split’ [saxɫʃi] ‘at home’
[t͡ʃet͡ʃxli] ‘fire’ [kaɫa] ‘tin’
[zarali] ‘loss’ [pepeɫa] ‘butterfly’
[t͡ʃoli] ‘wife’ [kbiɫs] ‘tooth’
[xeli] ‘hand’ [ɫxena] ‘joy’
[kleba] ‘reduce’ [erthxeɫ] ‘once’
[leɫo] ‘goal’ [xoɫo] ‘however’
[ɫamazad] ‘prettily’

Step 1: Identify and organize the phones of interest

If we don’t have a particular set of phones in mind or want to phonemicize the entire language, we can start by searching for minimal pairs, or begin analyzing some small, simple natural class, such as the voiceless plosives or the front vowels. In introductory phonology assignments, you will normally be given the specific phones of interest.

For this demonstration, we have two specific phones of interest: an alveolar lateral approximant [l] (often called clear or light [l]) and a velarized alveolar lateral approximant [ɫ] (often called dark [ɫ]), which has the tongue back raised somewhat towards the velum as a secondary articulation along with the normal primary alveolar articulation. Many speakers of English have both of these two phones, with clear [l] at the beginning of a word and dark [ɫ] at the end, as in [lif] leaf versus [fiɫ] feel. For English, these two phones can be shown to be allophones of a single lateral approximant phoneme due to their complementary distribution and phonetic similarity, so we might wonder if the same holds true for Georgian.

Once we have selected a set of phones to study, we may want to organize them by natural classes. With only two or three, no grouping is normally necessary. But if we have four or more, we may find it helpful (we need to do this in Section 4.6 for an example from French).

For the Georgian lateral approximants, we should also keep in mind what makes them different. Here, the difference is between a raised tongue back for dark [ɫ], and no tongue back raising for clear [l]. Very often, we find that the distribution of a phone depends on properties related to its articulation, so if [l] and [ɫ] have complementary distribution, we might expect tongue backness of neighbouring phones to matter. Sometimes, however, there is no apparent phonetic relationship between phones and their environment, so we cannot rely on this as a universal strategy. Thus, while we should keep an eye on tongue backness in the environment, we should be open to other factors.

Step 2: Identify the individual environments of the phones of interest

With an understanding of how the phones of interest are related to each other phonetically, we can create a diagram with the phones of interest listed across the top, and then, under each phone, we list out the individual environments it occurs in, word by word.

Most of the time, we can just look at what occurs immediately to the left and right of a phone to determine its environment, though sometimes, we may need to consider other information, such as syllabic position, stress, tone, or even phones that are farther away. The vast majority of the time, however, just looking at the immediate right and left will work.

For the purposes of compactness in notation when building such lists of environments, it is common to use the hash symbol # (a.k.a. number sign, pound sign, octothorpe, etc.) to mark a word boundary and an underline to represent the position of the phone of interest. Thus, “#_a” for [ɫ] indicates that there is some word in the data in which [ɫ] is at the beginning of the word and is followed by [a], in this case, [ɫamazad] ‘prettily’. Using this method for [l] and [ɫ] in Georgian, we would get the following lists of environments.

[l]  [ɫ]
x_e  e_o
x_i  #_a
a_i  x_ʃ
o_i  a_a
e_i  e_a
k_e  i_s
#_e  #_x
     e_#
     o_o

Each entry in these lists comes from one or more words. The very first word in the data is [vxlet͡ʃh] ‘I split’, which contains [l] in the environment x_e, that is, it occurs between [x] and [e], so we enter x_e in the column under [l]. The second word in the data is [t͡ʃet͡ʃxli] ‘fire’, which contains [l] in the environment x_i, so we enter x_i in the same column.

We continue in this way, word by word, entering all of the environments where we find each of the phones of interest. Note that if a word contains multiple instances of any of the phones of interest, we enter all relevant environments. We see this with the word [leɫo] ‘goal’, which has the environment #_e for [l] and e_o for [ɫ], so both of those get entered into their respective lists.

Step 3: Determine overlap in environments

We first want to make sure that the phones are not in obvious contrastive distribution. If both phones have some of the exact same environments, then there is a good chance they are allophones of separate phonemes. Consider instead if we had constructed similar lists for English [p] and [k]. At some point, we would likely have entries like #_ɪ and s_u for both of them, due to words like pit, kit, spoon, and school. In that case, we would likely conclude that the phones are contrastive and should be analyzed as allophones of separate phonemes. We could then stop our analysis of those phones!

But for Georgian, we have to keep going, because there is no apparent overlap. We could still come to the conclusion that [l] and [ɫ] are allophones of separate phonemes, but we cannot base that decision on any overlap in environments in the data we have here.
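Steps 2 and 3 lend themselves to a mechanical procedure. Here is an illustrative Python sketch (my own, not the textbook’s method), where each transcription is a tuple of phone symbols, word edges are padded with #, and only the immediate left and right neighbours are recorded, following the notation above:

```python
def environments(words, phone):
    """Collect left_right environments of a phone, with '#' at word edges."""
    envs = set()
    for word in words:
        padded = ("#",) + word + ("#",)
        for i, p in enumerate(padded):
            if p == phone:
                envs.add(padded[i - 1] + "_" + padded[i + 1])
    return envs

# A few of the Georgian words from the data set; complex symbols like
# [t͡ʃh] are treated as single phones.
georgian = [
    ("v", "x", "l", "e", "t͡ʃh"), ("t͡ʃ", "e", "t͡ʃ", "x", "l", "i"),
    ("s", "a", "x", "ɫ", "ʃ", "i"), ("k", "a", "ɫ", "a"),
    ("l", "e", "ɫ", "o"), ("ɫ", "a", "m", "a", "z", "a", "d"),
]
clear = environments(georgian, "l")
dark = environments(georgian, "ɫ")
print(sorted(clear))  # ['#_e', 'x_e', 'x_i']
print(sorted(dark))   # ['#_a', 'a_a', 'e_o', 'x_ʃ']
print(clear & dark or "no overlap: no evidence of contrast here")
```

An empty intersection of the two environment sets is what Step 3 checks for: any shared environment would point towards contrastive distribution instead.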

Step 4: Simplify the environments

Looking at the left side of the environments for both phones, we see many of the same symbols: [x], [a], [e], [o], and #. There is not a lot of consistency on the left side, with no obvious natural classes in the left environment of one phone versus the other. However, on the right side of these phones, we see some repetition of phones and some natural classes within each phone’s list, rather than between the two lists, so it looks like the right environment may be crucial for discovering complementary distribution. Thus, we can simplify our analysis by ignoring the left environment. We can rewrite the lists by leaving off the left environment and removing any repeated entries, which gives us the following much simpler list of environments.

[l]  [ɫ]
_e   _o
_i   _a
     _ʃ
     _s
     _x
     _#

Now it is much easier to see what the distributions of these two phones are: [l] occurs only before the front vowels [e] and [i], while [ɫ] occurs only before the back vowel [o], the central vowel [a], the voiceless fricatives [ʃ], [s], and [x], and the end of the word. This is classic complementary distribution, because these are exactly opposite environments: front vowels are not back or central vowels, they are not voiceless fricatives, and they are not word boundaries. Neither phone of interest seems able to appear in the environment of the other.

Note how this pattern also fits our preliminary conjecture in Step 1 that the distribution of these two phones might have something to do with tongue backness, since that is precisely the property some of these environments differ in, specifically front versus back vowels.

Step 5: Organize the phones into phonemes

Since [l] and [ɫ] seem to be in complementary distribution, we might suspect they are allophones of the same phoneme. The question is, do they behave more like English [h] and [ŋ] (which speakers would normally conceptualize as belonging to different phonemes) or like English [iː] and [i] (which speakers would conceptualize as belonging to the same phoneme)? It is not always clear what to do in a given case, but we typically want to look for phonetic similarity.

The Georgian laterals have a lot of phonetic similarity: they have the same phonation (voiced), the same place of articulation (alveolar), and the same manner of articulation (lateral approximant); they differ only in secondary articulation (velarized or not). Thus, we have both complementary distribution and a high degree of phonetic similarity, so it seems reasonable to analyze [l] and [ɫ] as allophones of the same phoneme.

Step 6: Identify the default allophone and finalize the analysis

The default allophone of a phoneme is the one that occurs in the widest variety of environments, what we sometimes call the elsewhere case. For Georgian lateral approximants, the default is clearly [ɫ], since it occurs in many distinct environments that are all dissimilar from each other. By convention, we normally use the symbol of the default allophone to represent the phoneme unless there is good reason to do otherwise, so here, we would represent the phoneme containing [l] and [ɫ] as /ɫ/, since [ɫ] is the default allophone.

Note that the phoneme /ɫ/ and the phone [ɫ] are different kinds of objects, so this notation difference is crucial. Phonemes are theoretical abstractions that might also correspond to some kind of mental representation, while allophones are phones, which means they are concrete measurable sounds that are physically produced. The phoneme /ɫ/ corresponds to the set of allophones [l] and [ɫ], with [l] occurring before front vowels and the default [ɫ] occurring elsewhere.

Phonemes and their allophones are often depicted graphically in a tree-like diagram like the diagram for /ɫ/ in Figure 4.1. Here, we informally abbreviate “before front vowels” as front V to save space in the tree.

Figure 4.1. Phoneme diagram for /ɫ/ in Georgian.


This analysis makes predictions about laterals in Georgian beyond what we see in the given data. We would expect every clear [l] in Georgian to be followed by a front vowel, and we would expect every dark [ɫ] in Georgian to be followed by something other than a front vowel. All of the data we looked at agrees with these predictions, though we could still be wrong if we find new evidence that contradicts our analysis.

For example, we predict that [ɫ] should be able to occur before any consonant, not just voiceless fricatives, because it is the default case and should appear in the widest variety of environments, while clear [l] is restricted to only appearing before front vowels. This is a testable prediction! We can look for more Georgian words and see what kind of lateral we find before other consonants. Fitting our prediction, we find only dark [ɫ] before consonants, as in [aɫq’a] ‘siege’, which cannot be pronounced *[alq’a] with a light [l].
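The prediction itself can be checked mechanically against any word list. The following sketch (an illustration under the same tuple-of-phones assumption as before, with the front-vowel set limited to the vowels attested in this data, and symbols like [th] and [q’] treated as single phones) returns whether a word list is consistent with the analysis:

```python
FRONT_VOWELS = {"i", "e"}  # the front vowels attested in this data set

def check_lateral_prediction(words):
    """True if every [l] precedes a front vowel and no [ɫ] does."""
    for word in words:
        padded = word + ("#",)  # mark the word boundary
        for i, p in enumerate(word):
            follower = padded[i + 1]
            if p == "l" and follower not in FRONT_VOWELS:
                return False  # a clear [l] not before a front vowel
            if p == "ɫ" and follower in FRONT_VOWELS:
                return False  # a dark [ɫ] before a front vowel
    return True

data = [("k", "a", "ɫ", "a"), ("x", "e", "l", "i"), ("a", "ɫ", "q’", "a"),
        ("l", "e", "ɫ", "o"), ("e", "r", "th", "x", "e", "ɫ")]
print(check_lateral_prediction(data))  # True
```

A single counterexample word would make the function return False, which is exactly the sense in which the analysis is falsifiable.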

Check your understanding

Coming soon!


Kenstowicz, Michael, and Charles Kisseberth. 1979. Generative phonology: Description and theory. New York: Academic Press.

4.6 Another example of phonemic analysis


More than two phones of interest

The Georgian case is a pretty straightforward example, with only two phones of interest and fairly obvious distributions and phonetic similarity. However, we often encounter more difficult cases, maybe because there are many phones of interest or because the distributions and/or phonetic similarity may be less clear.

Consider the following data for how some speakers pronounce French (a Western Romance language of the Indo-European family, spoken in France and elsewhere in the world; data adapted from Katamba 1989). The phones of interest are the voiced sonorants [m], [l], and [ʀ] and the voiceless sonorants [m̥], [l̥], and [ʀ̥]. Note that [ʀ] represents a voiced uvular trill and the diacritic [  ̥] indicates that the phone is voiceless rather than voiced.

[ʀym] ‘cold/flu’ [il] ‘island’
[mɛʀ] ‘mother’ [tabl] ‘table’
[tɛʀm] ‘term’ [kasabl] ‘breakable’
[film] ‘film’ [ɛl] ‘she’
[limite] ‘limited’ [klemã] ‘merciful’
[liʀ] ‘to read’ [simetʀikmã] ‘symmetrically’
[lɛvʀ] ‘lip’ [ɛtʀ̥] ‘to be’
[plɛziʀ] ‘pleasure’ [ʃifʀ̥] ‘number/figure’
[tʀivjal] ‘trivial’ [mɛtʀ̥] ‘to put’
[ʀali] ‘race-meeting’ [mɛkɔnɛtʀ̥] ‘to fail to recognize’
[ʀymatismal] ‘rheumatic’ [pœpl̥] ‘people’
[ʀɔ̃fle] ‘to snore’ [ɔ̃kl̥] ‘uncle’
[ekʀiʀ] ‘to write’ [tãpl̥] ‘temple’
[tɔʀdʀ] ‘to wring’ [ʀitm̥] ‘rhythm’
[pɛʀs] ‘Persian’ [ʀymatism̥] ‘rheumatism’

Now we just follow the same steps we did for Georgian.

Step 1: Identify and organize the phones of interest

Here, we have a lot of data to sort through, and six phones to consider. But the phones neatly separate into either three pairs ([m]-[m̥], [l]-[l̥], and [ʀ]-[ʀ̥]) or two triplets ([m]-[l]-[ʀ] and [m̥]-[l̥]-[ʀ̥]). Since it’s usually easiest to analyze a pair, we can start with just one pair and see if we can find any patterns. We will choose [m] and [m̥] first.

Step 2: Identify the individual environments of the phones of interest

For each phone in our chosen pair, we write down the individual environments it occurs in, word by word (again, at an introductory level, we will normally only ever need to consider the immediate right and left environment). So for [ʀym] ‘cold/flu’, we would write down y_# in the column for [m], because [ʀym] has [m] between [y] and the end of the word. Then for the next word [mɛʀ] ‘mother’, we would write down #_ɛ in the column for [m]. And so on, until we have the following full list of environments:

[m]  [m̥]
y_#  t_#
#_ɛ  s_#
ʀ_#
l_#
i_i
y_a
s_a
e_ã
i_e
k_ã

Step 3: Determine overlap in environments

To check whether the phones are in contrastive distribution, we need to see if there is any overlap in the environments on the two lists. If both phones have some of the exact same environments, then there is a good chance they are allophones of separate phonemes.

There are no exact matches, but we do see some similarities in some left and right environments. For example, we see that both [m] and [m̥] occur after [s]. However, voiced [m] only does so when followed by the vowel [a], as in [ʀymatismal] ‘rheumatic’, while voiceless [m̥] only does so at the end of the word, as in [ʀymatism̥] ‘rheumatism’.

Similarly, we see that both [m] and [m̥] occur at the end of the word, but with restrictions. Voiced [m] is only word-final when it is preceded by a voiced phone like [y], [l], or [ʀ], as in [ʀym] ‘cold/flu’, [film] ‘film’, and [tɛʀm] ‘term’, while voiceless [m̥] is only word-final when preceded by [t] or [s], as in [ʀitm̥] ‘rhythm’ and [ʀymatism̥] ‘rheumatism’. It is probably no coincidence that the difference in the voicing of the two phones of interest happens to match the voicing of the phone on their left in these cases.

Since there are apparent patterns to how these two phones are distributed, rather than them being able to occur in the same environments, they appear not to be in contrastive distribution, so we would continue on to Step 4.

Step 4: Simplify the environments

This is not a lot to go on, because there is so little data for [m̥], but it seems like both the left and right sides matter for the distribution of [m̥], since it consistently has a natural class on the left (voiceless obstruents) and a word boundary on the right. There is not much of a pattern to the distribution of [m], since it has a mix of various natural classes on both sides. So as a first guess, we might say that [m̥] occurs only word-finally when immediately preceded by a voiceless obstruent, while [m] occurs instead either after voiced phones (regardless of what comes after) or before any phone at all (that is, it is not word-final). This is complementary distribution.

Step 5: Organize the phones into phonemes

Since [m̥] and [m] seem to be in complementary distribution and are phonetically similar (they are both bilabial nasal stops, differing only in phonation), it seems reasonable to analyze [m̥] and [m] as allophones of the same phoneme.

Step 6: Identify the default allophone and finalize the analysis

The default allophone appears to be [m], since it occurs in two distinctly different environments, while [m̥] only occurs in one. Thus, we would propose a single phoneme /m/ with two allophones: [m̥] occurring word-finally when immediately preceded by a voiceless obstruent (abbreviated vls obs _# here) and [m] occurring elsewhere as the default.

Figure 4.2. Phoneme diagram for /m/ in French.

Repeat Steps 2–6 for [l] and [l̥]

This seems like a reasonable analysis, so we can continue working through the phones of interest in pairs. The next pair to analyze is [l] and [l̥], so we cycle back and repeat Steps 2–6. This gives us the following list of environments for [l] and [l̥]:

[l]  [l̥]
i_m  p_#
#_i  k_#
#_ɛ
p_ɛ
a_#
a_i
f_e
i_#
b_#
ɛ_#
k_e

We see the same pattern of complementary distribution as for the bilabial nasals: the voiceless lateral [l̥] occurs word-finally when immediately preceded by a voiceless obstruent, while the voiced lateral occurs everywhere else, either after any voiced phone, or before any phone (to prevent it from being word-final). We would end up with a parallel analysis to the nasals, with /l/ as the phoneme, having a voiceless allophone [l̥] in one environment (word-final while immediately after a voiceless obstruent) and a default voiced allophone [l] everywhere else:

Figure 4.3. Phoneme diagram for /l/ in French.

Repeat Steps 2–6 for [ʀ] and [ʀ̥]

Then we do the same for [ʀ] and [ʀ̥], but first, note how /m/ and /l/ have the same basic pattern: the exact same kind of allophone (voiceless) in the exact same environment (word-final immediately after a voiceless obstruent), and the exact same kind of default allophone (voiced). Not only that, /m/ and /l/ are part of a natural class: they are both sonorant consonants. But the remaining pair of phones we need to analyze, [ʀ] and [ʀ̥], are also sonorant consonants.

This is unlikely to be a coincidence, so we can make a prediction, even before we look at the data. We predict that [ʀ] and [ʀ̥] should pattern just like the other two pairs of sonorants, with the two phones being in complementary distribution, and with the voiceless phone occurring only word-finally immediately after a voiceless obstruent and the voiced phone occurring elsewhere (after a voiced phone or before any phone at all). In the following list of environments for [ʀ] and [ʀ̥], the predicted pattern is exactly what we find:

[ʀ]  [ʀ̥]
#_y  t_#
ɛ_#  f_#
ɛ_m
i_#
v_#
t_i
#_a
#_ɔ̃
k_i
ɔ_d
d_#
ɛ_s
#_i

Thus, we end up with the same basic analysis as for the previous pairs: the two phones of interest [ʀ] and [ʀ̥] are allophones of /ʀ/, with the voiceless allophone occurring word-finally immediately after a voiceless obstruent and the voiced allophone as the default, occurring everywhere else:

Figure 4.4. Phoneme diagram for /ʀ/ in French.

Is there more?

This analysis is nice, but it still seems like we are missing something. Why do these three phonemes have the same basic pattern for their allophones? Why do they have a voiceless allophone in this particular environment and not somewhere else? Is there a reason the environment for the voiceless allophone also mentions voicelessness? Recall how the environment for one of the Georgian laterals similarly shared a phonetic property with the allophone that occurred there. Can we somehow represent the larger pattern in the overall distribution of French sonorants generally? Right now, the distributions are still specified for each individual phoneme separately, creating a lot of redundancy in our analysis. The next stage of our phonological analysis in Section 4.7 will help answer these questions!

Check your understanding

Coming soon!


Katamba, Francis. 1989. An introduction to phonology. London: Longman.

4.7 Phonological rules


Eliminating redundancy with faithfulness

If we write out our analysis of the French sonorants as descriptions of how to pronounce the three phonemes, we get statements like the following:

/m/ is pronounced [m̥] word-finally after a voiceless obstruent.
/m/ is pronounced [m] elsewhere.
/l/ is pronounced [l̥] word-finally after a voiceless obstruent.
/l/ is pronounced [l] elsewhere.
/ʀ/ is pronounced [ʀ̥] word-finally after a voiceless obstruent.
/ʀ/ is pronounced [ʀ] elsewhere.

Note the massive amount of redundancy in these statements. First, every phoneme /X/ has a statement of the same exact form: “/X/ is pronounced [X] elsewhere”. This is because of how we chose to represent the phoneme, using the same symbol as the default allophone. If we consistently do this for every phonemicization, then we will always have this kind of statement for the default pronunciation for every phoneme.

Since we will always have this default statement, we don’t need to list it explicitly. Instead, we can simply treat it as an inherent part of how phonology works: every phoneme is always pronounced as its matching default allophone “elsewhere”. This is sometimes called the principle of faithfulness: if a phoneme occurs in an environment not covered by any other statement for the pronunciation of that phoneme, then it is pronounced the same (its pronunciation is “faithful” to its phoneme). Thus, we can remove every instance of this default statement, relying instead on the principle of faithfulness to universally give us the default allophones for every phoneme in every spoken language. This leaves us with the following three statements for French:

/m/ is pronounced [m̥] word-finally after a voiceless obstruent.
/l/ is pronounced [l̥] word-finally after a voiceless obstruent.
/ʀ/ is pronounced [ʀ̥] word-finally after a voiceless obstruent.

Eliminating redundancy with natural classes

There is still some remaining redundancy. All three of these statements have the same form: “/X/ is pronounced [X̥] word-finally after a voiceless obstruent”. This is another pattern, and part of phonology (and linguistics in general) is finding patterns and reducing them down to simpler descriptions and explanations.

Note that /m/, /l/, and /ʀ/ are all sonorants. This has the beginnings of a natural class, but natural classes need to be exhaustive, and there are other sonorants in French. For example, we see [n] and [j] in the data, and these are presumably allophones of /n/ and /j/, which would need to be included in any natural class of sonorant phonemes. This leaves us with two options: either there are three independent statements about some sonorants as above, one for each of /m/, /l/, and /ʀ/, that coincidentally all have the exact same basic form, or there is a single statement we can construct that covers all sonorants, including /n/ and /j/.

Each option makes a different prediction about the pronunciation of French. If /m/, /l/, and /ʀ/ behave completely independently of /n/ and /j/, then we predict that /n/ and /j/ would not have voiceless allophones if they are word-final after a voiceless obstruent. If instead there is a single pattern that applies to all sonorants, we predict that /n/ and /j/ should have voiceless allophones in exactly the same environments that /m/, /l/, and /ʀ/ do.

Nothing in the given data can help us decide between these two options, because there are no words with /n/ or /j/ in the relevant environment in the data. In fact, French phonotactics prevent that from ever happening anyway, so we can unfortunately never test our predictions!

Eliminating redundancy with simplicity

Since we have two competing analyses that both account for the given data, and no other data can be found to contradict either analysis, we can follow the principle of simplicity and pick the analysis with the fewest statements. This allows us to simplify our three statements down to just one, something like the following:

Note that this says nothing about what happens to the place and manner of articulation of the sonorants, just their phonation. We should assume that statements like these only affect exactly what they say; everything else must remain faithful (unchanged). We do not want /m/ turning into any random voiceless phone! We specifically want it to be pronounced as [m̥], so only its phonation differs.

Writing phonological rules

These kinds of statements are often called phonological rules, and there is a shorthand notation we can use to reduce them down to a form that is easier to deal with. We can use an arrow → to replace “is pronounced as” and use a slash / to separate the change in the rule from the environment where the rule applies. Finally, we can replace the wordy description of the environment “word-finally after a voiceless obstruent” with the simplified notation we used in the phoneme diagrams, using an underline to represent the position in the environment where the phoneme must be to undergo the rule and the hash # to indicate word boundaries.

This gives us the following shorthand rule:

There are more advanced ways we can simplify phonological rules, but for the purposes of this textbook, this form will be sufficient. We now have the following basic template for a phonological rule, containing three key components: the target (indicated here by A), the change (B), and the environment (C ▁ D).

A → B / C ▁ D

The target of a phonological rule is the natural class of phonemes that are changed into their appropriate allophones. The change caused by a phonological rule is the list of all phonetic properties that describe how the allophones consistently differ from the target phonemes. Finally, the environment is the same as what we used for talking about the distribution of allophones. As we have seen, most environments typically only reference something on the immediate left and/or immediate right, though more complicated environments are possible.
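As an illustration of how the template works, the three components of a rule can be simulated in a few lines of code. This is a toy sketch, not part of the textbook’s formalism: the phoneme sets, the `devoice` function, and the use of a regular expression are all our own assumptions, with voicelessness marked by the combining ring diacritic.

```python
import re

# Illustrative symbol sets for the French devoicing rule (assumed, not exhaustive):
# target = sonorants, change = devoicing, environment = voiceless obstruent ▁ #
VOICELESS_OBSTRUENTS = "ptkfsʃ"
SONORANTS = "mlʀ"
RING = "\u0325"  # combining ring below, marks voicelessness

def devoice(word):
    """Apply: sonorant → voiceless sonorant / voiceless obstruent ▁ #"""
    pattern = f"([{VOICELESS_OBSTRUENTS}])([{SONORANTS}])$"
    # $ plays the role of the word boundary #; only phonation is changed
    return re.sub(pattern, lambda m: m.group(1) + m.group(2) + RING, word)

print(devoice("ɛtʀ"))   # rule applies: word-final sonorant after voiceless obstruent
print(devoice("tabl"))  # environment not met (voiced /b/): faithful output
```

Note how the substitution touches nothing but the target sonorant, mirroring the point above that everything outside the stated change remains faithful.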

Generative phonology and levels of representation

In some versions of phonology, phonemes, allophones, and phonological rules are not just convenient descriptions of patterns, but crucial objects in the theory, sometimes proposed to represent some aspect of cognitive reality. One of the most common such versions of phonology is generative phonology, initially developed in the 1950s and 1960s (Chomsky 1951, Chomsky et al. 1956, Halle 1959, Chomsky and Halle 1968), building upon ideas developed in the first half of the 20th century (Saussure 1916, Bloomfield 1939, Swadesh and Voegelin 1939, Trubetzkoy 1939, Jakobson 1942, Harris 1946/1951, Wells 1949) and ultimately reflecting ideas from the work of Dakṣiputra Pāṇini, a grammarian in ancient India (ca. 500 BC) who developed concepts and methods for the analysis of Sanskrit that can still be seen in modern linguistics.

In generative phonology, words have at least two distinct phonological forms. One is an approximation of the pronunciation (narrow or broad, as needed), which we have been representing in square brackets with phones. This representation is called the surface representation (SR) or phonetic representation. Because it is made up of phones, the SR is a relatively concrete representation, something directly observable and measurable. Here, all of the data we have been looking at are given SRs.

The second representation is made up of phonemes and is called the underlying representation (UR) or phonemic representation. Because the UR is made up of phonemes, it is an abstract object in our theoretical analyses of a language. As with phonemes, there is debate about whether URs also correspond to any sort of cognitive reality, but whether or not they do, they are useful tools for describing the phonology of a language. Here, we would have to rewrite all of our data using phonemes instead of allophones.

Thus, for every word in Georgian, we would replace every clear [l] with its phoneme /ɫ/. So the URs for [t͡ʃoli] ‘wife’ and [xeli] ‘hand’ would be /t͡ʃoɫi/ and /xeɫi/.

Similarly, to get the URs for the French data, we would replace all of the voiceless sonorants with their corresponding phonemes: the UR of [ɛtʀ̥] ‘to be’ would be /ɛtʀ/, the UR of [pœpl̥] ‘people’ would be /pœpl/, and the UR of [ʀitm̥] ‘rhythm’ would be /ʀitm/. Note that URs are enclosed in slashes, because they are made up of phonemes.

In generative phonology, the relationship between URs and SRs is not just a static link. Instead, URs are treated as inputs to a process that “generates” the SRs as output, by actively changing the phonemes into their appropriate allophones. This model is designed to mimic how language presumably works: we begin with some mental representation of a word in our mind, and then sometime later, we articulate that word. This overall process is called a phonological derivation, and the individual components of this process that change the phonemes are our phonological rules. This model is represented graphically in the following diagram.

Figure 4.5. Model of generative phonology.

Check your understanding

Coming soon!


Bloomfield, Leonard. 1939. Menomini morphophonemics. In Études phonologiques dédiées à la mémoire de M. le prince N. S. Trubetzkoy, vol. 8, 105–115. Jednota českých matematiků a fyziků.

Chomsky, Noam. 1951. The morphophonemics of Modern Hebrew. Master’s thesis, University of Pennsylvania, Philadelphia.

Chomsky, Noam and Morris Halle. 1968. The sound pattern of English. New York: Harper & Row.

Chomsky, Noam, Morris Halle, and Fred Lukoff. 1956. On accent and juncture in English. In For Roman Jakobson: Essays on the occasion of his sixtieth birthday, ed. Morris Halle, Horace Lunt, Hugh MacLean, and Cornelis van Schooneveld, 65–80. The Hague: Mouton.

Halle, Morris. 1959. The sound pattern of Russian: A linguistic and acoustical investigation. The Hague: Mouton.

Harris, Zellig S. 1946/1951. Methods in structural linguistics. Chicago: University of Chicago Press.

Jakobson, Roman. 1942. The concept of phoneme. In On language, ed. Linda R. Waugh and Monique Moville-Burston, 218–241. Cambridge, MA: Harvard University Press.

Saussure, Ferdinand de. 1916. Cours de linguistique générale. Paris: Payot.

Swadesh, Morris and Charles F. Voegelin. 1939. A problem in phonological alternation. Language 15: 1–10.

Trubetzkoy, Nikolai Sergeyevich. 1939. Grundzüge der Phonologie, Travaux du Cercle linguistique de Prague, vol. 7. Prague: Jednota českých matematiků a fyziků.

Wells, Rulon S. 1949. Automatic alternations. Language 25(2): 99–116.

4.8 Phonological derivations


Choose examples showing rule application

Once we have finished the phonemic analysis of a language and determined what phonological rules we need, we can demonstrate how the analysis works by showing sample derivations of a few critical words. When doing this, it is important to demonstrate a few things. First, we should give examples showing how the rule correctly applies when the target is in the environment, and we should do so for a representative set of phonemes in the target natural class.

For French, this means we should demonstrate the rule applying to at least three words, one with /m/, one with /l/, and one with /ʀ/. We could pick [ʀitm̥] ‘rhythm’, [ɔ̃kl̥] ‘uncle’, and [ɛtʀ̥] ‘to be’. If we are dealing with a particularly large target natural class (for example, all obstruents or all vowels), we usually only need to show a few examples, with enough diversity that they can be taken to be representative of the full natural class. Do not pick /t/, /s/, and /ʃ/ to represent all obstruents! Use something like /p/, /z/, and /ɡ/ instead.

Choose examples showing lack of rule application

We should also show a few examples of how the rule will not apply when the target phonemes are in the wrong environment. So we might want to show an example with a word-final sonorant preceded by something voiced, such as [tabl] ‘table’, as well as an example with a sonorant after a voiceless obstruent but not at the end of the word, such as [ekʀiʀ] ‘to write’.

Finally, we might also want to show examples with phonemes that are similar to the target natural class and which are in the right environment but are not affected by the rule. The only similar phonemes that we can find in the right environment are vowels, so we could pick an example like [limite] ‘limited’, which has a vowel in the correct environment of the rule, but which does not change, because vowels are not sonorant consonants.

Determine URs

Then for each of the example words we are going to use in our demonstration, we need to determine their URs. Because of the principle of faithfulness, we know that the UR and SR should look the same, except specifically only in those places where a rule applies. In this case, the only rule we have creates voiceless sonorants, so to build the URs for our sample words, we should replace all the voiceless sonorants in the SRs with the underlying voiced phonemes they are derived from. This gives us the following set of example URs:

/ʀitm/ ‘rhythm’
/ɔ̃kl/ ‘uncle’
/ɛtʀ/ ‘to be’
/tabl/ ‘table’
/ekʀiʀ/ ‘to write’
/limite/ ‘limited’

Demonstrate the derivation

Finally, we can construct a derivation table which visually demonstrates the phonological derivation of one or more words. Derivations are commonly formatted as follows, with the URs and glosses of the example words listed horizontally across the top, all of the relevant phonological rules listed vertically down the left, and the SRs listed horizontally across the bottom. In each column, the output of each phonological rule is given, showing how the word changes. We can use a dash — to indicate that the rule does not apply to a particular word. It is also useful to give rules a meaningful name as a reminder of what the rule is. Here, we will call the rule for French sonorants “devoicing”.

UR         /ʀitm/     /ɔ̃kl/     /ɛtʀ/     /tabl/    /ekʀiʀ/     /limite/
           ‘rhythm’   ‘uncle’   ‘to be’   ‘table’   ‘to write’  ‘limited’
devoicing  ʀitm̥       ɔ̃kl̥       ɛtʀ̥       —         —           —
SR         [ʀitm̥]     [ɔ̃kl̥]     [ɛtʀ̥]     [tabl]    [ekʀiʀ]     [limite]
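A derivation table of this kind can also be mimicked with a short script. The following is an illustrative sketch under our own assumptions (encoding each rule as a named regex substitution and the `derive` helper are invented for this example): it runs a UR through each rule in order and records a dash when a rule does not apply, just as the table does.

```python
import re

RING = "\u0325"  # combining ring below, marks voicelessness

# Hypothetical encoding of French devoicing as (name, pattern, replacement):
# sonorant → voiceless / voiceless obstruent ▁ #
RULES = [("devoicing", re.compile("([ptkfsʃ])([mlʀ])$"), r"\1\2" + RING)]

def derive(ur):
    """Run a UR through each rule in order; record '—' for non-application."""
    steps, form = [], ur
    for name, pattern, repl in RULES:
        new_form = pattern.sub(repl, form)
        steps.append((name, new_form if new_form != form else "—"))
        form = new_form
    return steps, form  # form is now the SR

for ur in ["ʀitm", "tabl", "ekʀiʀ"]:
    steps, sr = derive(ur)
    print(f"/{ur}/", *(f"{name}: {out}" for name, out in steps), f"[{sr}]")
```

With more rules in the list, the same loop would print one row per rule, reproducing the layout of a multi-rule derivation table.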

Check your understanding

Coming soon!

4.9 Types of phonological rules


Phonation assimilation

There are many types of rules that languages may have. Perhaps the most common general type of phonological rule we find is assimilation, when a phoneme changes to an allophone that matches some aspect of its environment. That is, one or more of the properties in the rule’s change are also present somewhere in the rule’s environment. We see this with French devoicing, where the sonorants become voiceless in an environment that also involves voicelessness.

Phonation assimilation can also cause voicing rather than devoicing, as in Wemba Wemba (an extinct Kulinic language of the Pama–Nyungan family, formerly spoken in Australia), in which voiceless plosives are voiced after nasal stops, as in the following data (adapted from Hercus 1986).

/panpar/ → [panbar] ‘shovel’

/jantaŋ/ → [jandaŋ] ‘I’

/taɳʈa/ → [taɳɖa] ‘touch’

We can write the relevant rule as follows:

In both the French and Wemba Wemba assimilation rules, the crucial part of the environment containing the assimilating property is on the left, but phonation assimilation can also depend on the right side of the environment, as in Polish (a West Slavic language of the Indo-European family, spoken in Poland). In Polish, voiced obstruents become voiceless if followed by a voiceless obstruent (data adapted from Stanisławski 1978 and Rubach 1996).

/dxu/ → [txu] ‘of breath’

/rɪbka/ → [rɪpka] ‘little fish’

/vçi/ → [fçi] ‘of village’

/vɪkaz pism/ → [vɪkas pism] ‘list of journals’

The relevant phonological rule can be written as follows:

Place assimilation

Phonation is not the only phonetic property that can assimilate. In Persian (a Southwestern Iranian language of the Indo-European family, spoken in Iran and surrounding areas), we see assimilation of place, with alveolar stops becoming postalveolar before a postalveolar (data adapted from Bijankhan 2018).

/ʔætʃɒn/ → [ʔæṯʃɒn] ‘parched’

/χædʃe/ → [χæḏʃe] ‘flaw’

/ʔenʃɒ/ → [ʔeṉʃɒ] ‘essay’

The relevant phonological rule can be written as follows:

Nasality assimilation

Nasality is also another common property that assimilates, as in Ka’apor (a.k.a. Urubú-Kaapor, a Wayampí language of the Tupian family, spoken in Brazil). In Ka’apor, vowels are nasalized after a nasal stop (data adapted from Kakumasu 1986).

/uruma/ → [urumã] ‘duck’

/tamui/ → [tamũi] ‘old man’

/mɨra/ → [mɨ̃ra] ‘wood’

/nino/ → [nĩnõ] ‘lie down’

/niʃoi/ → [nĩʃoi] ‘none’

/ne/ → [nẽ] ‘you (sing.)’

The relevant phonological rule can be written as follows:

Other kinds of rules

Almost any phonetic property can assimilate, and there are also many rules that do not involve assimilation at all.

[add in a few more examples]

Using common rule types

Knowing what kinds of phonological rules we are likely to find helps narrow down our options when trying to determine what phones are allophones of the same or different phonemes. For example, for the French sonorants, we see that there are natural pairs of voiced and voiceless sonorants, so it would be reasonable to see if the distribution of these phones matches what we know about rules that affect voicing, such as assimilation.

Taking advantage of our knowledge of common types of rules allows us to avoid focusing on likely irrelevant factors. For example, for French, we would know not to worry too much about vowel rounding or place of articulation, since these are not normally triggers for changing phonation. We can also begin looking for patterns based on common rules without even knowing which phones of interest we should examine: maybe there is a pattern in vowel nasality based on the presence or absence of an adjacent nasal stop (indicating assimilation of nasality). Of course, the language we are analyzing won’t have all of these rules, but it might have one, so we can get a head start on analyzing its phonology.

This means that phonemic analysis and rule discovery go hand in hand. Sometimes, we may use known phonological rules to help uncover distributional patterns in phones, and other times, we may find the distributional patterns first, leading us to posit a phonological rule. Working on a language from both directions can be much more productive than trying to do phonemic analysis directly. This is a method that permeates all of linguistics, not just phonology. Every language we analyze tells us something about how language itself works, and that broader knowledge of how language works helps us to analyze the next language.

Check your understanding

Coming soon!


Bijankhan, Mahmood. 2018. Phonology. In The Oxford handbook of Persian linguistics, ed. Anousha Sedighi and Pouneh Shabani-Jadidi, Oxford Handbooks in Linguistics, 111–141. Oxford: Oxford University Press.

Hercus, Luise A. 1986. Victorian languages: A late survey. No 77 in Pacific Linguistics Series B. Canberra: The Australian National University.

Kakumasu, James. 1986. Urubu-kaapor. In Handbook of Amazonian languages, ed. Desmond C. Derbyshire and Geoffrey K. Pullum, vol. 1, 326–404. Berlin: Mouton de Gruyter.

Rubach, Jerzy. 1996. Nonsyllabic analysis of voice assimilation in Polish. Linguistic Inquiry 21(1): 69–110.

Stanisławski, Jan. 1978. Wielki słownik polsko-angielski. Warsaw: Wiedza Powszechna.

4.10 Signed language phonology


Modality differences in phonological rules

Finally, we might wonder whether signed languages have phonological rules, since they have fundamentally different modalities. In some sense, signed languages do have phonological rules, but not quite in the same way that spoken languages do. Phonological rules in spoken languages are typically fully productive, which means they apply to every word that satisfies the target and environment. So French devoicing of sonorants applies to absolutely every single sonorant that is in the correct environment; there are no exceptions.

But signed languages do not seem to have these kinds of productive phonological rules. There are many theories why this might be the case. Perhaps it is because signed languages do not have levels of analysis equivalent to phonemes or phones. There are parameters (handshape, movement, etc.) and syllable-like structures, but maybe there is nothing in between. A combination of phonetic properties in signed languages yields a syllable or even an entire sign, while in a spoken language, a combination of phonetic properties just yields a phone. This may mean that there simply is not the right kind of phonological unit to be targeted by phonological rules in signed languages the way we have seen for spoken languages.

It should not be a surprise that something so intimately connected to modality might differ between spoken and signed languages, and this is a good reminder that we cannot simply import spoken language linguistic analysis into signed languages. They need to be analyzed in their own right.

Weak hand freeze

Regardless, there are phonological processes in signed languages that are somewhat rule-like; they just affect individual signs rather than all signs that match the requirements for the rule. Two-handed signs in particular are often subject to phonological processes because they involve so much articulatory complexity and effort. If both hands are moving, the nondominant hand may undergo weak hand freeze, which causes it not to move. We can see this in the ASL sign SENTENCE, which has two two-handed forms, one in which the nondominant hand moves and one in which the nondominant hand has been frozen.

SENTENCE (two-handed movement)

SENTENCE (with weak hand freeze)

Weak hand drop

Another phonological process that can affect two-handed signs is weak hand drop, in which an immobile nondominant hand is simply not used at all. We can see this in the ASL sign CHOOSE, which has two forms, a two-handed version with an immobile nondominant hand and a one-handed version in which the nondominant hand has been dropped.

CHOOSE (two-handed)

CHOOSE (with weak hand drop)


Lowering

Signs may also undergo lowering, in which the sign is articulated at a lower location to reduce the effort of moving the hands all the way to the original higher position. Lowering can be seen in the ASL sign KNOW, which is sometimes articulated at the forehead but can instead be lowered to a location under the eyes.

KNOW (at forehead)

KNOW (with lowering)

Distalization and proximalization

Signs can also shift which joints are used. If the joints shift down the arm towards the fingers, the sign has undergone distalization, while if they shift up the arm towards the shoulder, the sign has undergone proximalization. Without knowing the original version of a sign, it may be difficult to tell whether two variants represent distalization or proximalization. The ASL sign CHAT has two variants, one that is more proximal (with elbow and a bit of shoulder movement) and one that is more distal (with radioulnar movement).

CHAT (proximal)

CHAT (distal)

Check your understanding

Coming soon!

Chapter 5: Morphology


In this chapter, we look at words and at the meaningful pieces that combine to create words. We will see that languages vary in how words are built, but that nonetheless we can find structure inside of words in all languages. In linguistics, the study of word forms is known as morphology.

When you’ve completed this chapter, you’ll be able to:

  • Identify morphologically complex words, and the morphemes within them
  • Distinguish between inflectional morphology, derivational morphology, and compounds
  • Explain how a word’s morphology interacts with its lexical category
  • Analyze the structure of complex words

5.1 What is morphology?


In linguistics, morphology is the study of how words are put together. For example, the word cats is put together from two parts: cat, which refers to a particular type of furry four-legged animal (🐈), and -s, which indicates that there’s more than one such animal (🐈 🐈‍⬛ 🐈).

Most words in English have only one or two pieces in them, but some technical words can have many more, like anti-internationalization, which has at least six (anti-inter-nation-al-iz(e)-ation). In many languages, however, words are very often made up of many parts, and a single word can express a meaning that would require a whole sentence in English.
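To see what “pieces” means concretely, here is a toy illustration of peeling affixes off an English word. This is not a real morphological parser: the prefix and suffix lists are invented for this example, and -iz stands in for the spelling of -ize, whose final e disappears before -ation.

```python
# Invented, tiny affix inventories for illustration only
PREFIXES = ["anti", "inter"]
SUFFIXES = ["ation", "iz", "al", "s"]  # checked outermost-first

def segment(word):
    """Strip known prefixes left-to-right and known suffixes outermost-first."""
    prefixes_found, suffixes_found = [], []
    for p in PREFIXES:
        if word.startswith(p):
            prefixes_found.append(p + "-")
            word = word[len(p):]
    for s in SUFFIXES:
        if word.endswith(s):
            suffixes_found.insert(0, "-" + s)
            word = word[:-len(s)]
    # what remains in the middle is the innermost piece
    return prefixes_found + [word] + suffixes_found

print(segment("cats"))                  # ['cat', '-s']
print(segment("internationalization")) # ['inter-', 'nation', '-al', '-iz', '-ation']
```

Real segmentation is far harder than this greedy string-matching (scat, discussed below, shows why matching letters is not enough), but the sketch conveys the idea that complex words decompose into reusable pieces.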

For example, in the Harvaqtuurmiutut variety of Inuktitut, the word iglujjualiulauqtuq has 5 pieces, and expresses a meaning that could be translated by the full English sentence “They (sg) made a big house.” (iglu = house, jjua = big, liu = make, lauq = distant past, tuq = declarative; example from Compton and Pittman 2010).

Not all combinations of pieces are possible, however. To go back to the simple example of cat and -s, in English we can’t put those two pieces in the opposite order and still get the same meaning—scat is a word in English, but it doesn’t mean “more than one cat”, and it doesn’t have the pieces cat and -s in it; instead it’s an entirely different word.

One of the things we know when we know a language is how to create new words out of existing pieces, and how to understand words that other people use. We also know what combinations of pieces are not possible. In this chapter we’ll learn about the different ways that human languages can build words, as well as about the structure that can be found inside words.

What is a word?

If morphology is the investigation of how words are put together, we first need a working definition of what a word is.

In everyday life, in English we might think of a word as something that’s written with spaces on either side. This is an orthographic (=spelling-based) definition of what a word is. But just as writing isn’t necessarily a reliable guide to a language’s phonetics or phonology, it doesn’t always identify words in the sense that is relevant for linguistics. And not all languages are written with spaces in the way English is—not all languages have a standard written form at all. So we need a definition of “word” that doesn’t rely on writing.

This is actually a hotly debated topic! Linguists might distinguish phonological words (words for the purposes of sound patterns), morphological words (words for the purposes of morphology), and syntactic words (words for the purposes of sentence structure), and might sometimes disagree about the boundaries between some of these.

In this textbook, though, we don’t need to worry about possible differences between these types of words, and for the purposes of linguistic investigation of grammar we can say that a word is the smallest separable unit in language.

What this means is that a word is the smallest unit that can stand on its own in an utterance. For example, content words in English (nouns, verbs, adjectives, and adverbs) can stand by themselves as one-word utterances when you’re answering a question:

(1) a. What do you like to eat?
Answer: cake (noun)
b. What did you do last night?
Answer: sleep (verb)
c. What colour is the sky today?
Answer: orange (adjective)
d. How did you wake up this morning?
Answer: slowly (adverb)

Words are also syntactically independent: they can appear in different positions in a sentence, changing their order with respect to other elements even while the order of elements inside each word stays the same.

Though morphology is concerned with the shape of words, words aren’t the smallest unit of language. As we already saw earlier in this chapter, words themselves can have smaller pieces inside them, as in the simple cases of cats (cat-s) or international (inter-nation-al)—but these smaller pieces can’t stand on their own.

To refer to these smaller pieces within words, we use the technical term morpheme. A morpheme is the smallest possible pairing of both form (sign or sound) on the one hand, and meaning or grammatical function on the other. (We say “grammatical function” because while some morphemes have clear meanings, of the type that will be discussed in Chapter 7 in the context of lexical semantics, other morphemes express more abstract grammatical information.)

Words that contain more than one morpheme are morphologically complex. Words with only a single morpheme are morphologically simple.

The word morphology is itself morphologically complex, being made up of morph- “shape” and -ology “study of”. So morphology is the study of shapes, in linguistics of word shapes.

In biology, “morphology” is the study of the shape of animals and other organisms, and if you do an internet search for “morphology”, this is often the first meaning that comes up.

Our goal in morphology is to understand how words can be built out of morphemes in a given language. In this chapter we will first look at the shapes of different morphemes (and morphological processes); in later sections we will review different functions that morphology can have, looking at divisions between derivational morphology, inflectional morphology, and compounding.


Compton, Richard, and Christine Pittman. 2010. Word-formation by phase in Inuit. Lingua 120: 2167–2192.

5.2 Roots, bases, and affixes


Affixes vs roots

In this section we look at the possible shapes that morphemes themselves can take. Some morphemes are affixes: they have to attach to something. The morphemes -s and inter- and -al are all affixes. You can’t say them on their own, they have to attach to something else. We write affixes with a hyphen on one side or the other to indicate this need for attachment.

The thing an affix attaches to is called a base. Some bases are morphologically simple, while others are morphologically complex.

For example, consider the word librarian. This word is formed by attaching the affix -ian to the base library.

Librarian can then itself be the base for another affix: for example, the word librarianship, the state or role of being a librarian, is formed by attaching the affix -ship to the base librarian.

There is a special name for simple bases: root. A root is the smallest possible base, one that cannot be divided further; it is what we might think of as the core of a word. Roots in English that we’ve seen so far in this chapter include cat, library, and nation.

If you look at the history of the words library and nation, they both trace back to Latin (by way of French), and in Latin the relevant words were morphologically complex: library traces back to the Latin root libr- (meaning “book”), and nation traces back to the Latin root nat- (meaning “be born”). When a child first encounters a word like library or nation, however, the word doesn’t come annotated with this historical information! In the minds of most contemporary English speakers, it is likely that library and nation are treated as simple roots; in Chapter 13, you’ll learn about how this kind of hypothesis could be tested experimentally.

Turning back to affixes, an affix is any morpheme that needs to attach to a base. We use the term “affix” when we want to refer to all of these together, but we usually specify what type of affix we’re talking about when possible.

Types of affixes

Prefix
an affix that attaches before its base, like inter- in international.
Suffix
an affix that follows its base, like -s in cats.
Circumfix
an affix that attaches around its base.
Infix
an affix that attaches inside its base.
Simultaneous affix
an affix that takes place at the same time as its base.


An example of a circumfix can be found in the marking of plural possessors in many Algonquian languages. The following examples are from Meskwaki, spoken in parts of the Midwest of the US and in Northern Mexico; the source of these examples is Oxford (2020), who adapted them from an in-preparation grammar by Amy Dahlstrom (A grammar of Meskwaki, an Algonquian language). These examples are presented in Meskwaki orthography; “a·” indicates a long vowel.

(2) a. ne-ta·nes-aki
“my daughters”
(2) b. ne-ta·nes-ena·n-aki
“our daughters”

What you can see here is that the singular possessor in “my daughters” is marked only by a prefix, but the plural possessor in “our daughters” is marked by the combination of the prefix ne- and the suffix -ena·n—or, in other words, by a circumfix.

These examples have morpheme-by-morpheme glosses, which means that the morphological analysis has been done for you; in Section 5.X we’ll discuss how we figure out the boundaries between morphemes in a language we aren’t already familiar with. Morpheme-by-morpheme glosses use standard abbreviations:

  • 1 stands for “first person” (I, me, my / we, us, our)
  • PL stands for “plural” (so 1PL means “we, us, our”)
  • AN stands for “animate”. Algonquian languages distinguish all nouns as “animate” or “inanimate”, and this is reflected in their morphology.


Infixes are affixes that appear in the middle of another morpheme. For example, in Tagalog (a language with about 24 million speakers, most of them in the Philippines) the infix -um- appears immediately after the first consonant of the root to which it attaches, to form the perfective form of a verb (used to indicate completed action, usually translated with the English simple past):

(3) a. [takbuh] run [tumakbuh] ran
b. [lakad] walk [lumakad] walked
c. [bili] buy [bumili] bought
d. [kain] eat [kumain] ate
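The placement of -um- is regular enough that it can be sketched in code. This is a minimal illustration of the pattern in (3) only, assuming (as in all four roots above) that the root begins with exactly one consonant; the function name is our own.

```python
# Sketch of Tagalog -um- infixation for perfective verb forms,
# assuming the root starts with a single consonant (true of the data in (3))
def perfective(root):
    """Insert -um- immediately after the root's first consonant."""
    return root[0] + "um" + root[1:]

for root in ["takbuh", "lakad", "bili", "kain"]:
    print(root, "->", perfective(root))
```

Note how different this is from a prefix or suffix: the affixal material interrupts the root rather than attaching at either edge.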

Simultaneous affix

Simultaneous affixes are common in signed languages and in languages with tone. When signing, it’s possible to do things with multiple articulators (a second hand, or your face), or to add motion on top of a sign, in a way that is not possible with oral articulations in spoken languages.

For example, in ASL there is a morpheme that attaches to verbs to express durative aspect (the meaning that something happens for a while, or for a long time). This morpheme involves adding a particular circular motion to the base sign for the verb; this circular motion doesn’t happen before or after the verb, but simultaneously with it.

(4) [VIDEO: ASL verb, verb+durative]

Once we see these examples in signed languages, we can think of morphology in some spoken languages that has a similar profile. For example, languages with tone sometimes have tonal morphemes that are overlaid on the consonants and vowels of a word.

English isn’t a tonal language, but we have some pairs of words that clearly involve the same root, but where the stress has shifted. These are noun-verb pairs where the noun has stress on the first syllable, but the verb has stress on the second syllable.

(5) a. They used récords to recórd music.
b. I have a pérmit that permíts me to drive.
c. I receive mail at my home áddress, at least when it’s addréssed properly.

Not all English speakers have stress shift in the same pairs of words—many people pronounce address with stress on the second syllable in both the noun and the verb, for example.

Free vs bound morphemes

Another way to divide morphemes is by whether they are free or bound. A free morpheme is one that can occur as a word on its own. A bound morpheme, by contrast, can only occur in words if it’s accompanied by one or more other morphemes.

Because affixes by definition need to attach to a base, only roots can be free. Indeed, most roots in English are free, but we do have a few roots that can’t occur on their own. For example, the root -whelmed, which occurs in overwhelmed and underwhelmed, can’t occur on its own as *whelmed.

By contrast, in many other languages all (or most) roots are bound, because they always have to occur with at least some inflectional morphology. This is the case for verbs in French and the other Romance languages, for example; it was also the case for Latin, which is why the roots nat- and libr- were shown with hyphens above.

In our notation, we show that morphemes are bound by putting hyphens either before or after them (on the side that they attach to other morphemes).


Oxford, William R. 2020. Algonquian. In Routledge handbook of North American languages, ed. Daniel Siddiqi , Michael Barrie, Carrie Gillon, and Éric Mathieu. Routledge.

5.3 Morphology beyond affixes


There are some morphological patterns that (arguably) don’t involve affixation at all: internal change, suppletion, and reduplication.

Internal change

Internal change is one name for the type of change found in many irregular English noun plurals and verb past tenses.

For example, the plural of mouse is mice; the plural of goose is geese. The past tense of sit is sat, and the past tense of write is wrote.

These are all relics of what used to be a regular pattern in English. By regular we mean that they were phonologically predictable based on the general pattern of the language, and automatically applied to new words. Now they have to be memorized, and are therefore irregular. There are still productive alternations like these in other Germanic languages, like German.


Suppletion

Suppletion is an even more irregular pattern, where a particular morphological form involves entirely replacing the base. For example, the past tense of the verb go is went—there is no amount of affixation or internal change that will get you from one to the other! This type of total replacement is also found in English in the comparatives and superlatives of good ~ better ~ best and bad ~ worse ~ worst, throughout the paradigm of the verb to be, and on some pronouns.

If a language has suppletion (not all languages do!) it is commonly found on some of the most frequent words in the language, just as we see in English. The reason for this is that children acquiring a language tend to assume patterns are regular and predictable until the weight of the evidence convinces them otherwise—and they’re more likely to get enough evidence to reach the conclusion that something is suppletive if a word is incredibly common. The relevance of frequency for certain types of patterns in language is something we’ll see again in Chapter 11 on Child Language Acquisition and Chapter 13 on Psycholinguistics and Neurolinguistics.

Suppletion is a type of allomorphy, which we will see more about in the next section.


Reduplication

Finally, reduplication involves repeating part or all of a word as part of a morphological pattern. In Halq’eméylem, a Salish language spoken in British Columbia, one pattern of reduplicating a verb produces an adjective meaning that something or someone is likely or disposed to do the action (Shaw 2008). The links below go to FirstVoices, an online platform for community-based language resources.

(1) a. kw’élh ‘to capsize’ [kʼʷə́ɬ] [kʼʷə́ɬkʼʷəɬ] ‘likely to capsize’
b. qwà:l ‘to speak’ [qʷél] [qʷélqʷel] ‘talkative’

This is not the only pattern of reduplication in Halq’eméylem; languages in the Salish family have many patterns of reduplication, resulting in a range of meanings and grammatical functions.

English does have one pattern of reduplication, but it can apply to phrases as well as words. This type of reduplication carries the meaning of something being a prototypical example of the type; it is often called salad-salad reduplication (“Tuna salad is a salad, but it’s not a salad-salad.”—in other words, tuna salad isn’t a prototypical salad because it doesn’t involve lettuce or other greens).
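Full reduplication of the kind seen in (1), and in English salad-salad reduplication, can be sketched as simply copying the base. This is a simplification (not from the original text): among other things it ignores the fact that in the Halq’eméylem forms only the first copy keeps the stress mark, and the function name is invented:

```python
def reduplicate(base):
    """Sketch of full reduplication: copy the whole base.
    Ignores stress marking and other phonological adjustments."""
    return base + base

print(reduplicate("qʷel"))   # qʷelqʷel, cf. "talkative"
print(reduplicate("salad"))  # the salad-salad "prototypical" reading in English
```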

Morphological typology

Looking at different languages, we can divide them typologically into different morphological types.

At one end we have what are called isolating or analytic languages. No human language is perfectly isolating—this would be a language where all words are morphologically simple. Chinese languages like Mandarin and Cantonese are highly isolating, because in these languages inflectional information is typically expressed by small function words (“particles”) rather than by affixes. There are nonetheless many compound words in these languages—compounds are words built out of more than one root, discussed more below in Section 5.8.

English is less isolating than Mandarin, but still very analytic.

The opposite of analytic is synthetic. Synthetic languages have a lot of morphological complexity in words, and are often characterized by having no (or very few) free roots.

Languages that are more synthetic fall into different types. The main division is between agglutinative and fusional languages. In highly agglutinative languages, words are built from many easily separated affixes, each of which is associated with a consistent piece of meaning. Japanese is a somewhat agglutinative language, as in the following example where the verb has a string of suffixes corresponding to the English passive (“was verb-ed”) and causative (“made X verb”).

(2) Watasi-wa natto-o tabe-sase-rare-ta
I-TOP natto-ACC eat-CAUS-PASS-PAST
“I was made to eat natto.”

By contrast, a fusional language is one where many inflectional meanings are combined into single affixes. The Romance languages are a good example of fusional languages: the suffix on a verb expresses tense, aspect, and subject agreement, and is difficult to break down into smaller affixes. For example, in Spanish, the suffix -iéramos expresses subject agreement (first person plural), tense (past), aspect (imperfective), and mood (subjunctive).

Shaw, P. (2008). Inside Access: The Prosodic Role of Internal Morphological Constituency. In The Nature of the Word: Studies in Honor of Paul Kiparsky. ed. Kristin Hanson and Sharon Inkelas. The MIT Press.

5.4 Allomorphy


Some morphemes have a consistent meaning, but change slightly in their form depending on where they occur. In English, for example, the indefinite article a shows up as an when it occurs before a vowel (a book vs. an apple). This is an example of allomorphy based on the phonology (sounds) of the morpheme’s environment.

Another example of allomorphy can be found in the plural in English. First, consider the pairs of singular and plural nouns in (1).

(1) singular plural
a. [s] book books
cat cats
nap naps
b. [z] paper papers
dog dogs
meal meals
c. [ɪz] or [əz] niece nieces
horse horses
eyelash eyelashes

The plural in all these words is spelled as “s” (or “es”), but it isn’t always pronounced the same way. If you pay attention, the plural adds the sound [s] in (1a), the sound [z] in (1b), and the sound [ɪz] or [əz] in (1c). This is predictable, based on the last sound in the noun root. See if you can come up with a generalization about when you see each of the allomorphs in (1). (The answer appears at the end of this section.)

Now look at the singular-plural pairs in (2). These examples show more allomorphs of the plural in English, but they are not predictable: the allomorph of the plural used with these roots has to be remembered as a list.

One way of describing nouns that have no change in the plural is to say that they take an empty affix or zero affix. We use the symbol ∅ (the symbol for an empty set in mathematics) to indicate a morpheme that has no overt form.
(2) singular plural
a. -(r)en child children
ox oxen
b. internal change mouse mice
goose geese
woman women
c. no change (-∅) fish fish
sheep sheep
deer deer

When a morpheme can be realized in more than one way, we refer to its different forms as allomorphs of the morpheme.

(3) lists all the allomorphs of the English plural seen in this section.

(3) Plural: -s, -z, -ɪz, -(r)en, internal change, -∅
There are more allomorphs of the plural in English than we’ve seen here.  Can you think of any others? For any other language that you know, are there allomorphs of the plural in that language? What about other affixes, in English or in other languages, can you think of further examples of allomorphy?
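And here is the generalization promised above for the predictable allomorphs in (1): the plural is [ɪz] (or [əz]) after sibilant sounds, [s] after other voiceless sounds, and [z] after other voiced sounds (vowels included). As a rough sketch (not part of the original text; the sound inventory is deliberately tiny and the function name is invented):

```python
SIBILANTS = {"s", "z", "ʃ", "ʒ", "tʃ", "dʒ"}          # hissing/hushing sounds
VOICELESS = {"p", "t", "k", "f", "θ", "s", "ʃ", "tʃ"}  # toy voiceless inventory

def plural_allomorph(final_sound):
    """Pick the regular English plural allomorph based on the
    final sound of the noun root."""
    if final_sound in SIBILANTS:
        return "ɪz"
    if final_sound in VOICELESS:
        return "s"
    return "z"  # voiced non-sibilants, including all vowels

print(plural_allomorph("k"))  # books: [s]
print(plural_allomorph("g"))  # dogs: [z]
print(plural_allomorph("ʃ"))  # eyelashes: [ɪz]
```

The irregular allomorphs in (2), of course, cannot be computed this way; they have to be listed root by root.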


5.5 Lexical categories


Derivation vs inflection and lexical categories

Morphology is traditionally divided into two types:

  1. Derivational morphology: Morphology that changes the meaning or category of its base
  2. Inflectional morphology: Morphology that expresses grammatical information appropriate to a word’s category

We can also distinguish compounding, which is a type of morphology that combines multiple roots into a single word.

The definitions of derivation and inflection above both refer to the category of the base to which morphology attaches. What does this mean? The category of a word is often referred to in traditional grammar as its part of speech. In the context of morphology we are often interested in the lexical categories, which is to say nouns, verbs, adjectives, and adverbs. In the rest of this section we give an overview of what lexical categories are, and how we can identify them.

Lexical Categories, aka “Parts of Speech”

Determining the category of a word (its “part of speech”) is an important part of morphological and syntactic analysis. A category of words or morphemes is a group whose members behave the same way as one another for grammatical purposes.

You may be familiar with traditional semantic (based on meaning) definitions for the parts of speech. If you ever learned that a noun is a “person, place or thing”, or that a verb is an “action word”, these are semantic tests. However, semantic tests don’t always identify the categories that are relevant for linguistic analysis. They can also be hard to apply in borderline cases, and sometimes yield inconsistent results; for example, surely action and event are “action” words, so according to the semantic definition we might think they’re verbs, but in fact these are both nouns.

In linguistics we’re interested in making generalizations about where different categories of words or morphemes can occur, and how they can combine with each other. We therefore define categories based on the grammatical contexts in which words or morphemes are used—their distribution. The distribution of different categories varies from language to language; the remainder of this section reviews some of the main distributional tests for lexical categories (nouns, verbs, adjectives, and adverbs) in English (distributional tests for non-lexical categories will appear in Chapter 6 on Syntax).

If you know any other language, think about whether any of these tests can be adapted to identify lexical categories in that language, or if there are other morphological or syntactic cues that distinguish lexical categories in that language.

Nouns (N)

Verbs (V)

Adjectives (Adj)

Adverbs (Adv)

Using derivational affixes to identify category

In addition to the morphological tests above, you can also use derivational affixes to help determine the category of a word. For example:

The property of derivational affixes of not only creating words of particular categories, but also attaching only to bases of specific categories, is called selection. We discuss selection by derivational morphology further in Section 5.6.

5.6 Derivational morphology


Derivational morphology and selection

Derivational morphemes are typically choosy about the types of bases they combine with—another word for choosy is selective, and so we talk about how derivational affixes select the category of their base.

For example, the suffix -able combines only with verbs, and always creates adjectives meaning “able to be verb-ed”: readable, writeable, playable, employable, and googleable are all possible adjectives of English, even if they don’t appear in a dictionary—the other examples probably do show up in most dictionaries, but googleable might not, because google is a relatively recent verb (adapted from the name of the company). But as an English speaker I don’t need to check the dictionary to find out if something is a possible English word—if I’m talking to someone and I say that something is “googleable”, I can be confident that I’ll be understood even if the person I’m speaking to has never heard that verb before!

Here is a very incomplete sample of derivational affixes in English, with the category they select on the left side of the arrow, and the category they create on the right side.

(1) -tion V → N
-able V → Adj
-en V → Adj
-ed V → Adj
-ing V → Adj or N
-ment V → N
-ness Adj → N
-ity Adj → N
-ous N → Adj
-hood N → N
-ize N → V
-ly Adj → Adv
-ish Adj → Adj

There are many more than this! You’ll see them inside many words if you start paying attention.

Prefixes in English never change the category of the base they attach to, but they express clear meanings, like negation, repetition, order (e.g. pre- and post-), etc.

(2) non- N → N non-issue
  Adj → Adj non-distinct
 un- V → V undo
  Adj → Adj unhappy
 re- V → V redo

Derivational morphology can also be even more selective, requiring not only a base that has a certain category, but only attaching to specific roots or bases. A lot of derivational morphology in English was acquired from borrowing words from French and Latin; these “latinate” affixes often prefer to combine with each other, and sometimes only with roots that are also latinate. Such affixes are less productive than other affixes, which combine freely with most bases.

Some of the most productive derivational suffixes in English are -ish, which can attach to most adjectives, -ness, -able, and -ing.

-ing is particularly productive: it can attach to all verbs in English to form adjectives (traditionally called “participles”) or nouns (traditionally called “gerunds”). It is very unusual for a derivational affix to be that productive; usually there are at least a few roots that don’t occur with a derivational affix, for whatever reason.

Order of Affixation

Because derivational affixes care about the category of the base they attach to, and they can result in a change to a new category for the whole word, the order in which they are added to a word can matter!

Prefixes, suffixes, and circumfixes always attach to the outer edge of their base. That means that if a word has only suffixes, or only prefixes, there is only one order those affixes could have attached in; it will never be the case that the suffix that was added last appears closer to the root than earlier affixes.

Consider the word foolishly. This has the root fool (a noun), the suffix -ish (which attaches to nouns to form adjectives), and the suffix -ly (which attaches to adjectives to form adverbs). The only way to build this word is to first attach -ish to the root fool, and then attach -ly to the new base foolish. This structure is illustrated in Figure 5.1.

Tree diagram: foolish-ly [Adv [Adj fool(N)-ish] -ly]
Figure 5.1 Tree diagram for foolishly

But if a word has both prefixes and suffixes, then it’s slightly more work to figure out what order they attached in. Sometimes the selectional properties of the affixes mean that there is only one option. Consider the word unkindness. Here we have one prefix and one suffix. So in principle there are two orders in which we could build the word:

  1. Option 1: attach un- to kind first (giving unkind), then attach -ness (giving unkindness).
  2. Option 2: attach -ness to kind first (giving kindness), then attach un- (giving unkindness).

In both these hypothetical derivations the intermediate base—unkind in Option 1 and kindness in Option 2—is a possible word of English, so from that perspective both derivations seem equally plausible.

But only one of these options matches the selectional properties of the affixes involved. The prefix un- attaches to verbs and adjectives, not to nouns, while the suffix -ness attaches to adjectives to form nouns. In Option 2, un- would have to attach to the noun kindness, violating its selectional requirements.

This means that it can only be the order in Option 1, where un- attaches before -ness, while its potential base is still an adjective, that is the correct one.

Tree diagram: unkind-ness [N [Adj un + kind(Adj)] -ness]
Figure 5.2 Tree diagram for unkindness
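This kind of reasoning about selection can be sketched as a lookup in a small table (not part of the original text; the table entries follow the foolishly and unkindness discussion above, and the function name is invented):

```python
# Toy selection table: each affix records the category it
# attaches to ("selects") and the category it creates.
PREFIXES = {"un": [("Adj", "Adj"), ("V", "V")]}
SUFFIXES = {"ness": [("Adj", "N")], "ish": [("N", "Adj")], "ly": [("Adj", "Adv")]}

def attach(affix_table, affix, base_cat):
    """Return the resulting category if the affix selects base_cat,
    or None if attachment would violate selection."""
    for selects, creates in affix_table.get(affix, []):
        if selects == base_cat:
            return creates
    return None

# Option 1: un- attaches to the adjective kind, then -ness to the adjective unkind.
step1 = attach(PREFIXES, "un", "Adj")    # "Adj" (unkind)
step2 = attach(SUFFIXES, "ness", step1)  # "N" (unkindness)
print(step1, step2)                      # Adj N

# Option 2: -ness first gives the noun kindness, which un- cannot select.
step1 = attach(SUFFIXES, "ness", "Adj")  # "N" (kindness)
step2 = attach(PREFIXES, "un", step1)    # None: un- doesn't attach to nouns
print(step1, step2)                      # N None
```

Only Option 1 runs to completion, mirroring the conclusion above that un- must attach before -ness.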

5.7 Inflection


So far we’ve focused on derivational morphology. The next kind of morphology we’ll discuss is inflectional morphology.

Unlike derivational morphology, inflectional morphology never changes the category of its base. Instead it expresses grammatical information appropriate to the category of its base, information that’s required in a particular language.

In English we have a very limited system of inflectional morphology:

  1. plural -s and possessive -’s on nouns
  2. third person singular -s, past tense -ed, progressive -ing, and past participle -ed/-en on verbs
  3. comparative -er and superlative -est on adjectives

That’s all of it! But if we look at other languages, we find more types of inflectional morphology.

One thing about inflectional morphology is that lots of it can be expressed syntactically instead of morphologically. For example, some languages have tense, but express it with a particle (a separate word) rather than with an affix on the verb. This is still tense, but it’s not part of inflectional morphology.

The rest of this section gives a general survey of types of inflectional distinctions commonly made in the world’s languages, but there are many types of inflection that aren’t mentioned here.


Number

Most languages, if they have grammatical number, just distinguish singular and plural, but number systems can be more complex as well.

For example, many languages have dual in addition to singular and plural. Dual number is used for groups of exactly two things; we have a tiny bit of dual in English with determiners like both, which means strictly two. You have to replace both with all if the group has three or more things in it.

An example of a language that distinguishes dual is Inuktitut, one of the dialects spoken by the Inuit people who live in the Arctic region. There is a good deal of dialect variation across the Inuit languages; Inuktitut is the variety that is the official language of the territory of Nunavut, and has about 40,000 speakers.

(1) gloss singular dual (2) plural (3+)
“door” matu matuuk matuit
“cloud” nuvuja nuvujaak nuvujait
“computer” garasaujaq garasaujaak garasaujait

The three-way distinction between singular, dual, and plural in Inuktitut applies not only to nouns but also to verbs that agree with their noun subjects:

(2) first person singular nirijunga “I eat”
dual nirijuguk “the two of us eat”
plural nirijugut “we (three or more) eat”
second person singular nirijutit “you (one of you) eat”
dual nirijusik “you two eat”
plural nirijusi “you (three or more) eat”
third person singular nirijuq “they (sg) eat”
dual nirijuuk “the two of them eat”
plural nirijut “they (three or more) eat”

A small number of languages go further and also have a trial (pronounced “try-ull”), usually only on pronouns. This is used for groups of exactly three.

A language can also have paucal number, used for small groups.


Person

Person distinctions are those between first person (I, we), second person (you), and third person (he, she, it, they).

Some languages make a distinction in the first person plural between a first person inclusive (me + you, and maybe some other people) and a first person exclusive (me + one or more other people, not you). Anishinaabemowin (Ojibwe), which has about 20,000 speakers, makes this kind of distinction. The pronoun niinawind refers to the speaker plus other people but not the person being addressed (that is, “we excluding you”). This is known as the exclusive we. The pronoun for inclusive we (“we including you”) is giinawind. The distinction between inclusive and exclusive we is sometimes referred to as clusivity.

In Odawa and Algonquin varieties of Anishinaabemowin, spoken near Lake Huron and in Eastern Ontario and Quebec, these pronouns are niinwi and giinwi, respectively, but make the same contrast in meaning. Cree, which belongs to the same language family as Ojibwe (the Algonquian family), also makes an inclusive/exclusive distinction in the first-person plural. The exclusive form is niyanân and the inclusive form is kiyânaw. (Ojibwe examples from Valentine 2001.)

[SOURCE for Cree?]


Case

Case refers to marking on nouns that reflects their grammatical role in the sentence. Most case systems have ways to distinguish the subject from the object of a sentence, as well as special marking for possessors and indirect objects.

Some languages have many more case distinctions than this; usually many of the case forms express meanings that in languages like English we express using prepositions. Estonian and Finnish are known for having especially many cases (14 in Estonian and 15 in Finnish): the Wikipedia article on Finnish cases is a good source if you’d like to learn more.


Agreement

Agreement refers to any inflectional morphology that reflects the properties of a different word in a sentence, usually a noun.

The most common type of agreement is verbs agreeing with their subject, though verbs in some languages might also agree with their object (or might sometimes agree with their object instead of their subject). Verbs usually agree with nouns for their number and person.

Determiners, numerals, and adjectives often agree with the noun they modify, usually for number, case, and gender (assuming a language has some or all of these types of inflection in the first place!).

Tense and Aspect

Tense refers to the contrast between present and past (or sometimes between future and non-future) and is typically marked on verbs.

Aspect is a bit harder to define, but is usually characterized as the perspective we take on an event: do we describe it as complete, or as ongoing? In English we have progressive (marked with be + –ing) and perfect aspect (have + –ed/-en).

French has a slightly different contrast in the past tense between the imparfait and the passé composé—these both locate things in the past, but the imparfait describes them as habitual or ongoing (imperfective aspect), while the passé composé describes them as complete (perfective aspect).

The Mandarin particle le (了) also expresses perfective aspect, describing an event as complete, and zài (在) expresses progressive aspect, describing an event as in progress. But these are not examples of inflectional morphology, because these particles (=small words) are separate from the verb and do not act as affixes.

Terminology for aspectual distinctions can be confusing. In particular, the English perfect is not quite the same as the French or Mandarin perfective—though just as their names overlap, some of their uses are also similar.


Negation

In English we have derivational negative morphology (as in the prefixes in- or non-), which negates the meaning of a base or root.

Inflectional negation, by contrast, makes a whole sentence negative. In English we express inflectional negation syntactically, with the word not (or its contracted clitic form n’t). In other languages, however, negation can be expressed by inflectional affixes.

Other inflectional distinctions

What other types of distinctions can be marked in the verbal inflection of a language? Here we review a non-exhaustive set of inflectional distinctions made in some of the languages of the world.

OBVIATION: Algonquian languages, including Cree and Anishinaabemowin, make a distinction between proximate and obviative third person. You might think of this distinction as similar to the near/far distinction between this and that in English, where this is used for something closer to the speaker and that for something farther away. But, as in English, the proximate/obviative distinction is not just about physical distance; it can also reflect distance in time, or, within a conversation, distinguish someone who is the topic of discussion (proximate) from someone who is a secondary character (obviative). The distinction is marked on the verbal morphology, as illustrated below with examples from Cree:

(3) proximate obviative
a. Regina wîkiwak. Regina wîkiyiwa.
“They live in Regina.” “Their friend/someone else lives in Regina.”
b. kiskinwahamâkosiwak. kiskinwahamâkosiyiwa.
“They are in school.” “Their friend/someone else is in school.”


CAUSATIVES: A causative is a construction that expresses that an event was caused by an outside actor. In English we have a few constructions that express causativity, using verbs like make, have, and get:

(4) a. English causative with make:
The tree fell. I made the tree fall.
b. English causative with have:
The actors exited stage right. The director had the actors exit stage right.
c. English causative with get:
The teacher cancelled the exam. The students got the teacher to cancel the exam.

When a language has a morphological causative, it expresses these types of meanings by adding a morpheme onto the main verb. For example, in Kinande, a Bantu language spoken in the Democratic Republic of the Congo, the verb erisóma means “to read”, but erisómesya means “to make (someone) read”.

This is a type of morphology that changes the argument structure of a verb—the pattern of arguments (subjects, objects, indirect objects) that it combines with. Other types of argument-changing morphology are applicative or benefactive (to do something to or for someone) and passive. We discuss the syntax of argument changing in Section 6.11.

EVIDENTIALS: Many languages use morphology to indicate a speaker’s certainty about what they’re saying, or the source of their evidence for what they say. This is called evidential marking.

For example, in Turkish there is a distinction between the “direct past” -di, used to mark things you are certain of or that you directly witnessed, and the “indirect past” -miš, used to mark things you have only indirect evidence for.

(5) a. gel-di
“came” (directly witnessed)
b. gel-miš
“came, evidently”

In English we don’t have any grammatical marking of evidentiality. We can still express our evidence or certainty, but we do this with the lexical meanings of nouns, verbs, adjectives, and adverbs. For example, “I saw that…” would express that the source of your evidence is something you saw; “Apparently” would express that you aren’t 100% certain, etc.

MODALITY: Many languages express the possibility or necessity of something happening via morphology on the main verb. This is called modality. Examples of this include categories like the conditionnel or the future in French.

GENDER: In English we mark gender on third person pronouns, and we also have some words that have derivational gender suffixes (like –ess in actor vs actress).

By contrast, gender in a language like French is best treated as inflectional: not only do all nouns have a semantically arbitrary gender, but determiners and adjectives (and sometimes verbs) also show agreement with the grammatical gender of the noun they’re associated with. For example, the noun chat “cat” in French is masculine (abbreviated M), and so it appears with a masculine determiner and adjective; the noun abeille “bee” is feminine (abbreviated F), so it appears with a feminine determiner and adjective. This is independent of the actual sex of a cat or bee.

(6) a. le petit chat
the.M small.M cat(M)
“the small cat”
b. la petite abeille
the.F small.F bee(F)
“the small bee”

Many European languages have this type of gender system, which divides nouns into masculine, feminine, and sometimes neuter. It’s also found elsewhere in the world: for example, Kanien’kéha (Mohawk), spoken by about 3,500 people in Ontario, Quebec, and New York, has a gender system that includes masculine, feminine/indefinite, and feminine/neuter.

Other languages of the world have different noun class or noun classification systems, which also divide nouns into somewhat arbitrary classes, but into categories that don’t match the gender categories used for humans.

For example, the languages in the Bantu family of languages (a subgroup of the Niger-Congo language family spoken across the southern half of Africa, and which includes Kinande, Zulu, and Swahili, among many others) put all humans into one class, but have somewhere between 4 and 10 classes in total, which (just like gender in French) can be reflected by agreement on other words in a sentence.

Algonquian languages, including Cree and Anishinaabemowin, divide nouns into animate and inanimate. Animate nouns are usually those that are alive, whether animals or plants, or spiritually important things like asemaa (tobacco). Inanimate nouns usually refer to physical objects that aren’t alive. Sometimes the same noun can be animate or inanimate with slightly different meanings: for example mitig means “tree” when it’s animate but “stick” when it’s inanimate. There are other nouns that are less predictable: for example, miskomin “raspberry” is animate, but ode’imin “strawberry” is inanimate.


Valentine, J. Randolph. 2001. Nishnaabemwin reference grammar. University of Toronto Press.

5.8 Compounding


Compounds: Putting roots together

The last main “type” of morphology is compounding. Compounds are words built from more than one root (though they can also be built from derived words): if you find a word that contains more than one root in it, you are definitely dealing with a compound. Compounding differs from both derivation and inflection in that it doesn’t involve combinations of roots and affixes, but instead roots with roots.

English is a language that builds compounds very freely—in this it resembles other languages in the Germanic family, like German and Dutch. For almost any two categories, you can find examples of compounds in English.

Compounds and Spelling

In English we don’t spell compounds in a consistent way. Some compounds—typically older ones—are spelled without a space, while others are spelled with a hyphen, and many new compounds are spelled with spaces, as though they are separate words.

We can tell that some sequences of “words” are compounds, though, in a few different ways. First of all, there is a difference in pronunciation. Compounds are always stressed (given emphasis) on their first member; by contrast, phrases (sequences of words) get stress on their last member.

So compounds like bláckboard and gréenhouse, with stress on the first member, are pronounced differently than the ordinary sequences of adjectives followed by nouns black bóard and green hóuse, which are stressed on the last word.

Another difference is in the interpretation: a blackboard need not be black, a greenhouse usually isn’t green (though you grow green things in it).

Finally, there’s a syntactic difference. Something we’ll see when we get to Chapter 6 is that there’s no way to string nouns together in English syntax without connecting them with prepositions or verbs. So any time you see a string of “words” in English that all look like nouns, you have to be dealing with a compound.

English compounds and spelling

English really likes building very long compounds out of nouns. This is something people usually associate with German. In German, unlike in English, compounds are always spelled without spaces. So you get words like:

(1) Donaudampfschiffahrtsgesellschaftskapitän
Donau-dampf-schiffahrts-gesellschafts-kapitän
“Danube steam shipping company captain”

The second row in (1) inserts the hyphens in this German compound so that you can see the roots more clearly—but if you look at the English translation, it actually tracks all the same nouns in the German example. English writing has just adopted the convention of writing long or novel compounds with spaces. Structurally, English compounds work just like their German counterparts.

Compounds and Headedness

If compounds have more than one root in them, which of them determines the category of the word?

Most compounds—the ones that you might make up on the spot in particular—have a head. The head of a compound determines its interpretation (a sunflower is a type of flower, a bluebird is a type of bird, etc.) as well as its category.

In English, the head of a compound is always on its right: English is a right-headed compound language.
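Right-headedness can be sketched as a rule that copies the category of the right-hand root (a toy illustration, not part of the original text; the dictionary representation and function name are invented, and the rule only covers headed compounds, as the discussion of exocentric compounds below makes clear):

```python
def compound(root1, root2):
    """Sketch of an endocentric English compound: the category
    (like the core meaning) comes from the right-hand member."""
    return {"form": root1["form"] + root2["form"],
            "category": root2["category"]}  # right-headed

sun = {"form": "sun", "category": "N"}
flower = {"form": "flower", "category": "N"}
green = {"form": "green", "category": "Adj"}
house = {"form": "house", "category": "N"}

print(compound(sun, flower))  # a sunflower is a kind of flower: N
print(compound(green, house))  # a greenhouse is a kind of house: N
```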

Compounds that have a head are called endocentric. This is the same endo– morpheme you find in endo-skeleton. An animal (like a human) with a skeleton inside of it is endoskeletal, and a compound with a head inside of it is endocentric.

What about the compound equivalent of exo-skeletal, animals that have a carapace instead of a skeleton (like insects or crabs)? Compounds that are exocentric don’t have a head inside of them—they don’t describe either of their members.

Some exocentric compounds don’t have an interpretive head, but still have what we might call a category head, in that the root on the right matches the category of the whole compound. For example, redhead (“person with red hair”) is often listed as an exocentric compound, because it does not describe a type of head. Similarly sabretooth is exocentric because it doesn’t describe a type of tooth. But both of these are noun-noun compounds that are themselves nouns, so their right-hand member is almost a head. A spoilsport (“person who spoils other people’s fun”) is not a type of sport, but it is still a noun.

But other exocentric compounds don’t even have a head in this sense. For example, outcome looks like a compound of a preposition and a verb, but is a noun. Dust-up is a compound of a noun and a preposition, but is a noun. Tell-all is a compound of a verb and a determiner (all), but is an adjective.

Finally, a special kind of compound is usually called a dvandva compound (the terminology comes from Sanskrit grammar; dvandva means “pair”). Dvandva compounds can be thought of as “co-headed”—they can be paraphrased with an “and” between the two members. In English, many of our dvandva compounds involve roots that occur only in the compound and that mirror each other’s sounds. These are sometimes called reduplicatives.

But we also have some other dvandva compounds:

These are less common than other types of compounds in English.


5.9 Structural ambiguity in morphology


Ambiguity in derivation

It isn’t always the case that affixes can only attach in one order. Sometimes both orders (or all orders, if there are more than two) meet the selectional requirements of all affixes involved.

When a string is compatible with more than one structural representation, it is structurally ambiguous—but not only are both trees potentially correct, they’re often associated with different meanings.

Consider again the prefix un- but now in a word like untieable.

The word untieable is ambiguous. Pause for a moment and try to come up with its two interpretations.

The two interpretations of untieable are:

  1. Able to be untied.
    For example: the knot most people use for their shoelaces is chosen because it’s easily untieable.
  2. Not able to be tied.
    For example: if you haven’t learned to tie knots, a Celtic knot might seem untieable.

As you might be able to see from the paraphrases I’ve given here—a paraphrase is a different way of saying the same thing—we can account for the ambiguity of untieable by attaching the two affixes in different orders.

For meaning 1 “able to be untied”, we first attach the prefix un- to the verb tie, producing the verb untie (to undo a knot). Then we attach the suffix -able to untie to turn it into the adjective untieable.

Tree diagram: untie-able [Adj [V un + tie(V) ] -able]
Figure 5.3 Tree diagram for untieable, on the interpretation “able to be untied”

For meaning 2 “not able to be tied”, by contrast, first we attach the suffix -able to the verb tie, producing the adjective tieable (capable of being tied). Then we attach the prefix un- to this adjective. Now un- has its adjectival meaning, so we end up with an adjective meaning “not capable of being tied”.

Tree diagram: un-tieable [Adj un- [Adj tie(V) + -able ] ]
Figure 5.4 Tree diagram for untieable, on the interpretation “not able to be tied”

This type of ambiguity in derivational morphology requires that at least one affix be able to attach to bases of more than one category. We’ll see that structural ambiguity is even more common in the case of compounds.
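The two attachment orders can be modelled directly as nested structures. Here is a minimal sketch in Python (the tuple encoding and the leaves function are our own illustration, not standard linguistic notation):

```python
# Each node is (category, parts): a tiny stand-in for a morphological tree.
# A leaf is (category, morpheme-string); a branching node's parts is a tuple.

# Meaning 1: [Adj [V un- + tie ] -able ]  "able to be untied"
untie = ("V", (("Prefix", "un-"), ("V", "tie")))      # un- attaches to the verb
untieable_1 = ("Adj", (untie, ("Suffix", "-able")))   # -able turns the V into an Adj

# Meaning 2: [Adj un- [Adj tie + -able ] ]  "not able to be tied"
tieable = ("Adj", (("V", "tie"), ("Suffix", "-able")))  # -able attaches first
untieable_2 = ("Adj", (("Prefix", "un-"), tieable))     # un- attaches to the Adj

def leaves(node):
    """Flatten a tree back into the string of morphemes it spells out."""
    category, parts = node
    if isinstance(parts, str):
        return parts.strip("-")
    return "".join(leaves(part) for part in parts)

# Both structures spell out the same string, which is why the word is ambiguous:
print(leaves(untieable_1), leaves(untieable_2))  # untieable untieable
```

Because both structures flatten to the same string, the spoken or written form untieable underdetermines which tree was built.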

Structural ambiguity is different from the type of ambiguity we find with homophones—words that sound the same but have different meanings. If I say that I went to the bank, without more context you don’t know if I went to the kind of bank that’s a financial institution or the kind of bank that’s the edge of a river. There’s nothing structural in this ambiguity, instead there are just two different roots that sound the same.

Ambiguity in Compounding

Just like with derivational morphology, you can have structural ambiguity in compounds! In fact, it’s even easier to create structurally ambiguous examples, at least in languages that easily build compounds, because any string of noun roots (for example) could hypothetically combine in many different ways.

5.10 How to draw morphological trees


In Section 5.9 we saw that the order in which we attach derivational affixes, or the order in which we build compound words, sometimes matters. So a word like “governmental” isn’t just a string of the root govern + the suffix -ment + the suffix -al. Instead, it’s the result of first combining govern and -ment, and then combining the result of that with a further suffix -al.

In linguistics, we often represent this type of structure with a tree diagram. Trees are used to represent the constituency of language: the subgroupings of pieces within a larger word or phrase. One of the big insights of linguistics is that constituency is always relevant when describing how pieces combine, whether we’re looking at morphemes within a word or words within a sentence (though different theories in linguistics take different views of what range of hierarchical structures is possible in natural languages).

Tree diagram: [Adj [N govern(V) + -ment ] -al ]
Figure 5.5 Tree diagram for governmental

When drawing a morphological tree, we can follow these steps:

  1. Identify the root and any affixes
    • 1 root: non-compound word
    • 2 roots: compound word
  2. Determine the category of the root
  3. Determine the order in which affixes attach
  4. Determine the category of any intervening bases, and of the whole word.

You might find that it makes sense to do these in different orders, or in different orders in different words. The best way to find out what works for you is to practice.
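As an illustration of these steps, here is a small sketch in Python that builds the tree for governmental step by step (the attach helper and the tuple encoding are our own, purely for illustration):

```python
# Following the steps above for "governmental":
# 1. root: govern; affixes: -ment, -al   (one root, so not a compound)
# 2. category of the root: V
# 3. order of attachment: -ment first, then -al
# 4. the intervening base "government" is N; the whole word is Adj

def attach(base, affix, result_category):
    """Attach an affix to a base, returning a labelled subtree."""
    return (result_category, base, affix)

govern = ("V", "govern")
government = attach(govern, "-ment", "N")        # V + -ment -> N
governmental = attach(government, "-al", "Adj")  # N + -al -> Adj

print(governmental)
# ('Adj', ('N', ('V', 'govern'), '-ment'), '-al')
```

The nesting of the tuples mirrors the nesting of the branches in Figure 5.5.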

5.11 How to solve morphology problems


An important skill when it comes to morphology is being able to segment words in another language into their individual morphemes—in other words, being able to identify roots and affixes in complex words.

Remember that a morpheme is a consistent pairing of form (sound or sign) with meaning or function. Finding morphemes requires comparing words whose meanings you know, to see if the shared parts of their meanings correspond to shared parts of their forms.

Consider some of the examples of singular, dual, and plural nouns we saw earlier in the chapter from Inuktitut:

(1) gloss     singular   dual (2)    plural (3+)
    “door”    matu       matuuk      matuit
    “cloud”   nuvuja     nuvujaak    nuvujait

How can we find the plural morpheme in these examples? If we just start with a single plural word, like matuit “doors (three or more)”, there’s no way to figure out how to divide it (or even if it can be divided). But if we compare words that share only one aspect of their meaning/grammatical function, we can start to divide words:

matuit ~ nuvujait
door.PL cloud.PL

The words matuit and nuvujait are both plural. Their sounds don’t overlap very much, but they do share the final -it. This is a consistent pairing of form and meaning, so we can hypothesize that -it is the suffix meaning PLURAL.

Next we can compare matuit with another word with the meaning “door”.

matuit ~ matu
door.PL door.SG

The words matuit and matu both have “door” as part of their meaning; they also both contain the string matu. This is a consistent pairing of form and meaning, so we can identify matu as the root meaning “door”. (This means that matu doesn’t have any suffix meaning singular—that’s common for singular nouns across languages, but note that you will sometimes find a singular suffix. We could also say that there’s a -∅ singular suffix, but that’s not necessary.)

We can also go back to nuvujait, and see that once we’ve identified the suffix -it we’re left with the string nuvuja, which by hypothesis would be the root meaning “cloud”—and indeed, nuvuja appears in our data with that meaning!

So far we have three morphemes:

What about the dual? We have two dual nouns in the data set above.

We can separate out the roots that we’ve already identified:

We’re now left with two slightly different suffixes: –uk and –ak. Because we’ve already identified the roots, we can be pretty confident that these suffixes both express the meaning DUAL. They would be allomorphs:

If we had more data, our next step would be to try to find out whether the choice of -uk vs. -ak is predictable in Inuktitut. Based on these two words, we might hypothesize that -uk occurs after [u] and -ak occurs after [a] (an example of phonologically determined allomorphy), but we would need to check more words to see if that prediction is correct.
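The comparison method used in this section can be sketched in a few lines of Python. This is a minimal illustration of comparing word pairs, not a general-purpose segmentation algorithm (the function names are our own):

```python
import os

def shared_suffix(a, b):
    """Longest common ending of two word forms (a candidate suffix)."""
    return os.path.commonprefix([a[::-1], b[::-1]])[::-1]

def shared_prefix(a, b):
    """Longest common beginning of two word forms (a candidate root)."""
    return os.path.commonprefix([a, b])

# Comparing two plurals isolates the plural suffix:
print(shared_suffix("matuit", "nuvujait"))   # it
# Comparing two forms of "door" isolates the root:
print(shared_prefix("matuit", "matu"))       # matu
# Stripping the suffix from nuvujait leaves the hypothesized root "cloud":
print("nuvujait".removesuffix("it"))         # nuvuja
```

Note that this only mechanizes the comparison step; deciding which pairs to compare, and testing hypotheses about allomorphs like -uk and -ak, still requires the reasoning described above.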

Chapter 6: Syntax


In Chapter 5 we looked at the internal structure of words (morphology). In this chapter we look at how words are organized into phrases and sentences, which in linguistics is called syntax.

In linguistics, syntax is the study of how words are organized into phrases and sentences. Just as the morphemes in a word are organized into structures, the words in a sentence are also best viewed not just as a string of words, but instead as having a hierarchical structure (Section X). And just as words contain a head morpheme, we’ll see that every phrase has an element that is its syntactic head (Section X).

After covering core concepts in syntax in the first half of this chapter, in the second half we’ll see how we can use tree diagrams to represent the structure of sentences and phrases, just as we did previously for the structure of morphemes inside words.

When you’ve completed this chapter, you’ll be able to:

  • Use the evidence of constituency tests to identify the phrases within a sentence
  • Categorize words into lexical and functional categories based on their distribution
  • Identify relationships between grammatically related sentences (active and passive, statements and questions)
  • Draw tree diagrams to represent the structural analysis of sentences in English

6.1 Syntactic knowledge and grammaticality judgements


What kind of knowledge do we have about the syntax of language? Let’s start by considering the sentence in (1):

(1) All grypnos are tichek.

You might not know what a grypno is, or what it means to be tichek (because these are made-up words!), but you can tell that this sentence is still the right kind of “shape” for English. In other words, (1) is consistent with the way English speakers put words together into sentences.

Compare this with the sentence in (2):

(2) *Grypnos tichek all are.

Unlike (1), (2) isn’t the right shape for a sentence in English. Even if you did know what a grypno was, or what it meant to be tichek, this still wouldn’t be the way to put those words together into a sentence in English.

Something we can be pretty confident about is that you’ve never heard or read either of these sentences before encountering them in this chapter. In fact, most of the sentences you encounter in this textbook are likely to be ones you haven’t heard or read in exactly that order before. So that means your internal grammar of English must be able to generalize to new cases—this, again, is the infinite generative capacity of language, introduced back in Chapter 1.

As someone who uses language—in the case of (1) and (2), as someone who speaks and reads English—you can identify sentences that do or do not fit the patterns required by your internal grammar. In syntax we describe sentences that do match those patterns as grammatical for a given language user, and sentences that do not match required patterns as ungrammatical.

Grammaticality judgements in syntax

In syntax when we say something is ungrammatical we don’t mean that it’s “bad grammar” in the sense that it doesn’t follow the type of grammatical rules you might have learned in school. Instead, we call things ungrammatical when they are inconsistent with the grammatical system of a language user.

The evaluation of a sentence by a language user is called a grammaticality judgement. Grammaticality judgements are a tool for investigating the linguistic system of an individual language user—there is no way to get a grammaticality judgement for “English” as a whole, for example, only grammaticality judgements from individual English speakers. Sometimes you will see a sentence described as grammatical or ungrammatical “in English” or another language; technically this is a shorthand for saying that users of the language generally agree about whether it is grammatical or not. In many cases different users of a language disagree about the status of a particular example, and that can tell us something about syntactic variation in that language!

We are often most interested in examples that are ungrammatical, because they tell us about the limits on building sentences in a language. The convention in linguistics is to mark ungrammatical examples with an asterisk (*) at the beginning of the sentence, sometimes called a star (slightly easier to say). Whenever you see that symbol in front of an example in this textbook, it indicates that the example is ungrammatical in the linguistic sense.

Sometimes we want to indicate that a sentence is weird because of its meaning, rather than its syntax. In these cases we use a hashmark (#) instead of a star.

For example, consider an example like (3):

(3) #The book pedalled the bicycle harmoniously.

This sentence is the right shape for English, it just doesn’t make any sense. So we would say that it’s grammatical but semantically odd, and that’s what the hashmark indicates.

Most of the sentences we will consider in this chapter are ones that many English speakers (but not all) share similar judgements about. If you disagree with any of the judgements reported here, you can take the opportunity to think about what that tells you about your own grammar, and whether the difference could be explained using the tools we develop here, or if it shows that we would need to revise our theory of syntax in other ways!

The goals of syntactic theory

Our goal in syntax is to develop a theory that does two things:

  1. predicts which sentences are grammatical and which ones are ungrammatical, and
  2. explains observed properties of grammatical sentences.

But we also want to build a theory that can be used to explain not just properties of English, but properties of all human languages. In much of this chapter we’ll be focusing on the syntax of varieties of English, because that’s a language that’s common to everyone who reads this textbook, but we will often have opportunities to see how other languages show us the scope of variation for syntax in human languages.

What kind of theory do we need to make these kinds of predictions? If languages were finite we could simply list all the good sentences and be done. But any language user can generate sentences that no one has ever encountered before, and other people can understand those sentences, so what we “know” when we know the syntax of a language must be more than just a list of grammatical sentences. In the next section of this chapter, we’ll see that what we know about syntax can’t be just about the order of words; it has to be something about their grouping (constituency) as well.

6.2 Word order


A starting point: basic word order

If you think about hearing or seeing a sentence, or if you think about reading a sentence that’s been written down, a really obvious property is that words and morphemes come in a particular order. Indeed, the only difference between the grammatical sentence in (1) (All grypnos are tichek.) and the ungrammatical sentence in (2) (*Grypnos tichek all are.) is that the words appear in different orders.

The relevance of word order for grammaticality is particularly strong for a language like English with relatively fixed word order: there isn’t much flexibility in English to change the order of words in a sentence, without either changing the meaning or making the sentence ungrammatical. Many other languages also have relatively fixed word order, including French and Mandarin, but lots of other languages—including Latin, Anishinaabemowin, Kanien’kéha, and ASL, to name just a few—have much more flexible word order, determined by stylistic factors or by the topic or focus of the sentence.

What is the basic order of words in English sentences? Based on the grammatical sentences in (1) and the ungrammatical ones in (2), see if you can come up with any generalizations about where the verb appears in English.

(1) a. Amal ate chocolate.
b. Beavers build dams.
c. Cats chase mice.
d. Daffodils bloom.
e. Eagles fly.
(2) a. *Amal chocolate ate.
b. *Build beavers dams.
c. *Chase mice cats.
d. *Bloom daffodils.
e. *Fly eagles.

These sentences are all statements, not questions or commands: they state a fact about the world, something that could be true or false. Looking at (2b-e), and comparing them with the grammatical sentences in (1), we can make the generalization that the verb cannot be the first word in an English statement.

What about (2a)? In (2a) the verb isn’t the first word, but the sentence is still ungrammatical; we might try to explain that by saying that the verb also can’t be the last word in a statement—except that (1d) and (1e) are both grammatical even though the verb does come last. So a more accurate generalization would be to say that the verb in an English sentence has to come after at least one noun, and that it can be followed by a second noun, but doesn’t have to be.

We could write this generalization as a kind of formula or template: the grammatical sentences in (1) have the order N V (N) (the parentheses around the second “N” mean that it is optional).
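The template N V (N) can be checked mechanically as a pattern over category labels. Here is a hedged sketch in Python; the toy tagger is our own invention for illustration, and real part-of-speech tagging is much harder than a dictionary lookup:

```python
import re

# The template N V (N), with the second noun optional, written as a
# regular expression over category labels.
TEMPLATE = re.compile(r"^N V( N)?$")

# A toy lexicon mapping each word in examples (1) and (2) to its category.
TOY_TAGS = {"amal": "N", "chocolate": "N", "beavers": "N", "dams": "N",
            "cats": "N", "mice": "N", "daffodils": "N", "eagles": "N",
            "ate": "V", "build": "V", "chase": "V", "bloom": "V", "fly": "V"}

def fits_template(sentence):
    """Check whether a sentence's category string matches N V (N)."""
    words = sentence.lower().rstrip(".").split()
    tags = " ".join(TOY_TAGS[w] for w in words)
    return bool(TEMPLATE.match(tags))

print(fits_template("Amal ate chocolate."))  # True:  N V N
print(fits_template("Daffodils bloom."))     # True:  N V
print(fits_template("Amal chocolate ate."))  # False: N N V
```

As the next paragraph notes, such a template over bare categories is only a first approximation; grammatical functions like subject and object matter too.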

Another way to describe word order involves talking not just about categories like nouns and verbs, but grammatical functions like subject and object. Word order in English doesn’t just require that any noun come before the verb, it must be the noun that corresponds to the subject. Similarly, if the verb is a transitive verb with an object, the object noun must come after the verb. This is why Chocolate ate Amal. is a grammatical sentence of English (though with a somewhat implausible meaning), but cannot express the same meaning as (1a).

If you aren’t sure about terms like “subject”, “object”, and “transitive”, read the rest of this section and then come back and re-read the last paragraph. If you feel you are comfortable with those terms, it’s still a good idea to review the definitions given here, to make sure that you understand the terms in the same way they’re used in this textbook.

Key grammatical terminology

This section reviews some key grammatical terminology that you might be familiar with from elsewhere (often from language classes). This vocabulary is important for describing the basic structure of phrases and sentences, and so we review it here.

Sentence
A string of words that expresses a complete proposition. For statements (as opposed to questions or commands), a proposition is something that can be true or false. A sentence is a clause that stands on its own as an utterance.
Clause
A clause is a combination of one subject and one predicate—some clauses occur inside other clauses (see below on complex sentences), though, and so not all clauses are independent sentences.
Predicate
The state, event, or activity that the sentence attributes to its subject.

The word “predicate” is used in two ways. Sometimes it is used to refer to a single head/word (usually a verb or an adjective), but sometimes it’s used to describe everything in the sentence other than the subject (e.g. a whole verb phrase). In this chapter we use it in the first sense: a predicate is something that combines with a subject and (sometimes) one or more objects.

Argument
The participants or actors involved in a sentence. Arguments are typically noun phrases, but it’s possible to have arguments of other types (usually prepositional phrases or whole clauses).

In the following sentences the arguments are in bold and the predicate is italicized.

(3) a. Vanja loves chocolate.
b. The children gave [the kitten] [a toy].
c. Everyone is excited.

Predicates can be classified by their transitivity, which is the number of arguments they take. (This is also sometimes called the valency of a predicate.) The words for transitivity are based on the number of objects a predicate takes.

Intransitive
One argument (the subject); no object.
Transitive
Two arguments (subject and direct object); one object.
Ditransitive
Three arguments (subject, direct object, and indirect object); two objects.

Arguments can be classified in at least two ways: their position in the sentence, and how they’re related to the predicate (are they the actor, the thing acted upon, etc). For now we will focus on the position of arguments, with diagnostics specific to English. Later in this chapter we will return to classifying arguments based on their role in an event.

Subject
Almost always appears before the predicate in English, and controls agreement on the verb. If the subject is a pronoun, it is in nominative case (I, we, you, he, she, it, they).
Direct object
Usually appears after the verb in English. If the direct object is a pronoun, it is in accusative case (me, us, you, him, her, it, them).
Indirect object
Only appears when a verb has three arguments. Generally the recipient of the direct object. Sometimes (but not always) marked by “to” (or another preposition); if the indirect object is a pronoun, it is in accusative case (but in languages that have dative case, often in dative case).

Now that we’ve looked at grammatical terminology relating to predicates and arguments within sentences, let’s talk about terminology for sentences as a whole. First, we can classify sentences according to their function—whether we use the sentence to make a statement, ask a question, or give a command.

Statements. Things that can be true or false.
Questions.
  • Yes-No questions (For example: Did Romil watch a movie?)
  • Content questions (For example: What did Romil watch?)
Direct commands. (For example: Open the door!)

Alternatively, we can classify sentences according to their structure; that is, according to whether they contain one clause or more than one clause, and (if more than one clause) how the sub-clauses are related to each other.

Simple sentence
A sentence is simple if it contains only one clause. All the sentences we have seen so far have been simple sentences.
Compound sentence
A compound sentence has two clauses, linked by a conjunction (and, or, or but). (For example: [ Danai laughed ] and [ Seo-yeon cried ].)
Complex sentence
A complex sentence is one that contains a subordinate embedded clause—a clause inside a clause (an example of recursion!). (For example: Seo-yeon knows [ that Danai laughed ].)

Variation across languages: order of Subject, Object, and Verb

Having reviewed terminology relating to predicates and their arguments, we’re now in a better position to talk about variation across languages in terms of basic word order—the order found in simple declarative clauses.

English is Subject-Verb-Object (SVO). This is one of the most common word orders in the world’s languages, found in about 35.5% of languages (Dryer, 2013). Other languages with this basic word order include most of the Romance languages, and both Mandarin and Cantonese. (This word order is usually referred to as “SVO” even though not all clauses have objects; in a sentence without an object, the order would just be SV.)

All the other logically possible orders for subjects, objects, and verbs are also attested in the world’s languages, though. The most common basic word order is Subject-Object-Verb (41% of languages, according to Dryer 2013); for example, Japanese and Korean are both SOV languages.

Many languages have the order Verb-Subject-Object, for example Irish and the other Celtic languages, as well as in Anishinaabemowin. Orders where the object comes before the subject (VOS, OVS, OSV) are less common, but found in a few languages.

As we noted before, even though most languages have a basic word order (the order found in neutral declarative sentences), in many languages this order is much more flexible than it is in English.

When word order is flexible, it’s usually the case that order is determined at least partly by topic and/or focus—the topic is the thing you’re talking about, and the focus is something you want to emphasize. So while English has a very strict SVO word order, languages with word order that is flexible with respect to the subject and predicate might be said to have a strict topic-comment word order, where the first element in the sentence is the topic (the thing the sentence is about) and the rest is a comment on that topic. Language users will prefer or require particular word orders in particular conversational contexts.

In Chapter 9.3 Dr. Kanatawakhon-Maracle gives several examples of flexible word order of this type in Mohawk—showing that translating from English isn’t always straightforward, with many different translations being possible with shades of meaning that can be a bit hard to distinguish in English.


Matthew S. Dryer. 2013. Order of Subject, Object and Verb. In: Dryer, Matthew S. & Haspelmath, Martin (eds.) The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology. (Available online at, Accessed on 2022-02-26.)

6.3 Structure within the sentence: Phrases, Heads, and Selection


From words to phrases

The order of words isn’t all there is to say about the structure of sentences. Instead, all human languages appear to group words together into constituents. The generalizations about which sentences you find grammatical and which ones you find ungrammatical don’t refer to purely linear properties like “fourth word in a sentence”, but instead to phrases in particular structural positions. In the rest of this section we’ll explore what it means to be a phrase in more detail; in the next section we’ll start talking about structural positions.

A phrase is a set of words that act together as a unit. Let’s look at the example in (1) to see what this means:

(1) All kittens are very cute.

What other groups of words can appear in the same position as the words all kittens in this sentence?

(2) a. Puppies are very cute.
b. The ducklings that I saw earlier are very cute.
c. These videos of a baby panda sneezing are very cute.

…and so on. It turns out that lots of different groups of words can go in this position—but not all of them! What all these examples have in common is that we’ve replaced [all kittens] with another group of words that includes at least one plural noun: puppies or ducklings or videos. If we swap in a singular noun, the sentences would be ungrammatical, as we see in (3).

(3) a. *The puppy are very cute.
b. *The duckling that I saw earlier are very cute.
c. *This video of a baby panda sneezing are very cute.

…but if we change the plural verb are to the singular is they become good again (this is subject agreement inflection):

(4) a. The puppy is very cute.
b. The duckling that I saw earlier is very cute.
c. This video of a baby panda sneezing is very cute.

It turns out that the groups of words that we can easily substitute in this position of the sentence are all ones that have a noun in them. But it’s not enough to just have some noun in the group of words at the front of the sentence, as the examples in (5) show. (5a) is ungrammatical even though the string of words at the beginning includes the pronoun I—and it’s ungrammatical whether we try the form is or are or even am. In (5b) the sentence is ungrammatical even though we have the compound noun baby panda, again no matter what form of the verb we try.

(5) a. *That I saw earlier {is / are / am} very cute.
b. *Of a baby panda {is / are} very cute.

What distinguishes the grammatical sentences in (1), (2), and (4) from the ungrammatical sentences in (5) is that in (1), (2), and (4) the groups of words at the beginning of the sentence are noun phrases: groups of words that not only contain a noun, but in which the noun is the “most important” element in some sense. What does it mean to be the “most important”? The noun determines an important part of the meaning of the subject, but it also determines the category of the whole phrase, which in turn determines where the phrase can go in relation to other phrases and sentences. The noun is the head of the phrase, the same kind of headedness we saw in Chapter 5 for morphology, but applied to words in a phrase instead of to morphemes in a word.

The head of a phrase also determines what else can go inside the phrase: in particular it determines whether the phrase contains an object—though for heads that aren’t verbs, we usually use the more general term complement. Recall from the discussion of grammatical terminology in Section 6.2 that we classify verbs by their transitivity—that is, by how many objects they take. Each verb has an opinion about whether objects are allowed with it, and if so how many. By contrast, there’s no verb that cares whether it’s modified by an adverb—nor any verb that cares whether it has a subject, since all verbs in English require subjects. The technical term for this is selection: heads select their complements (both whether a complement is required or allowed, and what its category is).

Headedness is important to the grammar of all languages, not just English. The right kinds of generalizations in syntax are never about single words like nouns or verbs, but instead about phrases like noun phrases or verb phrases.

A further important point in the structure of natural language phrases is that phrases can contain other phrases of the same type inside of them. So for example, the noun phrase [these videos of a baby panda] contains a second noun phrase inside it, [a baby panda].

The ability of a structure to contain another structure of the same type inside itself is called recursion. This is another key property of natural language grammars—even though there is some debate among linguists about whether all human languages exhibit recursion, everyone agrees that many or most languages do, and that one of the things we need to explain about our human language capacity is that everyone can acquire a language with recursion (for more on child language acquisition, see Chapter 11).

Variation across languages: Word order within phrases

As we’ve already seen, languages vary in their word order, but this variation isn’t random—it isn’t the case that anything goes in word order.

This isn’t just true for the order of major constituents in a sentence (subject, objects and verbs), but also for the order of elements inside phrases; in particular, the order of heads and what they select (their object or complement).

In English it is always the case that heads precede their complements. This is true of verbs and their objects, prepositions and their noun phrase complements, and nouns and their prepositional phrase complements.

(6) a. I [VP ateV [NP an apple ] ].
b. [PP toP [NP Toronto ] ]
c. [NP pictureN [PP of a robot ] ]

In contrast to English, Japanese is a strictly SOV language. And in Japanese, heads always follow their complements. In other words, heads in Japanese don’t appear in the middle of their phrases like in English, but instead always at the end of their phrases.

(7) a. Watasi-wa [VP ringo-o tabe-ta ].
I-TOPIC apple-ACC eat-PAST
“I ate (an) apple.”
b. [PP Tokyo e ]
Tokyo to
“to Tokyo”
c. [NP robotto no shasin ]
robot of picture
“picture of (a) robot”

This is the reverse of the order we get in English.

Technically, words like Japanese no (glossed ‘of’ above) are postpositions rather than prepositions, and sometimes the more general term adposition is used to cover both the prepositions of languages like English and the postpositions of languages like Japanese. These terms are parallel to suffix, prefix, and affix in morphology.

The ability of heads to either precede or follow their complements is called head directionality. A language can be head-initial like English, or head-final like Japanese. If you’re analyzing an unfamiliar language, and need to figure out its word order, one of the first questions you should ask is whether it appears to be head-initial or head-final.
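The head-directionality parameter can be illustrated with a tiny sketch (an assumption-laden toy: real linearization involves far more than a single head and complement):

```python
# A sketch of head directionality: the same head–complement pair is
# spelled out head-first (English-like) or head-last (Japanese-like).
def linearize(head, complement, head_initial=True):
    """Order a head and its complement according to the
    head-directionality setting of the language."""
    return f"{head} {complement}" if head_initial else f"{complement} {head}"

# English: heads precede their complements
print(linearize("to", "Toronto", head_initial=True))   # to Toronto
# Japanese: heads follow their complements
print(linearize("e", "Tokyo", head_initial=False))     # Tokyo e
```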

In later sections of this chapter we’ll see other ways to derive differences in word order, involving differences in the movement (or transformations) available in a language’s grammar.

After reading this section, you could proceed directly to the introduction to phrase structure rules and tree drawing in Section 6.13, though you may find it helpful to read Section 6.4 first.

6.4 Identifying phrases: Constituency tests


By identifying certain parts of sentences as phrases, we are making a claim that each of them is represented in the mental grammar of speakers as a unit. The technical term for a unit inside a sentence is constituent: a constituent is a group of words that acts together within a sentence.

Along with headedness, constituency is one of the central concepts in syntax. Both of these are highlighted when we represent the structure of language using tree diagrams, as we’ll see beginning in Section 6.13, but they’re fundamental to understanding the organization of sentences with or without trees.

When we encounter a new sentence, how do we identify the phrases inside of it? We want to find evidence that certain groups of words do actually act together within the sentence. To find that evidence, we use grammaticality judgements, and a few simple tests.

The tests that identify constituents (often called constituency tests) that we’ll review in this chapter come in four basic types: replacement, movement, clefting, and standing alone as the answer to a question.

Many textbooks also introduce a coordination test, but it is not always reliable, so we’ll discuss it briefly at the end of this section but won’t rely on it.

Replacement

Here are two sentences to start with.

(1) The students saw a movie after class.
(2) The students saw a movie about dinosaurs.

Let’s consider the string of words a movie. Based on discussion so far in this chapter, you might have the idea that this is a noun phrase—or at least that it could be a noun phrase. But whether or not you have that idea, we need evidence to decide one way or the other.

One piece of evidence that something is a noun phrase is that you can replace it with a pronoun, and get a sentence with the same meaning (in a context where the meaning of the pronoun is made clear). In (3) we take the pronoun it and replace the string of words we’re interested in, then ask if the new sentence is grammatical or not.

(3) The students saw a movie after class. The students saw it after class.

Replacing a movie with it in (3) does give us a new grammatical sentence, so we have evidence not only that a movie is a constituent in (1), but also that that constituent is a noun phrase.

What about a movie in (2)? Let’s run the same test there:

(4) The students saw a movie about dinosaurs. *The students saw it about dinosaurs.

This time the result of replacing a movie with it is an ungrammatical sentence, so in (2) a movie is not a complete noun phrase. We might be surprised about this—we expect a noun like movie to be inside a noun phrase—but if we test other possible constituents we see that it’s not that there’s no noun phrase here, it’s just that it’s a bit bigger:

(5) The students saw a movie about dinosaurs. The students saw it.

Based on comparing the results of our replacement tests in (4) and (5), we can conclude that in (2) a movie is not a complete noun phrase, but a movie about dinosaurs is both a constituent and a noun phrase.

We can do the same pronoun replacement test with the string the students in (1). Because students is plural, the relevant pronoun is they:

(6) The students saw a movie after class. They saw a movie after class.

The result of this replacement is grammatical, so we conclude that the students is also a constituent, and also a noun phrase.

Replacement tests don’t have to involve pronouns. Verb phrases can be replaced with do (or do too), but seeing this usually requires setting up two sentences with different subjects or with a contrast in time like yesterday vs. today. Since we have just seen that the students in (1) is a noun phrase subject (because it comes at the beginning of a simple declarative sentence, before the verb), let’s set up a replacement test for a verb phrase, with a preceding sentence that has a different subject:

(7) a. The teachers saw a movie after class, and… The students did too.
b. The teachers saw a movie after class, and… *The students did too before class.

What we see in (7) is that did too can replace saw a movie after class, but can’t replace saw a movie alone. This tells us that saw a movie after class is a constituent, and it’s a verb phrase (because do (too) replaces verb phrases).

What about the string after class? This string expresses a time, and we can replace it with the word then:

(8) The students saw a movie after class. The students saw a movie then.

This shows that after class is a constituent; in fact, it’s a prepositional phrase. Not all prepositional phrases can be replaced by then, however—about dinosaurs is also a prepositional phrase, but can’t be replaced.

(8′) The students saw a movie about dinosaurs. *The students saw a movie then.

Here the result of doing replacement would be grammatical in other contexts, but it isn’t another way to say that the students saw a movie about dinosaurs—this is why it’s marked ungrammatical here: it’s ungrammatical on the intended meaning. You have to pay attention to both grammaticality and meaning when you do replacement tests.

At this point, you’re probably wondering how you know what you can use as a replacement. Here are some handy tips:

Because replacement is category-specific, you can use the evidence of replacement tests both to identify constituents and to figure out what category a certain phrase is: If you can replace it with a pronoun, then you’ve got a noun phrase and you can look for the noun as the head. If you can replace it with do or do so, then you’ve got a verb phrase which will have a verb as its head. Then and there are a little less reliable because they sometimes replace PPs or APs, but you’ll be able to tell the difference between prepositions and adjectives because prepositions usually have complements and adjectives almost never do.
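The category-diagnostic logic just described can be summarized in a small sketch (the table of replacement words is illustrative and far from exhaustive):

```python
# A sketch of replacement-test diagnostics: which replacement word
# succeeded tells you the likely category of the replaced constituent.
REPLACEMENT_DIAGNOSTICS = {
    "it": "NP", "they": "NP", "them": "NP",      # pronouns → noun phrase
    "do": "VP", "do so": "VP", "did too": "VP",  # do-forms → verb phrase
    "then": "PP", "there": "PP",  # less reliable: sometimes PP, sometimes AP
}

def diagnose(replacement_word):
    """Given a successful replacement, return the likely category
    of the replaced constituent."""
    return REPLACEMENT_DIAGNOSTICS.get(replacement_word, "unknown")

print(diagnose("it"))       # NP
print(diagnose("did too"))  # VP
```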

Movement

Replacement is not the only tool we have for checking if a set of words is a constituent. Some constituents can be moved to somewhere else in the sentence without changing the sentence’s meaning or its grammaticality. Prepositional phrases are especially good at being moved. Consider this sentence:

(9) Nimra bought a scarf at that strange little shop.

Let’s start by targeting the last string of words by moving it to the beginning. Move the string of words, then ask yourself whether the resulting sentence is grammatical.

(10) Nimra bought a scarf at that strange little shop. At that strange little shop Nimra bought a scarf.

Yes, it is. In isolation the sentence might sound a little unnatural, but we can imagine a context where it would be fine, such as, “At the department store she bought socks, at the pharmacy she bought some toothpaste, and at that strange little shop, she bought a scarf.”

On the other hand, if we target a smaller string of words, as in (11), we get a different result.

(11) Nimra bought a scarf at that strange little shop. *At that strange Nimra bought a scarf little shop.

The result of moving the string at that strange to the beginning of the sentence is a total disaster. The fact that the resulting sentence is totally ungrammatical gives us evidence that the string of words at that strange is not a constituent in this sentence.

Clefts

A cleft construction is one where you take two parts of a sentence and divide them from each other. (A cleft is a split or gap.)

In English, a cleft is a sentence with the form:

It is/was _ that _.

To use the cleft test, we take the string of words that we’re investigating and put it after the words It was (or it is/it’s), then put the remaining parts of the sentence after the word that. Let’s try this for phrases that we’ve already shown to be constituents with our other tests.

(12) The students saw a movie after class.
It was a movie that the students saw _ after class.
It was after class that the students saw a movie _.
(13) The students saw a movie about dinosaurs.
It was a movie about dinosaurs that the students saw _.
(14) Nimra bought a scarf at that strange little shop.
It was at that strange little shop that Nimra bought a scarf _.
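If it helps, the cleft frame can be sketched as a simple template (a toy that only builds the surface string; whether the result is grammatical still has to be judged by a speaker):

```python
# A sketch of the English cleft frame: "It is/was FOCUS that REMAINDER."
def cleft(focus, remainder, past=True):
    """Build a cleft sentence from the targeted string (the focus)
    and the remaining parts of the original sentence."""
    copula = "was" if past else "is"
    return f"It {copula} {focus} that {remainder}."

print(cleft("a movie", "the students saw after class"))
# It was a movie that the students saw after class.
```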

Clefting a verb phrase doesn’t always sound entirely natural, but to the extent that it’s good, in English you need to put a present or past tense form of do in the remaining parts of the sentence:

(15) The students saw a movie after class.
(?)It was see a movie after class that the students did.

And things that our tests showed were not constituents cannot be put into the first position of a cleft sentence:

(16) *It was a movie that the students saw _ about dinosaurs.
(17) *It was at that strange that Nimra bought a scarf _ little shop.

Now let’s try the cleft test on a new sentence:

(18) Rathna’s brother baked these delicious cookies.
It was these delicious cookies that Rathna’s brother baked _.
It was Rathna’s brother that _ baked these delicious cookies.

The cleft test shows us that the string of words these delicious cookies is a constituent, and that the words Rathna’s brother are a constituent. But look what happens if we apply the cleft test to another string of words:

(19) Rathna’s brother baked these delicious cookies.
*It was Rathna’s brother baked that _ these delicious cookies.
(20) Rathna’s brother baked these delicious cookies.
*It was these delicious that Rathna’s brother baked _ cookies.
(21) Rathna’s brother baked these delicious cookies.
*It was cookies that Rathna’s brother baked these delicious _.

All of these applications of the cleft test result in totally ungrammatical sentences, which gives us evidence that those strings of words are not constituents in this sentence. Remember, though, just because a certain string of words isn’t a constituent in one sentence doesn’t mean it’s not a constituent in any sentence—the result of a constituency test only applies to the specific sentence you’re testing.

Standalone answers to questions

If a string of words is a constituent, it’s usually grammatical for it to stand alone as the answer to a question based on the sentence.

(22) Rathna’s brother baked these delicious cookies.
a. What did Rathna’s brother bake? These delicious cookies.
b. Who baked these delicious cookies? Rathna’s brother.

Answers to questions can also help us identify a verb phrase, because they’re a good context for do-replacement (as a replacement test):

(23) Who baked these delicious cookies? Rathna’s brother did.


Notice that in the answer, “Rathna’s brother did”, the word did replaces the verb phrase baked these delicious cookies.

Again, if a string of words is not a constituent, then it is unlikely to be grammatical as the answer to a question. In fact, it’s difficult to even form the right kind of question:

(24) a. What did Rathna’s brother bake cookies? *These delicious.
b. Who of Rathna’s these delicious cookies? *Brother baked.


Results of tests like these are how we investigate the structure of the mental grammar that underlies how people use the languages they know. We can’t observe mental grammar directly, so observing how words behave is how we make inferences about how it must work. These four tests are tools that we have for observing how words behave in sentences. If we discover a string of words that passes these tests, then we know that the phrase is a constituent, and that tells us something about the organization of the sentence as a whole.

Not every constituent will pass every test, but if you’ve found that it passes two of the four tests, then you can be confident that the string is actually a constituent.

After reading this section, you could proceed with the introduction to X-bar theory in Section 6.14, to see how we represent constituency in tree diagrams.

6.5 Functional categories


From lexical categories to functional categories

Up to this point we’ve focused mostly on lexical categories, not only in this chapter but also in Chapter 5 on Morphology.

As we’ve started looking at phrases and sentences, however, you may have noticed that not all words in a sentence are nouns, verbs, adjectives, or adverbs. Consider the sentence in (1).

(1) The spaceship will arrive in orbit very soon.

Spaceship is a noun, and it is the head of the noun phrase [the spaceship] (we can tell because it could be replaced by a pronoun like it). But what category is the? Similarly, in this sentence arrive is a verb, orbit is a noun, and soon is an adverb, but what categories do will, in, and very belong to?

Words like the, will, in, and very belong to functional categories. These can be thought of as the grammatical glue that holds syntax together. While lexical categories mostly describe non-linguistic things, states, or events, functional categories often have purely grammatical meanings or uses.

Some of the most important functional categories that we’ll use in this chapter are described in this section. In some later sections other functional categories will be introduced—as we develop a syntactic theory, a lot of the action comes in identifying new grammatical functions, and figuring out how they map onto structure.

Determiners

You may be familiar with the definite article the and indefinite article a(n).

(2) a. the book
b. a cat

In English, these occur in noun phrases before the head noun, as well as before any numbers or adjectives:

(3) a. the three red books
b. a large angry cat

In fact, they are usually the very first thing in a noun phrase, and you can only have one of them (unlike adjectives, which you can pile up). If you try to have more than one, the result is ungrammatical:

(4) a. *a the book
b. *the a cat

This distribution doesn’t apply only to the and a(n), though. There are a bunch of other elements that occur in exactly the same places, with exactly the same restrictions. These other things aren’t articles in traditional grammar, so we call this larger functional category determiners.

Some other determiners include the demonstratives this, that, these, and those, as well as quantificational determiners like each, every, and no.

Test for yourself that these occur in the same places in noun phrases as the and a(n) do—and that some other words expressing quantities (like all and many) and numbers do not.

Possessors in English expressed by possessive pronouns or by noun phrases marked with ‘s also appear in the same position as determiners, and are also in complementary distribution with them, as shown in (5):
(5) a. my book
b. [a friend from school]’s cat
c. *the [a friend from school]’s cat
d. *[a friend from school]’s the book

Notice that the marker ‘s attaches to the whole phrase, rather than to the head noun friend; this makes it a clitic rather than an affix, and makes it different from possessor marking typically found in languages with genitive case.

Possession can also be marked with a prepositional phrase, which would come after the noun and not be in complementary distribution with determiners: the cat [of my friend from school].

Not all languages have definite and indefinite articles, but many languages have some kind of determiners. If you know a language other than English, try to figure out whether there’s a class of words that occur outside adjectives and numbers that might be determiners—these could come first, as in English, but might instead come last, especially if other things in the noun phrase come after the noun itself.

Pronouns

Pronouns are a special functional category that can replace a whole noun phrase, as we saw in Section 6.4. The set of pronouns in the variety of English most Canadians speak is limited to the following (each row lists the nominative, accusative, and genitive forms of the pronoun):

I, me, my
you, you, your
she, her, her
he, him, his
it, it, its
we, us, our
they, them, their

Many English speakers have a dedicated second person plural pronoun like y’all or yous; in Canadian English, you guys may have the distribution of a second person plural pronoun, though it looks on the surface like a noun phrase. Across different varieties of English, many people have different case forms for some of the pronouns listed above as well.

Many languages have pronouns, but in some languages pronouns aren’t used as often as they are in English; users of those languages may prefer to leave noun phrases out entirely, rather than replace them with pronouns.

While pronouns are a functional category, in this textbook we will treat them as still belonging to the same category as nouns (abbreviated N).

Auxiliaries

Auxiliaries are like verbs in that they can be present or past tense, and can show agreement, but they always occur alongside a lexical main verb. For this reason they’re sometimes called “helping verbs”.

For example, in the progressive in English we see the auxiliary be, alongside a main verb that ends in the inflectional suffix -ing:

(6) The bears are dancing.

In English declarative sentences, auxiliaries occur after the subject and before the main verb.

If an English sentence is negative, at least one auxiliary will occur to the left of negative not /n’t:

(7) The bears aren’t dancing.

In a Yes-No question in English, at least one auxiliary appears at the front of the sentence, before the subject:

(8) Are the bears dancing?

The auxiliaries in English are be, have, and do.

Importantly, these can all also be used as lexical verbs! They’re auxiliaries only when there’s also another verb in the clause that’s acting as the lexical verb. If have expresses possession, or be is followed by a noun or adjective instead of a verb, these are main verb uses.

In English there is also a class of modal auxiliaries. These only occur as auxiliaries in modern English, and are different from the other auxiliaries in that they don’t agree with the subject. The modal auxiliaries are can, could, will, would, shall, should, may, might, and must.

Sometimes lists of modals include ought (as in You ought not do that), need (as in You need not go), and dare (as in I dare not try), but these aren’t used as modals very frequently by most English speakers today.

You can test for yourself that these have the same distribution with respect to subjects, negation, and in questions as the auxiliaries be, have, and do.

Prepositions

Prepositions express locations or grammatical relations. They are almost always followed by noun phrases (though a few prepositions can occur by themselves)—in other words, they are almost always transitive and select a noun phrase complement. Prepositions can sometimes be modified by words like very or way. Those modifiers, the preposition, and the following noun phrase, all group together into a prepositional phrase constituent.

Some prepositions: at, in, on, to, from, of, with, under, and outside.

Outside is an example of a preposition that can occur without a following noun phrase, in a sentence like They’re playing outside.

Other functional categories

A few other functional categories that you will encounter in this chapter are degree words like very and way, which always modify adjectives or adverbs; numbers, which occur between determiners and adjectives, and which as a syntactic category also include words like many and few; and conjunctions (only and, or, and but), which connect two phrases of the same category.

Two other important functional categories will come up later in this chapter: tense, which will be the category that heads sentences, and complementizers, which will introduce embedded clauses. We will learn how to identify these functional categories in later sections; for now, it’s useful to note that words like because and although pattern together with other words as part of the functional category of complementizers, even though in traditional grammar they’re often identified as conjunctions (based mostly on their meaning, instead of their distribution).

Functional categories as “closed class”

Even though there are lots of different functional categories, they’re different from lexical categories in that it’s much harder to add new words to an existing functional category than it is to come up with new lexical items. So I can coin new nouns (like grypno) and new adjectives (like tichek) very easily, but it’s more difficult to add, say, a brand-new determiner or auxiliary to a language.

Even though it’s harder, though, it’s definitely not impossible! Consider the functional category of pronouns. There are lots of new pronouns that people have proposed as nonbinary pronouns. These neopronouns are sometimes harder to get the hang of than new lexical nouns are (which is one of the signs that pronouns are more of a closed class than nouns are) but it’s very possible to become a fluent user of a new pronoun with a bit of practice.

Now that we’ve looked at functional categories, you could proceed to see how we represent clauses in tree diagrams in Section 6.15, which introduces the functional category TP (tense phrase).

6.6 Clausal embedding


Recursion: Sentences inside sentences

So far we’ve talked about the organization of words into constituents in a single clause. Consider the sentence in (1):

(1) The students saw a movie about dinosaurs.

This sentence has three noun phrases: [the students], [dinosaurs], and [a movie about dinosaurs]. The noun phrase [dinosaurs] is inside the bigger constituent [a movie about dinosaurs], and they’re linked together by the preposition about—in fact, [about dinosaurs] is a prepositional phrase. We also have a verb phrase [saw a movie about dinosaurs]—the verb and its object (or its objects, if it is ditransitive) will always be part of the same verb phrase constituent.

Now consider the sentences in (2) and (3):

(2) Deniz said something.
(3) Samnang might leave.

In (2), the object of the verb said is something; together these form a verb phrase. But now consider a sentence like (4):

(4) Deniz said that [ Samnang might leave ].

In (4), the entire clause from (3) appears after the verb said, in the same position that something appeared in (2). Also, if we do constituency tests—for example replacement in (5)—we see that [said that Samnang might leave] is a verb phrase that can be replaced by do (too).

(5) Keiko said that Samnang might leave, and Deniz did too.

What we see here is that the complement of a verb can be a whole clause; in this case we call the clause-inside-a-clause an embedded clause.

What about the word that? The role of that seems to be to introduce the embedded clause. Words that have this function of introducing an embedded clause belong to the category of complementizers (so called because they turn clauses into the complements of verbs).

Like other categories, complementizers create complementizer phrases. Here the complementizer phrase is [ that Samnang might leave ]; again, you can identify this constituent with tests.

(5′) It was that Samnang might leave that Deniz said __.

Just as verbs select how many complements they take, they can also select the category of their complement. Some transitive verbs can combine only with noun phrase objects, some only with prepositional phrases, some only with complementizer phrases—and some with any or all of these.

For example, the verb know can combine with several different categories of complements:

(6) They know…
…this fact. (noun phrase)
…about birds. (prepositional phrase)
…that birds can fly. (complementizer phrase)

Other verbs can only take some of these as complements:

(7) We ate…
…curry. (noun phrase)
*…about curry. (*prepositional phrase)
*…that curry is for dinner. (*complementizer phrase)
(8) The teacher said…
…something. (noun phrase)
*…about chocolate. (*prepositional phrase)
…that they like chocolate. (complementizer phrase)
(9) They talked…
…mythology. (noun phrase)
…about mythology. (prepositional phrase)
*…that mythology is interesting. (*complementizer phrase)

(Some readers might not find They talked mythology. totally grammatical; whether talk can take a noun phrase object is something that has changed in English.)
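The selectional patterns in (6) through (9) can be recorded as a small lexicon sketch (the entries simply transcribe the judgments above; the function and dictionary names are our own):

```python
# A sketch of selection as a lexical property: each verb lists the
# complement categories it allows, based on examples (6)–(9).
SELECTS = {
    "know": {"NP", "PP", "CP"},
    "eat":  {"NP"},
    "say":  {"NP", "CP"},
    "talk": {"NP", "PP"},  # the NP object is marginal for some speakers
}

def can_select(verb, complement_category):
    """Check whether a verb allows a complement of this category."""
    return complement_category in SELECTS.get(verb, set())

print(can_select("know", "CP"))  # True
print(can_select("eat", "CP"))   # False
```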

So far the examples of embedded clauses that we’ve seen are all embedded statements. Is that the only kind of embedded clause that exists in English, or in language in general? Are there any complementizers other than that?

Take a moment to see if you can think of some other verbs that embed whole clauses, and see if you can identify some element in those sentences that looks like another complementizer. You can do this for English, or for another language that you know.

Questions inside sentences: Embedded interrogative clauses

We just saw that the English verb know can combine with several different types of complements (complementizer phrases, noun phrases, and prepositional phrases). It also happens to be able to combine with more than one type of embedded clause. Consider the following examples:

(10) I know…
…(that) ghosts exist.
whether ghosts exist.
if ghosts exist.

What we see here is that the verb know can combine not only with clauses introduced by that (or nothing), but also ones introduced by whether or if. Another way to write this would be to use { curly braces } to surround the C heads allowed after know, as in:

(11) I know [CP {that, ∅, whether, if} ghosts exist ].

Not all verbs are equally flexible! Some verbs, like believe, only allow that or ∅, not whether or if:

(12) a. I believe [CP {that, ∅} ghosts exist ].
b. *I believe [CP {whether, if} ghosts exist ].

Other verbs, like wonder, only allow whether or if as complementizers:

(13) a. *I wonder [CP {that, ∅} ghosts exist ].
b. I wonder [CP {whether, if} ghosts exist ].

What this tells us is that the difference between that/∅ on the one hand, and whether/if on the other hand, is something that verbs can be sensitive to when it comes to selection.
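This sensitivity of verbs to clause type can be sketched the same way as selection for category (entries transcribe the judgments in (11) through (13)):

```python
# A sketch of clause-type selection: verbs differ in whether they embed
# declarative (that/∅) or interrogative (whether/if) complementizer phrases.
DECLARATIVE = {"that", "∅"}
INTERROGATIVE = {"whether", "if"}

EMBEDS = {
    "know":    DECLARATIVE | INTERROGATIVE,  # allows both clause types
    "believe": DECLARATIVE,                  # statements only
    "wonder":  INTERROGATIVE,                # questions only
}

def allows(verb, complementizer):
    """Check whether a verb allows a CP headed by this complementizer."""
    return complementizer in EMBEDS.get(verb, set())

print(allows("believe", "whether"))  # False
print(allows("wonder", "if"))        # True
```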

What, then, is the difference between these two sets of complementizers?

We can see the difference if we look at their use with verbs of quotation, comparing embedded clauses with direct quotation (indicated in English writing by using quotation marks).

Consider the verb say in (14):

(14) They said that ghosts exist. = They said: “Ghosts exist.”

The embedded CP with that can directly paraphrase a directly quoted statement.

Now compare the verb ask in (15):

(15) They asked if ghosts exist. = They asked: “Do ghosts exist?”

In (15) we see that the embedded CP with if corresponds not to a quoted statement, but to a quoted question! This is the difference between that/∅ and if/whether: that and ∅ introduce embedded statements (declarative clauses), while whether and if introduce embedded questions (interrogative clauses).

In English, all verbs that can select a CP headed by whether can also select a CP headed by if, though some speakers prefer one or the other, or prefer one to the other for certain verbs. If you’re a fluent English speaker, do you prefer one of these words to the other one?

The relationship between that and ∅ is slightly more complex. For the most part, any verb that can select a CP headed by that can also select a CP headed by ∅. But there are a few verbs that strongly prefer an overt that—for many English speakers, an example is the verb report:

(16) a. The newspaper reported that there was a demonstration yesterday.
b. ?? The newspaper reported ∅ there was a demonstration yesterday.

When a complementizer phrase occurs in a different position in a sentence, the complementizer is also often obligatory. For example, when a clause is the subject of a sentence, many speakers find it ungrammatical to leave that out in English.

(17) a. [That there was a demonstration yesterday] surprised some people.
b. *[There was a demonstration yesterday] surprised some people.
c. It surprised some people [that there was a demonstration yesterday].
d. *It surprised some people [there was a demonstration yesterday].

Embedded nonfinite clauses

Are complementizer phrases all just questions or statements?

There’s at least one other distinction in types of clauses that verbs can take as complements. Consider, for example, the verb want:

(18) a. I want [ghosts to exist].
b. *I want [{that/∅} ghosts exist].
c. *I want [{whether/if} ghosts exist].

At least for some English speakers, the verb want doesn’t allow any of the complementizers we’ve seen so far. Instead, it requires that the clause it embeds have a nonfinite verb. Is a clause like [ ghosts to exist ] a complementizer phrase, or should we identify it as something else?

There is reason to think that at least some nonfinite clauses can occur in complementizer phrases: while many nonfinite embedded clauses don’t have any overt complementizer, at least some do. Consider the examples in (19).

(19) a. I want [for ghosts to exist].
b. I prefer [for my coffee to have milk in it].
c. I’d like [for you to leave now].

Not everyone likes for in these examples, but at least some English speakers do. Here for appears in the same kind of position we previously saw that, if, or whether in. We can analyze for as a complementizer for nonfinite clauses.

Some verbs can take either finite or nonfinite complements. Consider the verb prefer:

(20) a. I prefer [that cookies have chocolate chips].
b. I prefer [for cookies to have chocolate chips].

So just like the verb know can select either a question or a statement
as its complement, the verb prefer can select either a finite or a non-finite clause.

“Nonfinite” and “infinitive” are two words for the same thing—notice that in both cases a negative prefix (non- or in-) attaches to the root finite.

Nonfinite verbs in English appear with the infinitive marker to. This marker shows up in the same position occupied by auxiliaries—after the subject but before negation—and it’s in complementary distribution with the modal auxiliaries.

Having learned about the structure of embedded clauses, you could proceed to see how they are represented in tree diagrams in Section 6.18, which introduces embedded CPs (complementizer phrases).

6.7 Main clause Yes-No questions


From embedded questions to main clause questions

So far we’ve seen embedded questions, introduced by whether or if, but what about actual questions—ones that end in a question mark? What generalizations can we make about this type of sentence?

Let’s start with Yes-No Questions—questions whose answer in English can be “yes” or “no”. Consider first the statements in (1).

(1) a. It will rain.
b. They have left.
c. Ghosts are haunting this house.

The statements in (1) become questions in (2):

(2) a. Will it rain?
b. Have they left yet?
c. Are ghosts haunting this house?

The questions in (2) could be answered by “yes” followed by the corresponding sentence in (1) (or “no” followed by the negation of one of those sentences).

Comparing the statements and their corresponding questions helps us state a generalization about the structure of Yes-No questions in English:

Yes-No Question Formation in English
Yes-No questions are formed by moving the first auxiliary in the main clause to the front of the sentence (i.e. before the subject).

This is also known as Subject-Auxiliary Inversion (or Subject-Aux Inversion).

It’s important that we say “the first auxiliary in the main clause”, instead of just looking for the first auxiliary in the sentence. If there’s an auxiliary inside the subject, then it’s not the one that moves. We can see this by thinking about how to form a Yes-No question based on the statement in (3).

(3) [ The information [ that was shared ] ] will surprise them.

The subject of this sentence is [the information that was shared], which contains the auxiliary was. But trying to do Subject-Auxiliary Inversion with was as in (4a) is extremely ungrammatical; you have to move the auxiliary will from the main clause instead, as in (4b).

(4) a. * Was [ the information [ that __ shared ] ] will surprise them?
b. Will [ the information [ that was shared ] ] __ surprise them?

As far as linguists know, there is no human language with a grammatical process like “form questions by taking the first auxiliary you encounter, and putting it at the front of the sentence”. That’s interesting partly because it’s a very computationally simple kind of rule. The fact that we don’t find it in human languages supports the idea that structure and constituency, not just the linear order of words, is fundamental to their grammar.

Subject-Auxiliary Inversion can be described as a transformation. A transformation is a rule that changes the structure of a sentence in a predictable way, by reordering the constituents. It gives us a way of describing a set of grammatical sentences based on their consistent relationship to another set of sentences.

In current syntactic theory, transformations are usually thought of in terms of movement. We will return to the idea of syntactic movement when we formalize it in the context of tree diagrams in Section 6.19.
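To make the contrast concrete, here is a small illustrative sketch in Python (not part of the original text). The token lists and the hand-supplied index of the main-clause auxiliary are simplifications for illustration; a real analysis operates on structured clauses, not flat word lists.

```python
AUXILIARIES = {"will", "was", "have", "are"}

def naive_invert(words):
    """Naive LINEAR rule: front the first auxiliary you encounter.
    No human language seems to work this way."""
    for i, word in enumerate(words):
        if word in AUXILIARIES:
            return [word] + words[:i] + words[i + 1:]
    return words

def invert(words, main_aux_index):
    """Structure-sensitive rule: front the MAIN-CLAUSE auxiliary.
    Since there is no real parse here, the index of the main-clause
    auxiliary is supplied by hand."""
    aux = words[main_aux_index]
    return [aux] + words[:main_aux_index] + words[main_aux_index + 1:]

statement = "the information that was shared will surprise them".split()

# The linear rule fronts "was" (inside the subject), deriving the
# ungrammatical (4a):
print(" ".join(naive_invert(statement)))
# → was the information that shared will surprise them

# Fronting the main-clause auxiliary "will" (index 5) derives the
# grammatical (4b):
print(" ".join(invert(statement, 5)))
# → will the information that was shared surprise them
```

The point of the sketch is that the linear rule is simpler to write, yet derives exactly the ungrammatical string; the attested rule needs access to constituent structure (here faked with a hand-supplied index).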


There’s one last piece we have to discuss with regard to Subject-Auxiliary Inversion in English, which is: what do we do when there’s no auxiliary? Consider the sentence in (5):

(5) Ghosts exist.

There’s no auxiliary in this sentence. But when we form a question, suddenly an auxiliary appears!

(6) Do ghosts exist?

Where does this do come from?

Generally in English, in contexts that require an auxiliary for a grammatical reason, if there wasn’t already an auxiliary in the sentence then the auxiliary do shows up to give you the auxiliary you needed. This is true not only in questions but also in negated sentences:

(7) a. Ghosts don’t exist.
b. *Ghostsn’t exist.
c. *Ghosts existn’t.

If there’s already an auxiliary, however, you don’t get to add do for free in the same way:

(8) a. They have left.
b. They haven’t left.
c. *They don’t have left.
d. *Don’t they have left?

The rule that adds do when a sentence requires an auxiliary in order to be grammatical is called Do-Support.

Echo Questions

There’s another way to form Yes-No Questions in English, without doing any movement at all. This is by using question intonation: just pronouncing the sentence “as though it had a question mark.”

So for example, alongside (9a), in some contexts I might just say (9b), without Subject-Auxiliary Inversion.

(9) a. Do ghosts exist?
b. Ghosts exist?

For many English speakers, questions like this are a bit more restricted than questions formed by Subject-Aux Inversion. Try to think of the contexts in which you might say “Ghosts exist?” instead of “Do ghosts exist?”. While sometimes you could say either one, “Ghosts exist?” is slightly better in contexts where you’re asking someone to repeat themselves, or possibly expressing surprise.

Embedded Questions, Main Clause Questions, and Punctuation

What are we doing in conversation when we use main clause questions vs. when we use embedded questions?

If you’re talking to someone and you produce a main clause question—“Do you like chocolate?”—you are actually asking a question, usually hoping that the person you’re talking to will answer it, to give you information you’re looking for.

In English we indicate this with punctuation: questions have to end with a question mark (?), while other sentence types end with either a full stop (.) or an exclamation point (!).

But if you’re talking to someone and you produce a sentence with an embedded question—“They asked if I like chocolate.”—you’re not actually asking the question yourself. Instead you’re reporting on what someone else said, what they believe, or maybe what they know. These sentences do not end with a question mark, because the sentence as a whole isn’t a question.

Now that we have both main clause questions and embedded questions, though, we can combine them! Consider a sentence like the following:

(10) They should know that ghosts exist.

We can turn the embedded clause into a question:

(11) They should know whether ghosts exist.

Or we could make the whole sentence a question, but leave the embedded clause as a statement:

(12) Should they know that ghosts exist?

Or we could do both at once!

(13) Should they know whether ghosts exist?

In this last example, we have two [+Q] complementizer phrases. In the main clause this triggers Subject-Auxiliary Inversion (which is how main clause questions are marked in English), while in the embedded clause we get the question complementizer whether (which is how embedded Yes-No questions are marked in English).

6.8 Main clause content questions


Until now we’ve only talked about Yes-No questions—questions that can be answered by saying “yes” or “no” in a language like English.

(1) Is it snowing? (Main clause Yes-No Question)
(2) They asked if it’s snowing. (Embedded Yes-No Question)


Not all languages have words corresponding to “yes” and “no”! In many languages, the way you answer a Yes-No question is by repeating the verb, with or without negation. For example, to answer Is it raining? you could say Is. or Isn’t. This is the case in Mandarin, for example, as well as in Irish.

Yes-No questions like these just ask whether something is or is not the case. But we can ask more complex questions, asking for specific information about part of a sentence. For example:

(3) When was it snowing? (asking about time)
(4) Where is it snowing? (asking about location)

These are content questions; in English they are often called WH-questions because they involve question words that start with the letters “wh” (who, what, where, when, why, which… and how, which doesn’t start with “wh” but does contain both those letters). These words are traditionally labelled interrogative pronouns; in this chapter we will simply call them content question words. In linguistics the label “WH-questions” is often generalized to other languages, but in this textbook we will stick to “content questions” since the relevant words don’t start with “WH” in other languages.

In many languages content question words do tend to start with some of the same sounds. For example, in French many (though not all) of these words begin with “qu-” (pronounced [k]):

  • qui (who)
  • quoi (what)
  • quand (when)
  • quelle (which)

And in Anishinaabemowin many content question words start with [a] or [aa], though in some varieties the short vowel [a] is no longer pronounced in these words.

  • awenen (who)
  • awegonen (what)
  • aanapii (when)
  • aaniin (how, why, in what way)
  • aandi (where)

Whether or not a language has a group of question words that start with the same sound, all languages have ways of asking content questions, just like all languages have ways of asking Yes-No questions.

Unlike a Yes-No question, a content question is answered with a word or phrase corresponding to the question word that was used. If someone asks:

(5) Who were you talking to?

It wouldn’t make any sense to answer with yes or no. Instead the answer would be a noun phrase like “my friend” or “the person over there” or “U’ilani”, or with a full sentence (“I was talking to my friend / the person over there / U’ilani.”)

Just like Yes-No questions, content questions in English involve a change in word order from what we find in corresponding statements.

Consider the following very short dialogue:

(6) A: That squirrel has hidden something.
B: What has the squirrel hidden?

By asking the question with what, Person B is asking for more information about the something Person A mentioned. But even though this question is about the object of the verb hide, the question word what appears at the very beginning of the clause.

We find the same thing with all content questions in English: no matter where the phrase we’re asking about would show up in a statement, the question word has to go at the beginning of the sentence:

(7) Where is it snowing?
a. It’s snowing in Ottawa.
b. *It’s snowing where?
(8) When was it snowing?
a. It was snowing yesterday.
b. *It was snowing when?
(9) How do squirrels hide nuts?
a. Squirrels hide nuts by burying them.
b. *Squirrels hide nuts how?

All the grammatical main clause content questions we’ve seen here also involve Subject-Aux Inversion, just like Yes-No questions do. We can see this because the auxiliary appears before the subject in all the grammatical content questions.

But Subject-Aux Inversion isn’t the only change in word order; we also need to state a generalization about the position of the question word itself. We will formalize this generalization in Section 6.19 in the context of movement within syntactic tree structures, but for now we can state the following:

Question Word Fronting
A content question word (e.g. who, what, where, when, why, how), or a phrase headed by a content question word, must appear at the beginning of the clause.
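As an informal illustration (not from the original text), the generalization can be phrased as a simple check over a tokenized clause. Treating single words, rather than whole phrases, as question words is a simplification for the sketch:

```python
QUESTION_WORDS = {"who", "what", "where", "when", "why", "how"}

def obeys_fronting(words):
    """Question Word Fronting (toy version): any content question word
    in the clause must appear at the very beginning."""
    return all(i == 0 for i, w in enumerate(words) if w in QUESTION_WORDS)

print(obeys_fronting("where is it snowing".split()))  # → True
print(obeys_fronting("it is snowing where".split()))  # → False
```

Note that the toy check wrongly rules out echo questions like It is snowing WHERE?, which (as discussed for Yes-No questions above) are possible in restricted contexts; the generalization is about ordinary content questions.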

Variation across languages: Questions

Depending on what other languages you know, you may already have been thinking about the fact that not all languages have Subject-Aux Inversion in questions. There are a variety of question-marking strategies in different languages, and in this section we review some of the most common ones.

Many languages use a fixed question word to mark Yes-No questions, sometimes called a question particle. For example, one of the more common ways to form a Yes-No question in French is to add est-ce que to the beginning of the sentence:

(10) Nous avons trouvé les fantômes.
we have found the ghosts
“We have found the ghosts.”
(11) Est-ce que vous avez trouvé les fantômes?
+Q you.PL have found the ghosts
“Have y’all found the ghosts?”

The particle est-ce que looks like multiple words, and historically it indeed derives from a phrase meaning something like “is it that”, but in contemporary French it acts like a single word, which we could hypothetically think of as being spelled “eska”.

French has another way of forming questions that does involve Subject-Aux Inversion. Alongside the examples above, you can also say:

(12) Avez-vous trouvé les fantômes?
have-you.PL found the ghosts
“Have y’all found the ghosts?”

This way of forming questions is somewhat old-fashioned for many current speakers of French, especially in speech rather than in writing. The more common strategy today is to use a question particle like est-ce que, or to just use question intonation.

We can treat est-ce que as a [+Q] complementizer that occurs in main clauses—exactly as though we marked questions in English by just adding whether or if to the beginning of the sentence.

Japanese also forms questions by adding a question particle, but because Japanese is head-final, the particle appears at the end of the sentence:

(13) Gakusei-wa yuurei-o mitsuke-ta.
Student-TOPIC ghost-ACC find-PAST
“The student found the ghost.”
(14) Gakusei-wa yuurei-o mitsuke-ta ka
Student-TOPIC ghost-ACC find-PAST +Q
“Did the student find the ghost?”

Similarly, Mandarin forms questions with the particle ma (Mandarin examples from Liing 2014).

(15) wàimiàn zài xià yǔ
outside PROG fall rain
“It’s raining outside.”
(16) wàimiàn zài xià yǔ ma
outside PROG fall rain +Q
“Is it raining outside?”

The analysis of Mandarin is a little bit more complicated, though. First of all, the particle appears at the end of the sentence in Mandarin, like in Japanese, even though Mandarin is otherwise head-initial, like English and French. Second, there are several other ways to ask questions in Mandarin, which are equally (if not more) common than adding the particle ma.

Overall, it is very common for languages to form questions by adding a particle to the beginning or end of a sentence—but it is also very common to form questions via head movement of the type we see in English.

It’s tempting to think of question particles as being like the English auxiliary do, but remember that do only shows up in English when there’s no other auxiliary. The question particle in a language like French, Japanese, or Mandarin is more like the English complementizers if or whether, except that it is found in main clauses instead of only in embedded clauses.

Just as not all languages have Subject-Aux Inversion in questions, not all languages exhibit Question Word Fronting. Such languages leave content question words in-situ, in the position where the corresponding non-question phrase would appear.

Japanese, for example, is a language with in-situ content questions. Question words like nani (“what”) are pronounced in the same place as the corresponding non-question arguments are.

(17) Usagi-wa nani-o tabe-ru ka
rabbit-TOPIC what-ACC eat-PRES +Q
“What do rabbits eat?”
(18) Usagi-wa yasai-o tabe-ru
rabbit-TOPIC vegetable-ACC eat-PRES
“Rabbits eat vegetables.”

The key point to notice in these examples is that the word order in (17) and (18) is the same, even though (17) is a content question and (18) is the corresponding statement. Both have a different word order than the English translation, partly because Japanese is head-final (so the verb comes at the end) and partly because the word nani-o stays in-situ.

The question particle ka also appears in this content question, just like it does in Yes-No questions in Japanese, as we saw above.


Liing, Woan-Jen. 2014. How to ask questions in Mandarin Chinese. Doctoral Dissertation, CUNY.

6.9 Embedded content questions


In looking at simple and complex clauses in English, so far we have looked at:

  • statements, in both main and embedded clauses
  • Yes-No questions, in both main and embedded clauses
  • content questions in main clauses

All that remains to complete this picture is to look at the profile of content questions in embedded clauses.

It turns out that any verb that can embed a Yes-No question can also embed a content question. Let’s see some examples:

(1) know
a. I know [CP if squirrels have hidden nuts ]. (embedded Yes-No question)
b. I know [CP what squirrels have hidden ]. (embedded content question)
(2) ask
a. They asked [CP if the movie ends at 8PM ]. (embedded Yes-No question)
b. They asked [CP when the movie ends ]. (embedded content question)
(3) wonder
a. We wondered [CP if someone had baked cookies ]. (embedded Yes-No question)
b. We wondered [CP what someone had baked ]. (embedded content question)

What’s going on in the (b) examples? First of all, we do not see the complementizer if that is found in embedded Yes-No questions—and it’s ungrammatical if we try to include it, no matter what order it goes in:

(4) a. *I know [CP what if squirrels have hidden ].
b. *They asked [CP if when the movie ends ].
c. *We wondered [CP what whether someone had baked ].

From the ungrammaticality of all the sentences in (4), we can conclude that it’s impossible to include a complementizer in embedded content questions in English.

What about Subject-Auxiliary Inversion? There’s no inversion in I know what squirrels have hidden.—the auxiliary have stays after the subject squirrels—and if we try to add Subject-Auxiliary Inversion the sentence becomes ungrammatical:

(5) *I know [CP what have squirrels hidden ].

So it looks like embedded content questions—at least for most English speakers in Canada—are like main clause content questions in putting the content question word at the front of the clause, but unlike main clause content questions in that they don’t do Subject-Auxiliary Inversion.

There are some varieties of English where sentences like this, with Subject-Auxiliary Inversion in embedded clauses, are grammatical! This has been reported for some varieties spoken in Belfast, Ireland, for example, as described by Henry (1995).


Henry, Alison. 1995. Belfast English and Standard English: Dialect variation and parameter setting. Oxford University Press.

6.10 Arguments and thematic roles


Arguments as participants in events

In Section 6.2 we classified predicates in terms of their transitivity—that is, the number of arguments they combine with. A verb may take only a subject (intransitive), a subject and an object (transitive), or a subject, an object, and an indirect object (ditransitive).

Importantly, “subject” and “object” are structural terms, not semantic ones. In English, a subject appears at the beginning of a declarative clause, has nominative case (if it’s a pronoun), and controls agreement on the tensed verb. An object in English occurs after the verb, and has accusative case (if it’s a pronoun).

We might ask, though, whether all subjects are interpreted similarly, or if all objects are.

Looking towards semantics (the study of meaning), verbs can be thought of as describing events or states. The difference, in semantics, is that events are dynamic (things are actively happening), whereas states are things that are simply true, without anything happening (like being tall). From here on we’ll use eventuality as a general term for both events and states.

An eventuality involves some number of participants. The participants in an event can play various roles in the event, which in linguistics are called thematic roles.

Do subjects always play the same role in an eventuality? If we look at the following examples, it looks like they don’t:

(1) a. The children yelled. (children did something on purpose)
b. The wind blew the tree down. (wind did something, but not on purpose)
c. The tree burned. (something happened to the tree)

Indeed, we sometimes see this even with a single verb, as with sink in (2).

(2) a. The pirate sank the ship.
b. The ship sank.

In the sentences in (2), we see that the verb sink can be either transitive or intransitive. In the first case [the pirate] is the subject and in the second [the ship] is the subject, but they play different roles in the eventuality. Indeed, in (2a) [the ship] is the object, but plays the same role in the eventuality as it does when it’s the subject in (2b).

But for other verbs, we don’t see this kind of “trading places”:

(3) a. The author wrote the book.
b. The author wrote.

In (3) the subject stays the same in the transitive and intransitive uses of the verb write, and [the author] continues to play the same kind of role in the eventuality.

To talk about the different roles associated with subjects and objects, we can define a number of thematic roles that are relevant in natural language. There are potentially many such roles, but in this chapter we’ll focus on just a few.

Agents are animate actors who do things on purpose.
  • [The pirate] sank the ship. (subject = agent)
Causers are usually inanimate (not alive); they cause things to happen, but without acting on purpose.
  • [The bilge pump malfunction] sank the ship. (subject = causer)
Themes are the participants that the event happens to; they may be changed by the event.
  • The pirate sank [the ship]. (object = theme, affected/changed by the event)
  • The author read a book. (object = theme, not affected by the event)
Some work on thematic roles distinguishes themes (undergo the event, but are not affected or changed) from patients (undergo the event, and are affected or changed as a result). The objects of verbs of consumption (like eat) or creation (like build) are prototypical patients: they either disappear or come into existence as a result of the event. The distinction between themes and patients is not relevant in this chapter, however, and so we set it aside.
Instruments are the things an agent uses to accomplish an action (often, but not necessarily, introduced by the preposition with in English).
  • The pirate sank the ship [with a cannon]. (PP = instrument)
Locations are the places where the event takes place (often, but not necessarily, introduced by a locative preposition).
  • The pirate sank the ship [at sea]. (PP = location)

Not all animate subjects are agents: some animate subjects instead perceive something or experience a mental state.

Experiencers are animate participants that experience a mental state, including perceptions (see, hear, etc.).
  • [Pirates] frighten me. (me = experiencer, [pirates] = causer)
  • I fear [pirates]. / I like [pirates]. (I = experiencer, [pirates] = causer)
Goals are the locations or people that receive the theme; in most ditransitives, the indirect object is the goal of the event.
  • The pirates sent [the ship] a message. ([the ship] = goal; [a message] = theme)
  • The pirates sent a message to [the ship]. ([the ship] = goal; [a message] = theme)

Different verbs don’t just select how many arguments they combine with, but also select what thematic roles their arguments take. But verbs aren’t totally free to map thematic roles onto argument positions; for example, while we’ve seen that an experiencer can be either the subject or object of a verb, if a verb combines with both an agent and a theme, the agent is always the subject. Also, whenever a verb takes only a single argument, that argument will necessarily be the subject (at least in English, and in many other languages).

Looking at verbs with only one argument, we find both agent-intransitives and theme-intransitives.

(4) Agent intransitives:
a. The pirate laughed.
b. Everyone jumped.
c. The author wrote.
(5) Theme intransitives:
a. The tree fell.
b. The ship sank.
c. The ice melted.
d. A train arrived.

Passives, which we discuss in Section 6.11, are a type of derived theme-intransitive: passive clauses end up with a theme subject by “getting rid of” the original agent subject.

Events with no participants

Does every verb have to have at least one argument? In languages like English, every (non-imperative) sentence has to have a pronounced subject. But consider sentences like the following:

(6) a. It is raining.
b. It is snowing.

We might think that weather verbs like rain, snow, etc. take something like a theme subject—the it subject refers to something like “the weather”.

Pronouns like it can usually be replaced with full NPs, however, and yet it’s quite odd to replace the it subject of weather verbs with [the sky] or [the weather]:

(7) a. #The sky is raining.
b. #The weather is snowing.

Thinking in terms of thematic roles gives us a handle on what’s going on with this type of predicate: these are verbs that describe eventualities that don’t have any participants. So when it’s raining or snowing, then (at least in English) we don’t describe that as something that anything is doing; it’s just something that happens.

Where does the it come from, then? One influential suggestion is that this it just shows up to give the sentence a subject, when there’s no other subject available. (This is kind of like Do-Support, but for nouns.)

So a verb like rain or snow has one syntactic argument (a subject), but does not have any semantic arguments to which it gives thematic roles.

6.11 Changing argument structure: Causatives and passives


So far we’ve only looked at thematic roles that verb roots come specified with. But all languages have ways to adjust the thematic roles expressed in a clause, either syntactically or morphologically.

Adding arguments: Causatives

For example, many languages have a causative construction. Causatives add an extra causer or agent (which becomes a new subject):

(1) a. They read a book. (transitive: Agent-Theme)
b. I made them read a book. (causative: Adds a second Causer/Agent)

English has a syntactic causative construction. Other languages have morphological causatives, which don’t involve a causative verb like make, but instead have verbal morphology that does the same work of adding an additional causer argument. Japanese is a language with a morphological causative, illustrated in (2).

(2) a. Neko-wa tabe-ta
cat-TOPIC eat-PAST
“The cat ate.” (intransitive: Agent)
b. Watasi-wa neko-ni tabe-sase-ta
I-TOPIC cat-DAT eat-CAUS-PAST
“I made the cat eat.” (causative: adds a second Causer/Agent)

Other argument-adding constructions include applicatives, which add a participant that the event is done for, usually as an indirect object (for example: I baked a cake. → I baked my friend a cake.).

Removing arguments: Passives

Conversely, there are constructions that remove an argument from the ones the verb usually projects. Perhaps the most famous of these is the passive.

English, like many of the world’s languages, has a passive construction, which removes the original subject of a verb, resulting in the original object becoming the passive subject. A non-passive sentence is known as an active sentence. For example:

(3) a. They wrote a book. (original sentence: active)
b. A book was written (by them). (passive)

A grammatical passive can be identified by the following three properties:

  1. The original subject of the basic (active) transitive verb is demoted: it ceases to be the subject, and is optionally expressed in a prepositional phrase (in English, a by-phrase) or with oblique case.
  2. The theme object of the basic (active) transitive verb becomes the subject of the passive clause.
  3. Characteristic morphology or syntax (in English, be + the past participle in -en/-ed).

All three of these properties are needed for a clause to be a true grammatical passive. In traditional grammar, active and passive are identified as “voices” of a verb; in some languages there are other grammatical voices, for example “middle voice”.

The first property of passives relates them to corresponding active sentences. This is a key property of passives: for any passive clause, there is always an active counterpart. (This is similar to questions, which we described in terms of their grammatical relationship to statements.)

Consider the following active sentence:

(4) The pirates sank the ship.

This is transitive, so it has a passive counterpart:

(5) The ship was sunk (by the pirates).

The sentence in (5) has all three defining properties of passives:

  1. The original subject is demoted and appears in an optional by-phrase.
  2. The subject is [the ship], which was the theme object of the active verb.
  3. The verb appears as be plus the past participle sunk.

Compare this with the theme-intransitive we saw earlier:

(6) The ship sank (*by the pirates).

In contrast to (5), (6) does not have all three defining properties of a grammatical passive:

  1. The subject is [the ship], which was the theme object of the active verb, but
  2. the original subject cannot be expressed in a by-phrase, and
  3. there is no auxiliary be and no past participle.

While the subject in both these cases is [the ship], the theme intransitive doesn’t have the other properties of a passive clause.
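The three-property definition can be summarized as a checklist. The sketch below (an illustration added here, with hand-analyzed properties rather than any real parsing) encodes it directly:

```python
def is_grammatical_passive(subject_is_theme, agent_demoted, passive_morphology):
    """A clause is a true grammatical passive only if it has ALL THREE
    defining properties: theme subject, demoted agent (optional by-phrase
    or oblique), and characteristic passive morphology."""
    return subject_is_theme and agent_demoted and passive_morphology

# (5) "The ship was sunk (by the pirates)." -- all three properties hold:
print(is_grammatical_passive(True, True, True))    # → True

# (6) "The ship sank." -- a theme-intransitive: the subject is the theme,
# but there is no by-phrase option and no be + past participle:
print(is_grammatical_passive(True, False, False))  # → False
```

The comparison of (5) and (6) is exactly the point of the conjunction here: one shared property (a theme subject) is not enough to make a clause a passive.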

Passives in Popular Discourse

In prescriptive grammar and in popular discussions, the passive is often disparaged: advice (or “rules”) for writing often says that you should avoid the passive voice. Sometimes this is justified by saying that the passive “hides” the agent of an event.

In fact, though, the passive allows you to express the agent in a by-phrase in a way that other intransitives do not:

(7) a. The ship was sunk by the pirates. (Passive, but expresses the agent)
b. The ship sank. (Active! But no way to express who did the sinking)
c. The bomb exploded. (Active! But doesn’t say who set the bomb)

So the reason given for avoiding the passive doesn’t hold up.

In both writing advice and in online discussions, you often see news headlines criticized for using “passive voice” when they use verbs like “dies”/“died” or phrases like “something went wrong”, without identifying the cause of death or who did something wrong. If you search “passive voice” on a platform like Twitter, you will often find posts that make this kind of claim.

These tweets are typically pointing out a problem with the content of various types of public language (headlines, public statements by politicians), but they frame the criticism in terms of grammatical structure. And the sentences these tweets complain about are rarely passive in the linguistic sense discussed earlier in this section, but usually active. In most cases, using a passive would actually make it easier to express an agent; for example, instead of only saying that an individual died, a passive could say that they were killed by police.

This is an example of how language ideologies—our attitudes and beliefs about language—can be expressed in popular discourse. Here people express a legitimate criticism of public writing—not clearly expressing the agent or person responsible for an action—but do so using grammatical terminology (in a way that doesn’t match the original grammatical meaning of the term “passive”). The intended criticism is about what information gets expressed in news headlines, but it is presented as though it is a grammatical issue.

We could say that this is just a change in the popular meaning of “passive”—but since “avoid the passive” is also given as stylistic advice more generally (independently of criticisms of headlines that elide responsibility), it’s useful to try to reserve the term “passive” for the actual grammatical structure.


6.12 Interim summary


In this chapter, we have so far covered some of the core concepts in syntactic theory, and seen how we can use them to reason about grammatical structures and relationships between classes of sentences.

These core concepts include the observation that natural languages are better described in terms of structural relations rather than just the linear order of words, that the properties of a phrase are determined by its head, and that we can use grammaticality judgements to investigate fine details about a language’s syntax. Beyond the structure of simple and complex clauses, we’ve also seen how we can usefully describe the properties of certain classes of clauses (questions and passives) by showing how they are systematically related to other clauses (statements and actives).

Beyond just the structure of English, we’ve discussed how these concepts can help give us a handle on languages that look very different on the surface. As just one example, the head direction parameter accounts for differences in word order between English and Japanese in all types of clauses.

Even though we have mostly focused on one language, we have still only scratched the surface of the language’s syntactic grammar. However, we now have tools we could apply either to other phenomena in English, or to the grammar of entirely unrelated languages.

In the remaining sections of this chapter we introduce a particular formal notation used to represent the syntactic structure of natural language: tree diagrams. In particular, we will introduce a derivational implementation of X-bar theory, where the grammatical sentences of a language are described in terms of constraints on a set of well-formed tree structures, and movement transformations that can be applied to those tree structures.

Many linguistics textbooks introduce trees at the outset, alongside the core concepts that they are meant to describe. However, when presented in that order it is easy to get caught up in the details of tree diagrams, and lose sight of what they are really meant to do, which is to make our claims about the basic structural relationships in a sentence clearer and more precise. For this reason, this chapter has lingered on basic syntactic analysis before introducing the notation of tree diagrams. However, you will have seen that sections in the first half of the chapter are linked to corresponding sections in the second half, and vice versa, so that the reader can easily switch back and forth between the two.

6.13 From constituency to tree diagrams


In this section we begin to introduce the formal notation of tree diagrams. We use tree diagrams to make specific and testable claims (hypotheses) about the structure of phrases and sentences.

Thinking back to Section 6.1, one way of thinking about the goal of syntactic theory is that it’s aiming to account for what language users know about which sentences are grammatical, and which sentences are ungrammatical.

What constituents do we find inside sentences? Well, we know that a sentence consists of (at least) a subject and a predicate, and that subjects are (usually) noun phrases, and predicates are often verb phrases. We might express this as a rule, known as a phrase structure rule.

(1) S → NP VP

This rule says that wherever you have an S, it is possible for that S to be made up of an NP followed by a VP.

Tree diagram: [S [NP] [VP] ]
Figure 6.1 Schematic tree for S containing an NP subject and VP predicate

Another way to represent the same idea is with a tree diagram, as in Figure 6.1. Tree diagrams can express the same information as phrase structure rules, but can more efficiently express the output of multiple such rules; current syntactic theories are typically expressed in terms of constraints on possible trees, rather than in terms of constraints on phrase structure rules.

What kind of structure might we expect to find inside the NP subject? Here are some NPs—you might think of a sentence in which some or all of them can occur (remember that you can tell if a string in a sentence is a single NP by using a replacement test to try substituting a pronoun).

(2) a. robots (N)
b. some robots (D N)
c. six robots (Num N)
d. the six robots (D Num N)
e. the six small robots (D Num Adj N)
f. robots from Boston (N PP)
g. the robots from Boston (D N PP)
h. the six small robots from Boston (D Num Adj N PP)

We could abstract across all of these, and write a general phrase structure rule for
NPs, putting parentheses around all optional elements. A subscript “n” indicates that that element can be repeated any number of times.

(3) NP → (Det) (Num) (Adj)n N (PP)n

This can be read as:

An NP can consist of a determiner (optional), followed by a numeral
(optional), followed by any number of adjectives (all optional),
followed by a noun (required), followed by any number of prepositional
phrases (all optional).

We could represent the structure of some of the specific NPs in (2) as in Figure 6.2.

Tree diagrams: [NP [N robots]] [NP [D the] [N robots]] [NP [D the] [Num six] [Adj small] [N robots] [PP from Boston] ]
Figure 6.2 Tree diagrams for [robots], [the robots], and [the six small robots from Boston]
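The rule in (3) can also be treated as a pattern over category labels. The following is a minimal Python sketch, not part of the textbook's formalism, that checks whether a sequence of category tags is licensed by NP → (Det) (Num) (Adj)n N (PP)n; the tag names are assumptions chosen for illustration.

```python
import re

# The rule NP -> (Det) (Num) (Adj)^n N (PP)^n as a pattern over
# space-separated category tags (tag names chosen for illustration).
NP_RULE = re.compile(r"^(Det )?(Num )?(Adj )*N( PP)*$")

def is_np(tags):
    """Return True if the tag sequence is licensed by the NP rule."""
    return NP_RULE.fullmatch(" ".join(tags)) is not None

print(is_np(["N"]))                                    # "robots"
print(is_np(["Det", "Num", "Adj", "Adj", "N", "PP"]))  # "the six small shiny robots from Boston"
print(is_np(["Det", "PP"]))                            # no noun: not an NP
```

The optional elements in parentheses correspond to the `?` quantifier, and the subscript-n elements to the `*` quantifier.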

The phrase structure rule for NPs referred to prepositional phrases (PP). These have fewer possible shapes than NPs:

(4) a. from Boston (P NP)
b. outside (P)
c. just outside (Deg P)
d. way beyond my knowledge (Deg P NP)

We can abstract this in a single phrase structure rule:

(5) PP → (Deg) P (NP)

We could then expand the “NP” symbol using our phrase structure rule for NPs above. That NP might contain another PP inside it—here we’ve encountered recursion again. Figure 6.3 shows tree structures for some of the PPs in (4).

Tree diagrams: [PP [P from] [NP Boston]], [PP [P outside]], [PP [Deg way] [P beyond] [NP [D my] [N knowledge]]]
Figure 6.3 Tree diagrams for [from Boston], [outside], and [way beyond my knowledge]
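The mutual reference between the NP and PP rules is exactly what produces recursion. Here is a small sketch, with the two rules simplified from (3) and (5) for illustration, that expands NP and PP into each other down to a chosen depth:

```python
# Simplified versions of the NP and PP rules, keeping only the parts
# that refer to each other; expansion stops at a chosen depth.
RULES = {
    "NP": ["Det", "N", "PP"],
    "PP": ["P", "NP"],
}

def expand(symbol, depth):
    """Recursively expand NP/PP; other symbols (and depth 0) are leaves."""
    if symbol not in RULES or depth == 0:
        return symbol
    inner = " ".join(expand(s, depth - 1) for s in RULES[symbol])
    return "[" + symbol + " " + inner + "]"

print(expand("NP", 3))
# → [NP Det N [PP P [NP Det N PP]]]
```

Raising the depth limit produces ever-deeper nestings, which is the point: the rules themselves place no upper bound on how many times NP and PP can contain each other.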

Now let’s look at some verb phrases (VPs). In the following examples, the VPs are all in [ square brackets ].

(6) a. The crew [repaired the ship].
b. The captain [gave the crew orders].
c. The spaceship [arrived].
d. The crew [travelled across the galaxy].

How do we know these are VPs? Well, they come after the subject of the sentence (an NP in all these examples), so that means they are predicates. In one case the predicate is a single word arrived—this word is a verb, so the only thing it could be is a verb phrase. All the other sequences in square brackets could be swapped into the same position as arrived, so they must be phrases of the same type. For example:

(7) The spaceship [gave the crew orders].

This sentence might be pragmatically odd unless you assume the spaceship is an artificial intelligence, but it is grammatical. Another test would be the replacement test for VPs, which involves replacement with do.

Based on these tests, we know that a verb by itself (like arrived) can be a VP, and that the object is inside the VP with the preceding verb. We have intransitive VPs with just a verb, transitive VPs with one NP, and ditransitive VPs with two NPs.

Many ditransitive verbs in English can also appear with an NP and a PP (though some ditransitive verbs, like put, only allow the NP PP version). The alternation between a ditransitive [NP NP] frame as in (7) and an [NP PP] frame as in (8) is called the dative alternation.

(8) The spaceship [gave orders to the crew].

The VPs we’ve seen in this section can be derived with the following phrase structure rules:

(9) a. VP → Vintrans
b. VP → Vtrans NP
c. VP → Vditrans NP NP
d. VP → Vditrans NP PP

If we looked at a wider range of VPs we’d also find that adverb phrases can go at the beginning or end of VPs, though not typically in the middle. So we’d end up with the following general phrase structure rule for VP:

(10) VP → (AdvP) V (NP) (NP/PP) (AdvP)n

By putting together the structures we’ve proposed for NPs, PPs, and VPs, we’re now in a position to show some trees for full sentences. Figure 6.4 shows a tree for the sentence [The crew from Mars repaired the spaceship].

Tree diagram: [S [NP [D the] [N crew] [PP [P from] [NP [N Mars] ] ] ] [VP [V repaired] [NP [D the] [N spaceship] ] ] ]
Figure 6.4 Tree diagram for The crew from Mars repaired the spaceship.

There are also some cases where a verb can be followed by an adjective phrase (We are happy; They seem nice). There isn't an easy way to collapse this with our previous phrase structure rule; to account for it we could add a second phrase structure rule for VPs (VP → V AdjP).

We can also formulate phrase structure rules for various modifier phrases, AdvP, AdjP, and NumP.

Adverb phrases consist of an adverb (11a), preceded by an optional degree phrase (11b). You can also sometimes get a PP after an adverb (11c).

(11) a. quickly
b. very quickly
c. quite quickly for a sloth

This is summarized by the phrase structure rule in (12):

(12) AdvP → (DegP) Adv (PP)n

Adjective phrases are similar:

(13) The robot is [AdjP very proud of itself].
(14) AdjP → (DegP) Adj (PP)n

Number phrases are also modified by degree phrases!

(15) a. exactly six
b. approximately 30
c. very many
(16) NumP → (DegP) Num

The “objects” of adjectives are almost always expressed by PPs—that is, if there’s something in an AdjP that comes after the adjective, it usually can’t be an NP (or a VP), but instead has to be a PP.

Adjectives with NP complements

In English there is a very small number of exceptions to the generalization that complements of adjectives are PPs, though the exact number differs from speaker to speaker. The one exception that all English speakers share is the adjective worth. So we can say:

(17) This object is [ worth a lot of money. ]

Here the adjective is followed by the NP [a lot of money]. The NP has to be something that expresses a value.

Some English speakers, including most Canadian English speakers and some in upstate New York and Pennsylvania, have another exception with the deverbal adjectives finished and done:

(18) a. I am [finished my coffee].
b. The children are [done their homework].

If you aren’t from one of those places, you might need to use the verbal perfect (have finished my coffee), or use the preposition with (are finished with their homework).

Some people allow a few more adjectives in this construction, but they’re all deverbal (that is: derived from verbs): started, completed, etc.

Let’s return to phrase structure rules for whole sentences. We’ve already seen that sentences can consist of an NP followed by a VP:

(19) [The robot] [repaired the spaceship].

They can also have an auxiliary (modal or non-modal), and can have
adverbs at the beginning or end.

(20) a. [The robot] will [repair the spaceship].
b. Maybe [the robot] will [repair the spaceship] tomorrow.

You can also put negation in a sentence—though negation always requires an auxiliary in English (if there isn’t already an auxiliary, we apply Do-support):

(21) a. [The robot] didn’t [repair the spaceship].
b. [The robot] hasn’t [repaired the spaceship].

From these we can get a full phrase structure rule with several optional elements, but an obligatory NP subject and an obligatory VP predicate:

(22) S → (AdvP) NP (Aux) (Neg) VP (AdvP)

Phrase structure rules are useful for describing the sequences that can occur in phrases of different types, but neither these rules nor the trees we’ve seen in this section do more than list the elements that can occur in phrases of different types. In the remaining sections of this chapter, we’ll explore a theory that limits possible tree structures to a few basic configurations, with the goal of explaining not only how languages can vary, but also explaining limits on the variation seen across human languages.

6.14 Trees: Introduction


Constituency tests and phrase structure rules provide a useful starting point for thinking about the structure of possible sentences, but they don’t really start explaining why certain structures are grammatical, or predicting what possible and impossible grammars might look like. In this section we introduce X-bar theory, which aims to make stronger predictions by restricting the shape of possible trees. It’s called that because it introduces an extra layer of structure inside phrases called the “bar level”.

To see why we might want to constrain what trees are possible, let’s begin by thinking about a type of structure that’s really easy to describe using a phrase structure rule:

NP → V (Adj) PP

This rule is weird because it describes a noun phrase that would be made up of a verb, followed by an optional adjective, followed by an obligatory PP.

This rule is weird precisely because it's missing the noun: we already saw in Section 6.3 that what makes something a noun phrase is that it has a noun inside it. The restriction that all phrases in natural languages have a head of the same category as the phrase is the first limit we'll put on possible structures in X-bar theory:

All phrases have a head of the same category as the phrase.

And this goes the other way as well: all heads (words) project (or “occur inside”) a phrase of their category:

All heads project a phrase of the same category as the head.

What this means is that even when a noun or verb—or any other category—doesn’t obviously have any other words in the same phrase as it, it’s still inside an NP or a VP. In other words, while the two sentences in (1) are in one sense very different (one has two words, the other has 11), in another sense they have the same structure: both sentences consist of an NP followed by a VP.

(1) a. Cats sleep.
b. The many very fast spaceships carried a lot of valuable cargo.

By default, in X-bar theory we assume that the same constraints apply to all categories and phrases, and that they apply in all languages. In the absence of evidence to the contrary, we assume that determiners occur inside determiner phrases (DPs), degree words occur inside degree phrases (DegPs), and so on.

The key feature of X-bar theory (and the source of its name) arises from the observation that phrases aren't just flat structures.

Our phrase structure rule for NPs, for example, could build NPs that contain a determiner (or DP), a noun, and a PP, but there was no sub-grouping. The tree diagram in Figure 6.5 shows this; the triangle over robots indicates that we have abbreviated structure inside this constituent.

Tree diagram: [NP [Det a] [N picture] [PP [P of] [NP robots] ] ]
Figure 6.5 Tree diagram for [a picture of robots]

What we find if we look at phrases of all types, in many languages, is that the head is always in a closer relationship with one other element inside the phrase than with anything else. Specifically, heads are in a closer relationship with their complement, which in English follows the head. We saw in Section 6.3, for example, that verbs determine whether and how many objects they combine with. Above we saw that adjectives generally combine with PP complements, but that a few adjectives idiosyncratically allow NP complements.

This means that there are units—constituents—inside phrases. So not only do all heads have phrases, and all phrases have heads, but there is also what we might call a "mid-sized sub-phrase" (or "intermediate phrase") in every phrase. This mid-sized phrase is called X-bar (written X’), which is where the theory gets its name.

So we expand X-bar theory to the following generalizations, expressed in phrase structure rules:

XP → (YP) X’
X’ → X (ZP)

XP, YP, and ZP are all variables over any category of phrase. These rules can be read as saying:

Every phrase (XP) must have a bar-level of the same category (X’) within it, optionally preceded by another phrase (YP). Every bar-level (X’) must have a head of the same category within it, optionally followed by another phrase (ZP).

The positions occupied by YP and ZP are argument positions, and they have special names. The names for structural relations in trees are adapted from family relationships: parent, child, sibling, and so on.

The sibling of the head X (and child of X’) is its complement.
Heads select their complements (including whether they take a complement at all).
The child of XP that is the sibling of X’ is the specifier of the phrase.

If we put these labels in the tree in place of "YP" and "ZP" above, we get a general X-bar template for English (specific to English because it includes the linear order found in English).

X-bar schema for English: [ XP [ (Specifier) ] [ X' [ X ] [ (Complement) ] ] ]
Figure 6.6 Generalized X-bar template (for English, head initial)
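The two X-bar requirements (every XP contains an X’ of the same category, and every X’ contains a head X) can be checked mechanically. The sketch below is an illustration only, not part of the formal theory: trees are encoded as (label, children) tuples, heads are the leaves, and a category's bar level is written with a trailing apostrophe.

```python
def well_formed(node):
    """Check the X-bar constraints: every XP must contain an X' of the
    same category, and every X' must contain a head X. Nodes are
    (label, children) tuples; heads are the leaves (empty children)."""
    label, children = node
    if not children:                       # a head: nothing to check
        return True
    if label.endswith("'"):                # X' must contain the head X
        needed = label[:-1]
    else:                                  # XP must contain X'
        needed = label[:-1] + "'"
    return any(c[0] == needed for c in children) and all(well_formed(c) for c in children)

np = ("NP", (("N'", (("N", ()),)),))       # [NP [N' [N ]]] -- well-formed
headless = ("NP", (("V", ()),))            # an "NP" with no noun inside
print(well_formed(np))        # True
print(well_formed(headless))  # False
```

The second example is rejected for exactly the reason discussed above: a phrase labelled NP that contains no N (and no N’) violates the requirement that phrases have heads of their own category.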

What is the evidence for bar levels? In the remainder of this section we review the evidence for sub-constituents inside NPs and VPs.

Evidence for N’

The evidence for N’ (“N-bar”) involves showing that a noun is in a closer relationship with a PP that follows it than with a preceding determiner.

We can show this with constituency tests that target this sub-NP unit. These tests are a bit trickier to apply than the constituency tests covered in Section 6.4, but they follow the same general principle.

Here we will only go through one of these tests: one-replacement. Just as a pronoun can replace a whole NP, the word “one” can (for at least some speakers of English) replace a noun and a following prepositional phrase, leaving behind anything before the noun. Like other kinds of replacement, one-replacement also requires that there’s an earlier NP that “fills in” what’s being replaced.

(2) [NP Yesterday’s launch of a spaceship ] was exciting, but [ today’s one ] was not. (where [one]=[launch of a spaceship])

By contrast, you can’t replace a determiner and an N with one, leaving the PP behind:

(3) *[NP The launch of a spaceship ] is exciting, but [ one of a mining drone ] is not. (where [one]=[the launch])

This gives us the following overall structure of an NP, showing a closer relationship between the N and a following PP than between either of those and the preceding determiner or possessor.

Tree diagram: [NP [NP yesterday’s] [N' [N launch] [PP [P' [P of] [NP a spaceship] ] ] ] ]
Figure 6.7 Tree diagram for [ yesterday’s launch of a spaceship ]

Evidence for V’

We can do similar tests to find a constituent inside VP, consisting of the verb and its object. For example, we can elide a verb and its object, leaving a previous AdvP behind, but we cannot elide AdvP + V, leaving the NP object behind.

(4) a. They will [VP quickly build a spaceship], and we will [VP slowly _ ]
b. *They will [VP quickly build a spaceship], and we will [VP _ an orbital station ]
(ungrammatical if what’s missing is [quickly build])

For many speakers the contrast is clearer with do so replacement: do so can replace a verb and its object, but can’t replace an adverb and verb if this strands the object:

(5) a. They will [VP quickly build a spaceship], and we will [VP slowly do so ]
b. *They will [VP quickly build a spaceship], and we will [VP do so an orbital station ]
(ungrammatical if what’s missing is [quickly build])

As with noun phrases, we can represent the fact that the verb and its object form a constituent, to the exclusion of any adverbs, by putting them both under the V’ node.

Tree diagram: [VP [AdvP [Adv' [Adv quickly] ] ] [V' [V build] [NP [DP [D' [D a] ] ] [N' [N spaceship] ] ] ] ]
Figure 6.8 Tree diagram for [ quickly build a spaceship ]

“Empty” bar levels

As with the hypothesis that all heads project phrases, even when there are no other words in the phrase, X-bar theory assumes that all phrases contain at least one bar level, even when it is not needed to host a complement.

So for the sentence in (6), we would have the tree in Figure 6.9, where every phrase has a bar level even though none of the phrases we’ve drawn includes a complement:

(6) The spaceships landed.
Tree diagram: [S [NP [DP [D' [D The] ] ] [N' [N spaceships] ] ] [VP [V' [V landed] ] ] ]
Figure 6.9 Tree diagram for The spaceships landed.


This tree also illustrates something that’s still missing from our implementation of X-bar theory: we’ve said that every phrase has to have a head, but our sentences are currently headless. In the next section we turn to the proposal that all sentences are projected from a tense head.

6.15 Trees: Sentences as TPs


So far we’ve applied X-bar theory to a range of phrase types. But what about sentences? Up to this point we’ve simply been labelling them as “S”, as in Figure 6.10.

Tree diagram: [S [NP [DP [D' [D the] ] ] [N' [N robot] ] ] [VP [V' [V repaired] [NP [DP [D' [D a] ] ] [N' [N spaceship] ] ] ] ] ]
Figure 6.10 Tree diagram for The robot repaired a spaceship. (to be revised)

But if a “phrase” is a string of words that form a constituent, then sentences are also phrases—and X-bar theory requires that all phrases have heads, a hypothesis we don’t want to abandon unless we have evidence that it’s incorrect.

What could the head of the sentence be?

Recall that we had a phrase structure rule for sentences like the following:

S → (AdvP) NP (Aux) (Neg) VP (AdvP)

This rule allows sentences to include an auxiliary between the subject NP and the predicate VP, as in Figure 6.11:

Tree diagram: [S [NP [DP the] [N' [N robot] ] ] [Aux will] [VP [V' [V repair] [NP [DP a] [N' [N spaceship] ] ] ] ] ]
Figure 6.11 Tree diagram for The robot will repair a spaceship. (to be revised)

This tree has two problems from the perspective of X-bar theory: now not only does the sentence (S) not have a head, but the auxiliary is a head without a phrase! We could simply put the Aux into an AuxP, as we did with determiners, degree adverbs, and so on. But there’s another option open to us: we can solve both the lack of a phrase for Aux and the lack of a head for S in one stroke, by analyzing the auxiliary itself as the head of the phrase:

Tree diagram: [AuxP [NP [DP the] [N' [N robot] ] ] [Aux' [Aux will] [VP [V' [V repair] [NP [DP a] [N' [N spaceship] ] ] ] ] ] ]
Figure 6.12 Tree diagram for The robot will repair a spaceship., with the sentence as category AuxP (to be revised)

What if there weren’t an auxiliary, though? Are all sentences AuxPs? No. In fact, if we think about what auxiliaries in English express, they are always inflected for tense. Even in the absence of an auxiliary, we see tense on the main verb, and in nonfinite clauses the nonfinite marker to takes the place of an auxiliary.

Based on this, the proposal in X-bar theory is that sentences aren’t auxiliary phrases but tense phrases (TPs). Tense represents finiteness: sentences that stand independently are always finite, a term meaning that they have tense.

This is an example of a case where the greater technical detail of X-bar theory motivates us to look at sentences and reconsider whether they are projections of some category, just like all other phrases are. In fact, TP is a very nice phrase from the perspective of X-bar theory, because it always has both a specifier (the subject) and a complement (the predicate).

What things are of category T?
Auxiliaries like will, the abstract tense features that are spelled out as inflection on main verbs (like the +PAST of Figure 6.14), and the nonfinite marker to are all of category T.

So the final version of the tree for The robot will repair the spaceship. is as in Figure 6.13, and the final version of the tree for The robot repaired the spaceship. (without an auxiliary) is as in Figure 6.14.

[ TP [ NP [DP [the] ] [N' [N robot ] ] ] [T' [T will ] [ VP [V' [V repair ] [NP [DP [a] ] [N' [N spaceship ] ] ] ] ] ] ]
Figure 6.13 Tree diagram for The robot will repair a spaceship., with the sentence as category TP (final version)
Tree diagram: [ TP [ NP [DP [the] ] [N' [N robot ] ] ] [T' [T +PAST ] [ VP [V' [V repair ] [NP [DP [a] ] [N' [N spaceship ] ] ] ] ] ] ]
Figure 6.14 Tree diagram for The robot repaired a spaceship., with the sentence as category TP (final version)

What about languages that don’t have tense?

There are different options! We could say that languages that don’t require tense—like Mandarin or Cantonese, for example—don’t have sentences that are TPs, but instead have some other category. (Can you think of any plausible options?)

The other option is to assume that even though we don’t pronounce tense in all languages, it’s nonetheless the case that something abstract makes a sentence a sentence—something that corresponds to “finiteness”. So even if it doesn’t have the same meaning as English tense, there’s something that does the same grammatical work of anchoring a clause, and gluing the subject and predicate together.

This second option is fairly widely assumed in the type of syntactic theory that we’re learning in this textbook (descendants of X-bar theory). People sometimes use the label “Inflection Phrase” (InflP or IP), but it’s also common to simply use the label “TP”, even if you’re assuming that the semantic content of this functional phrase might vary.

X-bar theory and language variation: Head direction

We saw in Section 6.3 that languages can vary systematically in their basic word order, and characterized some differences in terms of the relative order of heads and their complements.

This analysis is very easy to encode in X-bar theory, by a simple switch in the X-bar template of languages of the two types.

Recall the basic shape of phrases of several categories in English, illustrated in the trees in Figure 6.15.

(1) a. I [VP ateV [NP an apple ] ].
b. [PP toP [NP Toronto ] ]
c. [NP pictureN [PP of a robot ] ]
Tree diagrams: [VP ate [NP an apple] ], [PP to [NP Toronto] ], [NP picture [PP of a robot] ]
Figure 6.15 Tree diagrams showing head initial word order in English

In contrast to English, Japanese is a strictly SOV language. And in Japanese, heads always follow their complements, the reverse of the order we get in English.

What X-bar theory allows us to say is that phrases in Japanese have the
same structure as phrases in English, but a different order.

Specifically, in Japanese complements are still the siblings of their heads, but they precede the head instead of following it, as illustrated for the examples in (2) in Figure 6.16.

(2) a. Watasi-wa [VP ringo-o tabe-ta. ]
I-TOPIC apple-ACC eat-PAST
“I ate (an) apple.”
b. [PP Tokyo e ]
Tokyo to
“to Tokyo”
c. [NP robotto no shasin ]
robot of picture
“picture of (a) robot”
Tree diagrams: [VP [NP ringo-o] tabeta], [PP [NP Tokyo] e], [NP [PP robotto-no] shasin]
Figure 6.16 Tree diagrams illustrating head final word order in Japanese

If we draw a tree for Japanese, we would extend this template to TP, as well as all the other phrases we’ve looked at, as shown in Figure 6.17.

Tree diagram: [TP [NP Watasi-wa] [VP [NP ringo-o] tabe-ta ] +Past ]
Figure 6.17 Tree diagram showing a head-final TP structure in Japanese

When you’re drawing a tree for another language, it’s important that the words come in the right order if you read the words off the bottom of the tree! If you’re analyzing an unfamiliar language, and need to figure out its word order, one of the first questions you should ask is whether it appears to be head initial or head final.

In contrast to complements, specifiers don’t show the same variation: they precede the head across all known languages.

When analyzing a new language, the starting assumption is that all structural relations are the same, but that linear order and the distribution of silent functional heads may be different. Beginning in Section 6.19, we will also see the possibility that languages may exhibit different types of movement.

6.16 Trees: Modifiers as adjuncts


When we introduced X-bar theory, we gained the ability to represent the asymmetric relationship between heads and their complements on the one hand, and heads and their specifiers on the other.

At the same time, with X-bar structure as we’ve had it so far, we lost a bit of empirical coverage that phrase structure rules had given us: we lost a place to put modifiers.

With adjuncts we expand X-bar structure to accommodate modifiers.

The basic idea of adjuncts is that while there can only be one head in a phrase, and only one maximal phrase (since XP is the final projection of its head), a bar level is a “mid-sized phrase” or “partial phrase”, and in principle there can be many partial phrases within a larger phrase.

Let’s see how this works in practice. Consider the noun phrase (NP) in (1).

(1) [NP the early arrival of spring ]

This NP contains a modifying adjective phrase [AdjP early ]. Without that AdjP, the structure would be as shown in Figure 6.18.

Tree diagram: [NP the arrival of spring]
Figure 6.18 Tree diagram for [the arrival of spring], showing that both Specifier and Complement positions are occupied.

In this NP both the specifier and complement positions are filled, so there’s no more space for the adjective phrase [AdjP early ].

By adding additional bar levels, we can create structural “space” for modifiers. These positions are neither specifiers nor complements, instead they are adjuncts.

A constituent that is both the child and sibling of X’ is an adjunct.

Unlike specifiers and complements, adjuncts are flexible in their position: they can appear on either the left side or the right side of a phrase.

Figure 6.19 illustrates how an additional N’ creates space for [AdjP early ] to appear as an adjunct.

Tree diagram: [NP the early arrival of spring]
Figure 6.19 Tree diagram showing [AdjP early] as an Adjunct to N’, with the extra N’ level creating “space” for the modifier.

The same expansion of X-Bar structure gives us space within an NP to represent two PPs after the head noun, as in [NP a letter [PP from home ] [PP in the mailbox ] ]. If we run our one-replacement test, we can show that letter can be replaced by one, leaving either both PPs behind, or leaving just the second one behind. If one replaces an N-bar constituent, this means that there must be an N-bar that contains letter but not either of the PPs.

(2) a. I saw a letter from home in the mailbox, and one from the bank on the table.
b. I found that letter from home in the mailbox, and this one on the table.

The tree showing both PPs as adjuncts within NP appears in Figure 6.20.

Tree diagram for [NP a letter from home in the mailbox]
Figure 6.20 Tree diagram for [ a letter [PP from home] [PP in the mailbox] ], showing both PPs as Adjuncts.
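Adjunction can be modelled as wrapping an existing N’ in a new N’, which is why any number of adjuncts can stack. A sketch with tuples standing in for tree nodes (the node encoding is an assumption for illustration, not the textbook's notation):

```python
def adjoin(bar, adjunct):
    """Return a new N' whose children are the old N' and the adjunct,
    mirroring the extra bar level that adjunction adds."""
    return ("N'", (bar, adjunct))

# Start from [N' [N letter]] and adjoin two PPs, one at a time.
n_bar = ("N'", (("N", "letter"),))
stacked = adjoin(adjoin(n_bar, ("PP", "from home")), ("PP", "in the mailbox"))

# The result mirrors Figure 6.20:
# [N' [N' [N' letter] [PP from home]] [PP in the mailbox]]
print(stacked[0])     # N'
print(stacked[1][1])  # ('PP', 'in the mailbox')
```

Each call to adjoin adds one bar level, so the phrase grows without limit while still containing exactly one head and one maximal NP.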

Adverbs within verb phrases are also adjuncts. We’ve already seen that adverbs can go either at the beginning or the end of a verb phrase, as in (3a-b); we can also get more than one adverb in a verb phrase, as in (3c).

(3) a. They [VP [AdvP quickly] left the room]
b. They [VP left the room [AdvP quickly]].
c. We [VP [AdvP deliberately] left the room [AdvP slowly]].

Adverbs appearing in adjunct positions to the left and right of VP are shown in Figure 6.21.

Tree diagram: [VP deliberately left the room slowly]
Figure 6.21 Tree diagram for [VP [AdvP deliberately] left the room [AdvP slowly]], showing both adverbs as Adjuncts to V’.

All adverbs occur in adjunct positions, as do all adjective phrases inside NP. (Predicate adjectives, as in The book is long, are complements of a verb.)

PPs sometimes occur as complements, and sometimes as adjuncts—we’ve seen examples of both in this section. Constituency tests like replacement with one (for N’) and do so (for V’) are very useful for figuring out whether a particular PP is a complement or an adjunct.

6.17 Trees: Structural ambiguity


When we talked about compounds, we saw a first example of structural ambiguity: cases where the same string of morphemes can have more than one structure, with each structure corresponding to a different meaning.

The same thing is found in syntax. Consider the following example:

(1) I saw someone with a telescope.

This has two possible interpretations:

  1. I was using a telescope, and I saw someone. (PP modifies VP)
  2. I saw someone, and that person had a telescope. (PP modifies NP)

In the first interpretation, the prepositional phrase [PP with a telescope] modifies the verb phrase headed by saw. In the second interpretation, the same prepositional phrase modifies the noun phrase headed by someone. These two structures are illustrated below:

Tree diagram: [I saw someone with a telescope], [with a telescope] is child and sibling of V'
Figure 6.22: Tree diagram showing [PP with a telescope] as an Adjunct of the verb, meaning “I used a telescope to see someone”.
Tree diagram: [I saw someone with a telescope], [with a telescope] is child and sibling of N' above [N someone]
Figure 6.23: Tree diagram showing [PP with a telescope] as an Adjunct of the NP object, meaning “I saw a person and that person had a telescope”

The same will be true for other cases of structural ambiguity—each meaning will correspond to a different potential tree structure.
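What makes this structural (rather than lexical) ambiguity is that the two trees flatten to the same word string. A minimal sketch, with nested lists standing in for the two attachment structures:

```python
# Two structures for "saw someone with a telescope", as nested lists:
# PP attached inside VP (instrument) vs. inside the object NP (possessor).
vp_attach = ["saw", ["someone"], ["with", "a", "telescope"]]
np_attach = ["saw", ["someone", ["with", "a", "telescope"]]]

def words(tree):
    """Flatten a nested-list tree into its surface word string."""
    out = []
    for part in tree:
        out.extend(words(part) if isinstance(part, list) else [part])
    return out

# Same word string, two different constituent structures:
print(words(vp_attach) == words(np_attach))  # True
print(" ".join(words(vp_attach)))            # saw someone with a telescope
```

The listener only hears the flattened string, so nothing in the words themselves decides between the two structures; that is exactly why the sentence is ambiguous.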

6.18 Trees: Embedded clauses


In Section 6.6 we observed that complementizers allow clauses to be embedded—that is, to be complements of a verb.

Following our principles of X-bar structure, this means that the complementizer (C) must project a CP. Because verbs can select whether they take an embedded clause, this CP should be the complement of the verb, and should take TP as its own complement.

Recall the sentences we looked at in Section 6.6:

(1) Deniz said something.
(2) Samnang might leave.

Based on the principles of X-bar theory we have seen in previous sections, these would correspond to the tree structures in Figure 6.24.

Tree diagrams: [TP Deniz [VP [V said] [NP something]]] and [TP Samnang might leave]
Figure 6.24 Tree diagrams for two TP clauses, [Deniz said something.] and [Samnang might leave.]

Now we can ask: how do these two clauses fit together when (2) is embedded below the verb said of (1), as in the complex sentence in (3)?

(3) Deniz said that [Samnang might leave].

In addition to what we said in previous sections about some verbs selecting CPs as their complements, there’s no space to put the C head, or the CP it’s part of, within either the embedded TP or the main clause VP. Instead, we put the TP inside the CP, as in the tree in Figure 6.25.

Tree diagram: [TP Deniz [VP [V' [V said] [CP [C that] [TP Samnang might leave]]]]]
Figure 6.25 Tree diagram for an embedded clause structure. The embedded clause is a CP that is the sibling of the main clause V


What if the complementizer [C that ] is missing, as in (4)?

(4) Deniz said [Samnang might leave].

In this case we wouldn’t say that there’s no CP, but rather that in English the C head can be null. (In Section 6.19 we’ll see other cases where we assume there is an empty C head, because its specifier position is filled.) The tree for (4) would be just like the tree in Figure 6.25, but with no word in the C head.

In Section 6.6 we saw that embedded Yes-No questions are structurally similar to embedded declaratives, but with the complementizer if or whether. Since they have similar structures, their trees are also fundamentally the same, with if or whether occurring in a C head. Thus the tree for a sentence like (5) is as in Figure 6.26.

(5) Eryl wonders if [ghosts exist].
Tree diagram: [TP Eryl [VP [V' [V wonders] [CP [C whether +Q] [TP ghosts exist]]]]]
Figure 6.26 Tree diagram showing an embedded Yes-No question. The CP headed by a +Q C is the sibling of the main clause V


Both that and ∅ are declarative C heads, which we could represent with a [–Q] feature, while both if and whether are interrogative C heads, represented with a [+Q] feature. Verbs select whether they combine with [–Q] CPs, [+Q] CPs, both, or neither.
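Selection for [+Q] or [–Q] complements can be pictured as a lookup: each verb lists the clause types it can embed. The lexicon below is a toy assumption; the judgments for said and wonders follow the pattern just described, and knows is added as a hypothetical example of a verb that accepts both.

```python
# A toy lexicon (an assumption for illustration): each verb lists the
# clause-type features of the CPs it can embed.
SELECTS = {
    "said":    {"-Q"},           # declaratives only
    "wonders": {"+Q"},           # interrogatives only
    "knows":   {"-Q", "+Q"},     # either kind of complement
}

# Complementizers and their Q feature, as described in the text.
COMP_FEATURE = {"that": "-Q", "if": "+Q", "whether": "+Q"}

def can_embed(verb, comp):
    """True if the verb selects a CP headed by this complementizer."""
    return COMP_FEATURE[comp] in SELECTS.get(verb, set())

print(can_embed("wonders", "if"))    # True
print(can_embed("said", "whether"))  # False
```

On this picture, the ungrammaticality of a sentence like "Deniz said whether ghosts exist" (on the intended reading) comes from a mismatch between the verb's selection set and the complementizer's feature.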

Finally, in Section 6.6 we also introduced embedded nonfinite clauses. These have a complementizer that is either ∅ or for, which we could call [–FIN] complementizers. They also differ in that the head of T is the nonfinite marker to.

(6) I want for [ghosts to exist].
Tree diagram: [TP I [VP [V' [V want] [CP [C for -FIN] [TP ghosts [T' [T to] [VP exist] ] ] ] ] ] ]
Figure 6.27 Tree diagram for an embedded nonfinite clause. The CP headed by -FIN C is the sibling of the main clause V; the embedded TP is headed by to

6.19 Trees: Movement


X-bar theory: Subject-Aux Inversion as Head Movement

The first transformation we saw, in Section 6.7, was Subject-Auxiliary Inversion, which reverses the order of the subject and the auxiliary.

Thinking not in terms of the linear order of the subject and the auxiliary, but instead in terms of our X-bar structure, can we state this transformation more precisely?

The tree for [They have left.], an ordinary declarative clause, will be as in Figure 6.28.

Tree diagram: [TP [NP They] [T' [T have] [VP left]]]
Figure 6.28 Tree diagram for the declarative sentence [They have left.]

The structural relations in this tree encode the grammatical relations between the subject, the clause as a whole, and the predicate. Those relations should not be fundamentally different in a question. We just want to add a difference in the order of constituents, in order to mark that this is a question.

The simplest way to change the order of the subject and the auxiliary is to move one of them. We could either move the auxiliary up and to the left, or move the subject down and to the right.

If we think about embedded questions, which we developed an X-bar theory analysis for in Section 6.18, these had a +Q complementizer above the TP, if or whether. This complementizer is in the same position that the auxiliary appears in in main clause questions: right before the subject. This gives us a way to understand Subject-Auxiliary Inversion as movement of the auxiliary from T up and to the left, to land in C. This is illustrated in Figure 6.29.

Tree diagram: Pre-movement [CP [C' [C +Q] [TP they [T have] left] ] ] Post-movement [CP [C' [C +Q have ] [TP they [crossed out T crossed out have ] left ] ] ], arrow from T to C
Figure 6.29 Tree diagrams for the question [ Have they left? ] before and after T-to-C Head Movement

The movement in Figure 6.29 is an example of Head Movement, which changes a tree by moving a head to the next head above it.

Head Movement:
Movement of a head (X) into the next higher head position.

We can now restate the generalization about how Yes-No Questions are formed in English main clauses. To name an instance of head movement, you can identify the start and end points. So the movement we see in English main clause questions is called T-to-C movement.

Yes-No Question Formation in English:
Yes-No Questions are formed by moving the auxiliary in T to C.

This is a derivational way of representing the relationship between a fronted auxiliary and the position it occupies in statements: we start with one tree structure, and make a change to it in order to arrive at the final structure. There are other ways to represent this dependency, some of which are pursued in non-derivational approaches to syntax.
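One way to see the derivational idea in action is to model it computationally. The following Python sketch (a toy model under simplifying assumptions, not the formalism used in this textbook) represents the clause as nested lists and implements T-to-C movement by copying the auxiliary into C and silencing its base position:

```python
import copy

def t_to_c(clause):
    """Toy T-to-C movement: given a CP like
    ['CP', ['C', '+Q'], ['TP', subject, ['T', aux], predicate]],
    move the auxiliary from T into C.  The vacated T is written
    '<aux>' to stand in for the strikethrough notation."""
    cp = copy.deepcopy(clause)   # derive a new tree; don't alter the input
    c, tp = cp[1], cp[2]
    t = tp[2]                    # the T head inside TP
    aux = t[1]
    c.append(aux)                # the auxiliary lands in C
    t[1] = '<' + aux + '>'      # crossed-out copy in the base position
    return cp

statement = ['CP', ['C', '+Q'],
             ['TP', ['NP', 'they'], ['T', 'have'], ['VP', 'left']]]
question = t_to_c(statement)
print(question)
# -> ['CP', ['C', '+Q', 'have'],
#     ['TP', ['NP', 'they'], ['T', '<have>'], ['VP', 'left']]]
```

The `deepcopy` step reflects the derivational view described above: we start with one structure and produce a transformed one, rather than merely annotating the original.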

Notation for Head Movement

In the history of generative linguistics, there have been several different notations used for movement. In this textbook we draw a line through the moving head in its base position (like this), and draw an arrow to the position it moves to.

There are other ways of indicating movement, which you might encounter online or in other resources. These include trace notation, where the original position of the moved element has a “trace” (written t) left in it. This can be thought of as a variable, or as the empty space left
behind by the thing that moved. Trace notation won’t be used in this textbook, but we mention it so that you won’t be confused if you see it elsewhere.

X-bar theory: Question word fronting as Phrasal Movement

As we saw in Section 6.8, content questions in English also involve a change in word order from corresponding statements. However, we’ll see in this section that we can’t describe that change just in terms of head movement. Instead, we’re going to introduce a second (and final) type of movement: Phrasal Movement.

Recall some examples of content questions in English:

(1) a. What has the squirrel hidden?
b. Where is it snowing?
c. When was it snowing?
d. How do squirrels hide nuts?

All these questions involve Subject-Aux Inversion, which we analyzed earlier as T-to-C movement when looking at main clause Yes-No questions. We can tell this has applied because the auxiliary is before the subject in all the content questions in (1).

But we can’t use T-to-C movement to analyze how the content question word gets to the front of the sentence for two reasons:

  1. The auxiliary is already in C. We can’t put two words in one head, so we need to put the WH word somewhere else—and somewhere higher up.
  2. The thing that moves to the front of the sentence in a WH-question isn’t just a head, it’s a whole phrase.

How can we tell that what moves is a whole phrase? We can tell by looking at a wider range of content questions.

(2) a. What kind of nuts has the squirrel hidden?
b. Which city is it snowing in?
c. Which nuts did the squirrels hide?

Here instead of the single word what or where, we have larger NPs moving to the front of the question—though these larger NPs still contain content question words. Here what and which are determiners, occurring in the same position that this or the or a would occur.

So we know that the content question phrase isn’t pronounced in the C head in content questions. Where is it pronounced, then?

To answer this question, let’s consider again word order for the statement The squirrel has hidden nuts. The auxiliary has is in T, and the object nuts is the complement of the verb hidden. We can represent this in a labelled bracket structure:

(3) [TP [NP the squirrel] [T’ [T has ] [VP hidden nuts ] ] ]
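Labelled bracket notation is systematic enough to be parsed mechanically. As a purely illustrative aside (the tokenizer below is a simplification and is not from the textbook), here is a short Python function that turns a bracket string like (3) into nested lists:

```python
def parse_brackets(s):
    """Parse a simplified labelled-bracket string like
    '[TP [NP the squirrel] [T has] [VP hidden nuts]]'
    into nested lists: ['TP', ['NP', 'the', 'squirrel'], ...]."""
    # Pad brackets with spaces so split() separates them from words
    tokens = s.replace('[', ' [ ').replace(']', ' ] ').split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == '['
        pos += 1
        node = [tokens[pos]]          # the category label, e.g. 'TP'
        pos += 1
        while tokens[pos] != ']':
            if tokens[pos] == '[':
                node.append(parse())  # recurse into a subphrase
            else:
                node.append(tokens[pos])  # a terminal word
                pos += 1
        pos += 1                      # consume the closing ']'
        return node

    return parse()

tree = parse_brackets("[TP [NP the squirrel] [T has] [VP hidden nuts]]")
print(tree)
# -> ['TP', ['NP', 'the', 'squirrel'], ['T', 'has'], ['VP', 'hidden', 'nuts']]
```

Note that the parser only succeeds when every opening bracket has a matching close, which is also a handy way to check hand-written bracketings.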

In the content question, what changes is that we have what as the object of hidden, instead of nuts. We also have a +Q C head above TP, because that’s where the auxiliary in T moves. We can schematize the structure before we do any movement as in Figure 6.30. (The tree before any movement occurs is called Deep Structure in some theories of syntax, though we won’t focus on that terminology here.)

Tree diagram: [CP [C' [C +Q] [TP [NP the squirrel] [T' [T has] [VP [V hidden] [NP what] ] ] ] ] ]
Figure 6.30 Tree diagram for the question [What has the squirrel hidden?] prior to any movement

Now we need to transform this clause so that the question phrase appears in initial position, at the beginning of the sentence. This isn’t head movement, it’s Phrasal Movement, also referred to as XP Movement. A phrase can’t go in a head position, but it can move to the empty Specifier position in CP.

Phrasal Movement:
Movement of a phrase (XP) into a higher specifier position.

This type of Phrasal Movement is known as WH-movement; Phrasal Movement is usually named for the type of phrase that moves.

WH-movement:
Move a WH-phrase from its original position into Spec,CP.

Figure 6.31 shows what the tree structure will look like after both T-to-C Movement and WH-movement have applied.

Tree diagram: [CP [NP what] [C' [C +Q have] [TP [NP the squirrel] [crossed out T crossed out have] [VP [V hidden] [crossed out NP crossed out what] ] ] ] ] Arrows from T to C and from lower what to higher what
Figure 6.31 Tree diagram for the question [What has the squirrel hidden?] after both T-to-C and WH-movement
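As a toy computational illustration (not the textbook’s formalism; the WH word list and the nested-list tree representation are simplifying assumptions), WH-movement can be modeled as finding an NP that starts with a WH word and copying it into Spec,CP:

```python
WH_WORDS = {'what', 'where', 'when', 'how', 'which', 'who'}

def wh_movement(cp):
    """Toy WH-movement: search the tree for an NP whose first word
    is a WH word, put a copy in Spec,CP (right after the 'CP' label),
    and mark the base position as a silent copy."""
    def find_wh(node):
        if isinstance(node, list):
            if node[0] == 'NP' and str(node[1]).lower() in WH_WORDS:
                return node
            for child in node[1:]:
                found = find_wh(child)
                if found:
                    return found
        return None

    wh_phrase = find_wh(cp)
    if wh_phrase is None:
        return cp                       # nothing to move
    moved = list(wh_phrase)             # the copy pronounced in Spec,CP
    wh_phrase[1:] = ['<' + w + '>' for w in wh_phrase[1:]]  # silence base
    cp.insert(1, moved)                 # Spec,CP precedes C'
    return cp

# Structure after T-to-C movement but before WH-movement:
q = ['CP', ['C', '+Q', 'has'],
     ['TP', ['NP', 'the', 'squirrel'], ['T', '<has>'],
      ['VP', ['V', 'hidden'], ['NP', 'what']]]]
print(wh_movement(q))
```

Because the rule moves the whole NP, an input containing ['NP', 'which', 'city'] would front both words together, mirroring the observation above that phrases, not just heads, undergo WH-movement.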

What does it look like when we have a complex NP moving to Spec,CP? Basically the same, as shown in Figure 6.32. This tree also shows the auxiliary did in C, inserted as a result of Do-Support:

Tree diagram: [CP [NP [DP what] [N' kind of nuts] ] [C' [C +Q did ] [TP [NP the squirrel] [crossed out T +PAST] [VP [V hide] [crossed out NP what kind of nuts ] ] ] ] ] Arrows from T to C and from lower "what kind of nuts" to "what kind of nuts" in Spec,CP
Figure 6.32 Tree diagram for the question [What kind of nuts did the squirrel hide?] after both T-to-C and WH-movement

Embedded content questions, which we saw in Section 6.9, have very similar tree structures. They are like main clause content questions in putting the WH-phrase at the front of the CP, but unlike main clause content questions in that they don’t do Subject-Auxiliary Inversion (T-to-C movement).

What would this look like in a tree? Consider this embedded content question:

(4) I know [CP what squirrels hide].

The tree for this sentence would be as in Figure 6.33.

Tree diagram: [TP [NP I ] [T' [T -PAST] [VP [V' [V know] [CP [NP [N' [N what] ] ] [C' [C +Q] [TP [NP squirrels] [T' [T -PAST] [VP [V' [V hide] [crossed out NP what] ] ] ] ] ] ] ] ] ] ] arrow from lower "what" to "what" in embedded Spec,CP
Figure 6.33 Tree diagram for the embedded content question [I know what squirrels hide.]

Notice that the embedded C is empty! In many varieties of English, when you get a phrase in embedded Spec,CP, it’s impossible to also have an overt complementizer. So sentences like (5) are always ungrammatical, even though if is a +Q complementizer.

(5) *I know [CP what if squirrels hide].

This isn’t true in all languages! In many languages WH-movement is totally compatible with an overt complementizer; we saw this for Japanese content questions in Section 6.8.

6.20 Trees: Movement beyond questions


Head Movement outside questions: V-to-T movement of auxiliaries

Based on the discussion so far, you might think of movement as something that we only find in questions. But that isn’t the case! It happens that questions are one of the places that we clearly see movement in English, but both Head Movement and Phrasal Movement can be found in other contexts as well.

In this section we’ll see evidence that auxiliaries like be and have start out lower than T and move to it via Head Movement, then evidence that the same is true for all verbs in a language like French.


The following sentences all have one auxiliary in them:

(1) a. The leaves will fall.
b. The leaves have fallen.
c. The leaves are falling.

We saw in Section 6.5 that auxiliaries all have the same distribution in English sentences, a distribution that is different from main verbs: they appear before negation and participate in Subject-Auxiliary Inversion (T-to-C movement). They also appear before adverbs like always, as in (2):

(2) a. The leaves will always fall.
b. The leaves have always fallen.
c. The leaves are always falling.

We explained this similarity in distribution—and the way the auxiliaries are all different from main verbs—by analyzing all the auxiliaries as belonging to a single syntactic category: T.

But it turns out that the picture is a bit more complex. There’s a difference between modals (and nonfinite to) on the one hand, and all the other auxiliaries on the other.

First, for many speakers of English, modals (and to) cannot stack—you always get exactly one of them.

(3) a. *The leaves will might fall. (cf. will maybe fall)
b. *The leaves must can fall. (cf. must be able to fall)

By contrast, have and be can stack, with a modal or with each other. And the order is always the same: the modal must be the highest auxiliary, the one that shows the distribution that we associated with the head T.

(4) a. The leaves will have fallen. (Future + Perfect)
b. The leaves will be falling. (Future + Progressive)
c. The leaves will have been falling. (Future + Perfect + Progressive)
d. The leaves have been falling. (Perfect + Progressive)

Some varieties of English, including Southern American English and also some varieties of Scots, do allow more than one modal in a clause. For these varieties, we might have a slightly different analysis of where modal auxiliaries start out, and whether any of them also move to T in declarative clauses. In these other varieties, it’s still the case that all modals come before have and be auxiliaries.

If we check all these sentences for the distributional properties that we’ve associated with being in T—appearing before negation and adverbs like always, and undergoing Subject-Auxiliary Inversion—it turns out that only the first auxiliary passes those tests. All the subsequent auxiliaries have the same distribution as main verbs. We can see this with the Future + Perfect: only will inverts with the subject (Will the leaves have fallen?, not *Have the leaves will fallen?), while have patterns with the main verb.

So where is the second auxiliary—or in the Future + Perfect + Progressive, where is the third auxiliary?

Proposal (for English)
Only tense features, the modals, and nonfinite to start out in T—that is, only these morphemes truly belong to the functional category T. All other auxiliaries move to T, but they only do so if that T isn’t already filled by a modal or to.

So when there’s a modal in T, the lower auxiliary will appear in an extra VP layer—sometimes called a VP “shell”. (We could also label this phrase AuxP, or even PerfectP or ProgressiveP, but for simplicity we’ll call it VP here.) This is illustrated in Figure 6.34.

Tree diagram: [TP [NP the leaves] [T' [T will] [VP_prog [V' [V be] [VP falling ] ] ] ] ]
Figure 6.34 Tree diagram for [The leaves will be falling.], showing a progressive VP shell

But if there’s nothing in T—or rather, if all that’s in T is a tense feature—the auxiliary verb will move from V to T, as illustrated in Figure 6.35.

Tree diagram: [TP [NP the leaves] [T' [T are] [VP_prog [V' [crossed out V are] [VP falling ] ] ] ] ], arrow from [V are] to [T are]
Figure 6.35 Tree diagram for [The leaves are falling.], showing a progressive VP shell and movement of auxiliary are to T
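The proposal can be phrased as a simple decision rule. The sketch below (a toy illustration with an assumed modal list, not a complete grammar) returns what ends up pronounced in T and in the auxiliary’s base position:

```python
# Assumed inventory of items that start out in T (modals and nonfinite 'to')
MODALS = {'will', 'would', 'shall', 'should', 'can', 'could',
          'may', 'might', 'must', 'to'}

def fill_t(t_content, aux):
    """Toy version of the proposal: if T already holds a modal or
    nonfinite 'to', the auxiliary stays in its VP shell; otherwise
    the highest auxiliary moves to T, leaving a silent copy '<aux>'."""
    if t_content in MODALS:
        return t_content, aux             # e.g. 'will be falling'
    return aux, '<' + aux + '>'           # e.g. 'are <are> falling'

print(fill_t('will', 'be'))   # -> ('will', 'be'):   The leaves will be falling.
print(fill_t(None, 'are'))    # -> ('are', '<are>'): The leaves are falling.
```

The first return value corresponds to what is pronounced in T (Figure 6.34’s will, or Figure 6.35’s moved are); the second is what remains inside the VP shell.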

Very few verbs move in most contemporary varieties of Modern English. Only be (as an auxiliary and as a main verb copula), and have (only as an auxiliary) show evidence of moving to T.

The same isn’t true in other languages, necessarily. For example in French (and in earlier stages of English), we have reason to think that all verbs move to T.


In contemporary English it’s only auxiliaries that ever appear in T—main verbs always show a different distribution. But in French—and in earlier stages of English—when there’s no auxiliary, the main verb also appears in the T position.

French auxiliaries, like English auxiliaries, show up before negation and before adverbs like toujours (“always”), and can undergo Subject-Aux Inversion (though only with pronominal subjects, and even then it isn’t very natural in casual speech for most speakers).

(5) Les feuilles ont tombé.
the leaves have fallen
“The leaves fell/have fallen.”
(6) Les feuilles (n’)ont pas tombé.
the leaves (NEG)have NEG fallen
“The leaves haven’t fallen.”
(7) Les feuilles ont toujours tombé.
the leaves have always fallen
“The leaves always fell / have always fallen.”
(8) Ont-ils tombé?
Have-they fallen
“Have they fallen?”
Negation in French is traditionally described as involving a ne before the tensed verb, and a pas after the verb—kind of like a circumfix. But in spoken French in both Quebec and France, the ne is almost never pronounced, and so we have marked it as optional in this example.

What’s different about French is that main verbs show exactly the same distribution—whereas English verbs are after negation and adverbs, and can’t do Subject-Aux inversion (instead they require the support auxiliary do):

(9) Les feuilles (ne) tombaient pas.
the leaves (NEG)fell NEG
“The leaves didn’t fall/weren’t falling.”
(10) Les feuilles tombaient toujours.
the leaves fell always
“The leaves always fell / were always falling.”
(11) Tombaient-ils?
fell-they
“Did they fall? / Were they falling?”

English verbs do not have the same distribution as auxiliaries (though they did in Early Modern English, ca. the 1600s):

(12) *The leaves fell not.
(13) *The leaves fell always.
(14) *Fell the leaves?

We can analyze this difference in word order between English and French by saying that while in English only be and auxiliary have move to T, in French all verbs undergo V-to-T movement. This is illustrated for (10) in Figure 6.36.

Tree diagram: [TP [NP les feuilles] [T' [T tombaient] [VP [AdvP toujours] [crossed out V tombaient] ] ] ] arrow from V to T
Figure 6.36 Tree diagram for [Les feuilles tombaient toujours.], showing a movement of the main V to T

We’ve now introduced two types of movement in our theory: Head Movement, which moves a head into the next higher head position, and Phrasal Movement, which moves a phrase into a higher specifier position.

Though we find them both in English questions (as T-to-C and WH-movement, respectively), what we see with English auxiliaries and with all French verbs is that these movement types can be found in other contexts as well—and that languages can differ in what types of movement they exhibit.

Head movement and phrasal movement in passives

So far we’ve talked about how to identify passives—but what is their syntax like? Remember the pair of active and passive sentences we saw in Section 6.11:

(15) a. The pirates sank the ship. (active)
b. The ship was sunk (by the pirates). (passive)

In a theory of syntax that employs movement, the natural way to think about the passive is to say that its syntax (e.g. the presence of the passive be) prevents the subject from being introduced in the first place, leaving an empty position (indicated by an underscore).

(16) [TP _ was sunk [the ship] ]

Then, because English is a language that always requires a subject in Spec,TP, something needs to be done to fill that empty position. This is done by moving the object NP into the subject position:

(17) [TP [the ship] was sunk [crossed out NP the ship] ]

This is a new case of phrasal movement: movement of an NP into subject position.

NP movement:
Move an NP into Spec,TP, to fill an otherwise-empty subject position.

We start with the theme argument the ship as the complement of the verb, and the passive auxiliary be in a VP shell. To get the correct output, we apply two instances of movement:

  1. The passive auxiliary moves to T: V-to-T movement
  2. The object NP [the ship] moves to the subject-position in Spec,TP: NP-movement

The result of these two steps of movement is illustrated in Figure 6.37.

Tree diagram: [TP [NP the ship] [T' [T was] [VP_pass [crossed out V was] [VP sunk [crossed out NP the ship] ] ] ] ] arrow from [V was] to T, and from lower [the ship] to Spec,TP
Figure 6.37 Tree diagram for [The ship was sunk.], showing a passive VP shell, movement of auxiliary was to T, and NP movement of the passive subject
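The two movement steps just described can be walked through in a small Python sketch (illustrative only; the dictionary-based tree is an assumed simplification of the real structure):

```python
def passive_derivation():
    """Toy derivation of 'The ship was sunk': start from a structure
    with an empty subject position, then (1) move the passive
    auxiliary from V to T, and (2) move the object NP to Spec,TP.
    '<...>' marks a silent base copy."""
    # Pre-movement structure: empty Spec,TP and T, auxiliary low in a VP shell
    tree = {'Spec,TP': None,
            'T': None,
            'VP_pass': {'V': 'was',
                        'VP': {'V': 'sunk', 'NP': ['the', 'ship']}}}

    # Step 1: V-to-T movement of the passive auxiliary
    tree['T'] = tree['VP_pass']['V']
    tree['VP_pass']['V'] = '<was>'

    # Step 2: NP-movement of the theme argument to subject position
    tree['Spec,TP'] = tree['VP_pass']['VP']['NP']
    tree['VP_pass']['VP']['NP'] = '<the ship>'
    return tree

print(passive_derivation())
# T ends up holding 'was', and Spec,TP holds ['the', 'ship']
```

Reading the final dictionary left to right gives the surface order: the ship (Spec,TP), was (T), sunk, with silent copies in both launch sites, as in Figure 6.37.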

This section has illustrated our final tool in accounting for word-order differences across languages: not just the parameters of head-initial vs. head-final ordering, but also what types of movement arise in what contexts.

6.21 Trees: Summary


We’ve now expanded our theory of syntax a little bit further. It now consists of X-Bar Theory as well as two types of Movement.

X-Bar Theory accounts for the overall shape of trees in individual languages—it describes possible and impossible tree shapes for a given language.

Movement is a theory about how you can change (or transform) an existing syntactic tree once you have built it. Adding movement to our theory allows us to expand its explanatory power in two ways:

  1. We have a new tool for talking about differences across languages in terms of word order: while X-Bar Theory offers the variation of head-initial vs. head-final word order, Movement allows us to say that languages transform their basic word order in different ways (or in different contexts).
  2. We can talk about relationships between different sentence types—between statements and questions, or between the basic order of a sentence and one where some phrase has been topicalized or fronted.

By introducing movement into our theory, we have a way of talking about the fact that elements are sometimes displaced: they are pronounced in a different position than they “belong”, in some sense.

With the tools we’ve developed in this chapter, we could investigate relationships between many more types of sentences, both in English and in any other language. In Chapter 7 we’ll also see that syntax is relevant for the computation of certain types of linguistic meaning (though not all types).

Chapter 7: Semantics


This chapter is about linguistic meaning, particularly semantics: how the meanings of words combine to form the meanings of sentences. We will start by examining lexical meaning: what goes into the meaning of a word and of other smaller linguistic expressions stored in your mental lexicon. We will examine various theories of lexical meaning and evaluate the pros and cons of each one. The latter half of the chapter focuses on case studies of linguistic meaning across categories and across languages, and along the way, we will think about what it means for the meaning of one word to combine with the meaning of another word. We will examine various data across categories and across languages in order to appreciate the complexity of human semantic competence.


When you’ve completed this chapter, you’ll be able to:

  • Acknowledge the plurality of linguistic theories concerning the status of the lexicon, explain the differences between each theory, and evaluate the pros and cons of each theory;

  • Explain why the dictionary is not the ultimate authority of linguistic meaning;

  • Analyse linguistic meaning critically based on descriptive observations;

  • Gain a general understanding of what kinds of concepts lexical meaning encodes in language;

  • Explain the difference between sense and denotation;

  • Use diagnostics to identify entailments, implicatures, and presuppositions;

  • Evaluate the usefulness of each kind of meaning in linguistic analysis;
  • Appreciate the complexity and diversity of linguistic meaning.

7.1 Linguistic meaning



We use the word “meaning” in various ways in our everyday lives. Consider (1)-(4).

(1) In Japanese culture, what does it mean when a tea stalk floats vertically in your green tea?
(2) takai mishin-o katte-mo tsukaikata-ga wakaranakereba imi-ga nai
expensive sewing.machine-ACC buy-even.if how.to.use-NOM understand.NEG.if meaning-NOM NEG.exist
‘There is no point in buying an expensive sewing machine if you don’t know how to use it.’ (Japanese)
(3) I said coffee is just as tasty as tea, but I didn’t mean it.
(4) Ode’imin and strawberry mean the same thing.
(Examples inspired by Bach 1989)
Japanese green tea in a traditional Japanese-style, blue and white checkered tea cup. A single tea stalk floats vertically in the tea.
Figure 7.1. A tea stalk floating vertically in a cup of green tea.

In this chapter (Chapter 7) and the next chapter (Chapter 8), we