"

Chapter 1: Human Language and Language Science

1.7 Do Chatbots Have Language?


In the previous sections we learned that generativity is a key attribute of human grammar. You have some idea in your mind, and you do something with your body (your voice or your hands) to generate a signal and send it out into the world. Your friend receives that signal with their senses, and they interpret it. And if all goes well, their interpretation is pretty similar to the idea you started with. And they can do the same in return. You can both generate an infinite number of new words and new sentences all the time, and you can trust that you’ll understand each other because you share a mental grammar.

But lately that word generative also gets used a lot in the phrase “Generative AI”, and it kind of feels like ChatGPT and Claude and Gemini are generating human language, doesn’t it? Let’s look more closely at what so-called generative AI can do.

In their book The AI Con, Emily Bender and Alex Hanna (2025) point out that the label “artificial intelligence” is pretty vague, and gets applied to a bunch of different computerized tasks. The main thing these tasks have in common is that they have traditionally relied on human judgment. The term “AI” is used to refer to systems that:

  • make decisions, like “should we offer this candidate an interview, based on their resume?”
  • classify inputs, like when you upload a photo and the app automatically tags you and your friends
  • make recommendations, like “users who listened to these songs also like these other artists”
  • translate from one language to another, or from speech to text
  • generate images or text.

Let’s focus on this last one, text generation, since this is Linguistics class. How does text generation work? It relies on Large Language Models (LLMs). A Large Language Model is a statistical model trained on a huge collection of language scraped from millions of sources: from Wikipedia and novels and newspapers and podcast transcripts, and probably from this textbook. Most of the creators of the works that are used to train LLMs did not consent to have their work used in this way. In other words, the training data is pretty much all plagiarized or stolen.

It’s important to know that an LLM has no semantics: it has no ideas, no concepts, no meaning. All it knows about language is how frequently words tend to co-occur. So when you type in a prompt about, say, mushrooms, the LLM calculates which words are most likely to follow the words in your prompt, based on how often those words appeared near each other in its training data. It strings those words together in a statistically plausible order and gives you a response.
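To make “statistically plausible” concrete, here is a minimal sketch in Python of a toy bigram model: a model that tracks nothing but which word tends to follow which word. Real LLMs are enormously larger and use neural networks rather than raw counts, but the core principle is the same: predict a likely next word from frequencies in the training text. The tiny training corpus below is invented just for illustration.

    import random
    from collections import defaultdict, Counter

    # A tiny invented "training corpus". Real LLMs train on billions of words.
    corpus = (
        "mushrooms grow in the forest . some mushrooms are edible . "
        "some mushrooms are poisonous . the forest is damp ."
    ).split()

    # Count how often each word follows each other word. These co-occurrence
    # counts are everything the model knows about language.
    following = defaultdict(Counter)
    for current_word, next_word in zip(corpus, corpus[1:]):
        following[current_word][next_word] += 1

    def generate(start, length=8):
        """Build a string of words by repeatedly sampling a likely next word."""
        words = [start]
        for _ in range(length):
            candidates = following[words[-1]]
            if not candidates:
                break
            # Pick the next word in proportion to how often it followed
            # the current word in the training corpus.
            words.append(random.choices(
                list(candidates), weights=list(candidates.values()))[0])
        return " ".join(words)

    print(generate("mushrooms"))
    # One possible output: mushrooms are poisonous . the forest is damp .
    # It reads like a sentence, but nothing in the model represents what a
    # mushroom actually is; it only manipulates word forms and frequencies.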

When you get that response, you can’t help but interpret it! Humans have so much experience assigning meaning to a physical language signal that it’s almost impossible to turn it off. It’s kind of like the phenomenon of pareidolia, where we interpret anything even vaguely face-shaped as a face.

So you typed in your mushroom prompt, and the machine synthesized some sentences related to mushrooms, which you then interpreted. But it doesn’t know anything about mushrooms. All it knows is that the words it spat out occurred frequently near the word mushroom in its training data. Do you really want to trust your mushroom dinner to the statistics in the training data?

Michael Townsen Hicks and his colleagues put it really bluntly in a paper published in the journal Ethics and Information Technology in 2024. They conclude:

“it’s not surprising that LLMs have a problem with the truth. Their goal is to provide a normal-seeming response to a prompt, not to convey information that is helpful to their interlocutor” (Hicks et al., 2024).

The title of their paper is even more blunt: “ChatGPT is bullshit”.

So let’s compare human language with the output of Large Language Models.

Human language is generative: its users can always do new things with it. It’s governed by systematic principles, which we’ll explore throughout this whole book. And it’s shared among humans: we use it to communicate meaning to each other, to share our ideas and feelings, and to understand each other.

Large Language Models, on the other hand, are combinatorial. Using the frequencies of words in their training data, they can produce strings of words that behave like sentences. But those sentences have no meaning to the model itself: they’re just combinations of word forms. The only meaning their outputs have is the meaning that humans attribute to them.

So when you encounter the term generative AI, ask yourself what is meant by generative. For that matter, what is meant by intelligence? And remind yourself that the word artificial is the most accurate part of that term.




References

Bender, E. M., & Hanna, A. (2025). The AI con: How to fight big tech’s hype and create the future we want. HarperCollins.

Hicks, M. T., Humphries, J., & Slater, J. (2024). ChatGPT is bullshit. Ethics and Information Technology, 26(2), 38.

 

License


Essentials of Linguistics, 2nd edition Copyright © 2022 by Catherine Anderson; Bronwyn Bjorkman; Derek Denis; Julianne Doner; Margaret Grant; Nathan Sanders; and Ai Taniguchi is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.