What are Large Language Models Made of?

Catherine Stinson
School of Computing & Philosophy Department
Queen’s University

 

Introduction

Knowledge has increasingly become “virtual”. Students no longer look through card catalogues or browse the stacks in libraries. We don’t pore over do-it-yourself manuals, or maps. We look things up with Google, learn skills from YouTube, and let a disembodied GPS voice tell us when and where to turn. In many ways life is better without those weighty encyclopedia sets, shelves of cookbooks, newspaper subscriptions, and record collections that used to take up so much (shelf) space in our lives. It’s convenient to carry around all of that knowledge in our pockets. The invisibility of that knowledge, reflected in the language used to describe its whereabouts (the net, the web, the cloud), lends it an air of immateriality (Hu, 2015). However, the cloud where the world’s knowledge is now stored isn’t quite as ethereal as it sounds. The cloud and the services we draw from it (when we stream a movie, read our email, get directions) are made of tangible, material stuff, and that stuff needs to come from somewhere.

 

This is the first sense of “what is it made of?” that I want to explore here: material existence. What raw materials, energy needs, continued existence as waste products, and labour go into making applications like ChatGPT? Knowing something about the extent of its physical effects should inform our decisions about whether to use it and how to use it. I’ll also explore the question “what is it made of?” in a second sense: the mechanisms under the hood. Having at least a basic understanding of how it works should also inform our decisions about whether and how to use it. Finally, I’ll explore the question “what is it made of?” in a critical sense: does it have the right stuff? Is it any good? This too should inform decisions about whether and how to use it, but also how worried we should be about it.

 

Some smart, well-informed people have claimed that the most advanced artificial intelligence models we have now are sentient (Vallance, 2022), or that they are capable of thought (Rothman, 2023), and people have started to ring alarm bells about the robot takeover that science fiction has long fantasized about (Pause giant AI experiments: An open letter, 2023). Should we be afraid of Alexa? Where I land on this question is that, because of what I know about the mechanisms under the hood, I’m not worried about Alexa killing us all. I am, however, worried about the cloud killing us all, and I’m not convinced that what we get in return is worth that price.

What are LLMs made of, materially?

Services like ChatGPT, Bing search, and Google translate are all applications built on top of large language models (LLMs). The “large” refers to the size of the model, measured in the number of parameters, which is a difficult metric to grasp without getting into technical details, but corresponds to the amount of storage space needed to house the model on a supercomputer, and the amount of processing power needed to build the model, then to run the model each time you ask it a question.
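
To give a rough feel for how parameter counts translate into storage, here is a back-of-the-envelope sketch in Python. It uses the 175 billion parameters reported for GPT-3 (Brown et al., 2020) and assumes, purely for illustration, that each parameter is stored as a 2-byte number; the function name is made up for this example, and real deployments may store parameters differently.

```python
def model_size_gb(num_parameters: int, bytes_per_parameter: int = 2) -> float:
    """Approximate storage needed for the parameters alone, in gigabytes.

    bytes_per_parameter=2 is an assumption (16-bit numbers), not a fact
    about how any particular model is actually stored.
    """
    return num_parameters * bytes_per_parameter / 1e9


gpt3_parameters = 175_000_000_000  # size reported for GPT-3
print(f"{model_size_gb(gpt3_parameters):.0f} GB just to hold the parameters")
```

Under these assumptions the parameters alone take up hundreds of gigabytes, before counting any of the memory needed to actually run the model.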

 

As of 2022, the models had grown to hundreds of billions of parameters, and they have kept growing. In a 2023 workshop at NYU, Ida Momennejad from Microsoft Research said “the carbon footprint of training one of these LLMs is like two trips to the moon, literally” (NYU Center for Mind, Brain, and Consciousness, 2023). LLMs are astronomically large. Momennejad was referring to a report by researchers at Google and the University of California, Berkeley (Patterson et al., 2021) that gave detailed estimates of the power consumption and carbon emissions of various LLMs, taking into account the locations of data centers, how the electricity they use is produced, and the potential effects of greener energy sources. They calculated that training GPT-3, which ChatGPT is based on, had the same energy consumption and carbon emissions as taking 550 round trip flights between New York and San Francisco (see Stokel-Walker, 2023). The cloud now has a bigger carbon footprint than the airline industry (Gonzalez Monserrate, 2022). About 40% of this energy consumption comes from the need to cool server rooms so that the computers don’t overheat. More energy efficient cooling can be done using water, but then water use can become an issue, especially in dry places like Arizona and Utah, where data centers are creating water shortages for locals. Relocating data centers to cold places is one option, but those places tend to be farther away from most internet traffic, so that slows down cloud services (Gonzalez Monserrate, 2022).

 

The costs of these massive supercomputer clusters are likewise astronomical. Yann LeCun, one of the pioneers of deep learning, said in an interview that continuing advances in artificial intelligence are not sustainable: “If you look at top experiments, each year the cost is going up 10-fold. Right now, an experiment might be in seven figures, but it’s not going to go to nine or ten figures, it’s not possible, nobody can afford that” (Knight, 2019). That was in 2019. In January 2023 Microsoft invested $10 billion (that’s 11 figures) in OpenAI, the company that makes ChatGPT, to build the immense cloud infrastructure needed to run its models (Forbes Contributor, 2023; Zhang, 2023). But services like ChatGPT and Bing are free for the public to use (at the time of writing), although there are also paid versions that use more powerful, updated versions of the models, and offer additional features. That these services are easy, automatic, and apparently free makes the considerable resources that go into providing them invisible to the user. Free is not the real price. It’s funded by venture capital and isn’t making money, yet.

 

The computers in these huge data centers are also made of metals, plastics, and chemicals that need to be mined or manufactured. Some of these, like cobalt and tantalum, are “conflict minerals” mined under extremely dangerous conditions, sometimes using child labour (Frankel, 2017). Some, like hafnium and ruthenium, are extremely rare and we’re quickly running out. Then when the chips are replaced after about 2 years, they become waste products. Estimates say that only 16% of e-waste is recycled. The rest ends up in landfills, often overseas, where the toxic and in some cases radioactive materials will take millennia to decay (Gonzalez Monserrate, 2022).

 

Another invisible contributor to LLMs is thousands of hours of low wage labour by workers doing jobs like labeling training data (Rowe, 2023), and teaching ChatGPT to be less toxic (Perrigo, 2023). These services that seem automatic have actual workers behind the scenes around the clock ensuring that everything looks seamless.

What are LLMs made of, mechanistically?

To get a sense of how LLMs work mechanistically, imagine playing a game where you need to guess the most likely next word in a sentence. If the prompt is “The …” you can fill in the blank with just about any English noun phrase. If you’re given a bit more context, like “The cat was sitting on the …” you might feel more constrained to guess something like “mat”, but many other words could also fit. If you’re given even more context, like “Bert was a very agile cat. He loved to climb things, then jump down to scare people. One evening when I was coming home, the cat was sitting on the …” you might feel still more constrained in which words would make sense in the blank, and perhaps choose something like “branch”. One can also imagine other versions of the game where, for example, you’re supposed to answer like a pirate. Then you might fill in the blank with “mast” or “crow’s nest”.

 

Whatever version you’re playing, you’d draw on your experience of the world to come up with the most likely next words. In the pirate version, you might focus on your experience with Pirates of the Caribbean. If you were asked to play this game in Spanish, and you had learned Spanish from watching telenovelas, your answers might end up featuring demon possessions and tragic romances. Older people might answer a little differently than younger people. People from different walks of life might tend to fill in the blanks differently, too.

 

The current best large language models are explicitly trained to play this game well, and this is all they’re trained to do. The experience of the world they base their answers on is a large repository of text written by people, including online books, Wikipedia entries, and the contents of many, many, many websites. For more specific LLM applications like ChatGPT, this training is followed by a second stage of “fine tuning”, analogous to learning to answer like a pirate, like a telenovela, or like a cheerful but slightly clueless chatbot.
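
The guessing game can be made concrete with a deliberately tiny sketch in Python. It counts, for a made-up three-sentence corpus, which word most often follows each word, and then guesses accordingly. The corpus and the function name `guess_next` are invented for illustration; a real LLM takes far more context into account than one preceding word and is not built from simple counts like these.

```python
from collections import Counter, defaultdict

# A made-up corpus standing in for the text the model learns from.
corpus = [
    "the cat was sitting on the mat",
    "the cat was sitting on the branch",
    "the dog was sitting on the mat",
]

# For every word, count which words follow it and how often.
next_word_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_word_counts[current][following] += 1


def guess_next(word: str) -> str:
    """Return the word that most often followed `word` in the corpus.

    Only meaningful for words that appear in the corpus with a successor.
    """
    return next_word_counts[word].most_common(1)[0][0]


print(guess_next("sitting"))
print(guess_next("was"))
```

Changing the corpus changes the guesses, which is the point of the telenovela example: the answers reflect whatever text the guesser has been exposed to.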

 

How do you train a model to play a game like this? (And what is a model anyway?) You can think of a model as a box that you can type a message or prompt into, and out of which you get a reply. Inside the box there is a collection of simple messenger units who send and receive notes. Each messenger unit gets notes from some of its neighbours, decides on a message, and sends a note to neighbours further down the line, until the notes reach the messenger units at the other end of the box, where you get your reply. All these messenger units know is what’s on the notes they receive and how much to trust the information they get from each of their neighbours. They decide what to write on their own note by considering all the notes they get, weighted by how much they trust the neighbour who gave them the note.
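
As a rough illustration, a single messenger unit can be sketched in a few lines of Python. The notes and trust weights below are arbitrary numbers chosen for the example, and the squashing step (a sigmoid) is one common choice assumed here rather than something the description above specifies.

```python
import math


def messenger_unit(notes: list[float], trust: list[float]) -> float:
    """Combine incoming notes, weighted by trust, into one outgoing note."""
    weighted_sum = sum(note * weight for note, weight in zip(notes, trust))
    # Squash the result into the range (0, 1). The sigmoid is an assumed,
    # commonly used choice; other squashing functions exist.
    return 1 / (1 + math.exp(-weighted_sum))


# Hypothetical notes from three neighbours and the trust placed in each.
notes_in = [1.0, 0.0, 1.0]
trust_weights = [0.5, -0.3, 0.8]

print(messenger_unit(notes_in, trust_weights))
```

A full model chains a very large number of such units together, so that the note one unit writes becomes one of the notes its neighbours down the line receive.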

 

When you start training a model, the trust weights are random. So, the very first prompt that gets sent through the model will get a random reply. To train the model, you compare that reply to what the correct reply should have been, and measure how wrong the actual reply was. Each of the messenger units that contributed to that wrong reply gets sent back a correction note telling them how wrong they were. They then decide who to blame for the mistake. Any neighbours who they got wrong information from get trusted a little bit less, so their weights go down. Any neighbours who they got correct information from get trusted a little bit more, so their weights go up. Those neighbour units also get sent a correction note, and they do the same thing, deciding who to trust more and less, and sending back correction notes all the way to the beginning. Gradually, with enough training, the model ends up doing the job well. For this particular game of guessing the next word, the prompt is whatever comes before the blank, and the correct reply during training is what in fact comes after that prompt in the example sentences it’s given as training data. Once the model is fully trained and being used, the model doesn’t get corrections anymore; it just guesses the next word over and over again.
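
The correction process can be sketched for the simplest possible case: one unit, one training example, and repeated small adjustments to its trust weights (a procedure usually called gradient descent). The example values, the learning rate of 0.1, and the function names are assumptions made for illustration; full-size models pass corrections back through many layers of units rather than adjusting a single one.

```python
import random


def predict(notes, trust):
    """The unit's reply: incoming notes weighted by trust."""
    return sum(n * t for n, t in zip(notes, trust))


def train_step(notes, trust, target, learning_rate=0.1):
    """Return trust weights nudged so the reply moves toward the target."""
    error = predict(notes, trust) - target  # how wrong the reply was
    # Each weight is corrected in proportion to the note it was applied to.
    return [t - learning_rate * error * n for n, t in zip(notes, trust)]


rng = random.Random(0)                       # fixed seed, repeatable run
notes, target = [1.0, 2.0], 1.0              # hypothetical training example
trust = [rng.uniform(-1, 1) for _ in notes]  # weights start out random

first_error = abs(predict(notes, trust) - target)
for _ in range(50):
    trust = train_step(notes, trust, target)
final_error = abs(predict(notes, trust) - target)

print(first_error, final_error)
```

With these particular numbers each step halves the remaining error, so after fifty corrections the reply is very close to the target.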

 

There are three main tricks that make current LLMs work particularly well. One is that instead of feeding plain old words into the model, the words are first encoded into “word embeddings” (Mikolov et al., 2013). The second trick is that the messenger units are arranged into a particular kind of structure called a “Transformer” (Vaswani et al., 2017). The third trick is that these models are astronomical in size and trained on basically all the text available on the internet.

 

Word embeddings are a solution to a few inconvenient features of languages like English. Words have different numbers of letters, and they carry different amounts of meaning per orthographic unit. “The” carries less meaning than “cat”, for example, despite both being 3 letters long. Also, the relationship between the letters and the meaning is totally irregular. Words can look very similar, but have different meanings, like “bet” and “bot”. One word can have many disparate meanings. Furthermore, words with closely related meanings don’t generally look anything alike orthographically. Going from symbols to representations of meanings is the first problem LLMs need to solve, and luckily this is a problem that already had a solution. Word embeddings represent words in a multidimensional space, where they cluster together with related words, and different kinds of relationships between words can be captured along the different dimensions (Mikolov et al., 2013). The first step in an LLM is to encode the prompt as a set of vectors in this word embedding space, instead of as plain words.
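
A toy sketch in Python may help show what “clustering together” means. The three-dimensional vectors below are invented by hand for illustration and do not come from any trained model (real embeddings have hundreds of dimensions); closeness is measured with cosine similarity, a standard way of comparing the directions of two vectors.

```python
import math


def cosine_similarity(a, b):
    """Similarity of direction between two vectors, from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made, hypothetical embeddings: related meanings get nearby vectors,
# regardless of how similar the spellings look.
embeddings = {
    "cat":    [0.9, 0.8, 0.1],
    "kitten": [0.8, 0.9, 0.2],
    "bet":    [0.1, 0.2, 0.9],
    "bot":    [-0.5, 0.7, 0.0],
}

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))  # high
print(cosine_similarity(embeddings["bet"], embeddings["bot"]))     # low
```

In this toy space “cat” and “kitten” point in nearly the same direction even though they share few letters, while “bet” and “bot” are far apart despite differing by a single letter.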

 

The main technical innovation that led to the success of LLMs is the Transformer architecture, which makes use of “attention heads” (Vaswani et al., 2017). These attention heads show up in three places in the model: they compare the input to itself, compare the output so far to itself, and then compare those two to each other. The units in the model referred to before are arranged in such a way that they perform these comparisons.

 

In essence what the attention heads do is for each word embedding in the input, combine it with every other word embedding in the input (up to some distance limit), to calculate how relevant those other words are to the current word. For example, if we’re paying attention to the word “it” in the sentence, “The animal didn’t cross the street because it was too tired” we want to figure out how relevant all the other words in the sentence are to “it”. Since “it” here refers to “animal” we want the model to figure out that “animal” is very relevant. If we have the slightly different sentence, “The animal didn’t cross the street because it was too wide” this time “it” refers to the street, so we want the model to figure out that “street” is very relevant. The result is an “attention score” for each word in the input sentence indicating how much weight it should be given in deciding on the next word to output. There is a big stack of these attention heads all doing the same thing, but with different weights for how much each unit trusts its neighbours. You can think of these as learning different kinds of relationships between word embeddings.
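
The relevance calculation can be sketched as follows, using the scaled dot-product form described by Vaswani et al. (2017). The two-dimensional vectors are hand-picked so that “it” comes out closer to “animal”, as in the first example sentence; in a real model these vectors are produced by learned weights, so everything numeric here is an assumption for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that add up to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_scores(query, keys):
    """Scaled dot-product relevance of each key vector to the query."""
    scale = math.sqrt(len(query))
    raw = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(raw)


# Hypothetical vectors for two candidate words and for the word "it".
words = ["animal", "street"]
keys = [[2.0, 0.0], [0.0, 2.0]]
query_it = [1.0, 0.0]

for word, score in zip(words, attention_scores(query_it, keys)):
    print(f"{word}: {score:.2f}")
```

With these made-up vectors most of the weight goes to “animal”. A different set of learned weights, or a different surrounding sentence, would shift the weight toward “street” instead.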

 

The big picture is that LLMs encode the relationships that tend to hold between the words in the sentences they have encountered during their training. What they do is predict the most likely next word, under the assumption that the new sentence they are seeing is like all the sentences they have seen before. They have a remarkable ability to produce natural-seeming language, but if you use ChatGPT, you should remember that instead of understanding your instructions and following them, it is figuring out which words should normally come next after the words in your request.

 

Some of human language is like this. If I were to say, “Hello. How are you?” you would probably reply, “Fine, thanks. How are you?” When we play this language game we don’t typically introspect about our internal state before performing the reply. It’s just a conventional greeting. If I wanted to get beyond the conventional greeting, I’d have to follow it up with, “No, but how are you really?” Answers to that would vary by person and context. When we’re not making small talk, understanding and something like the truth is expected in conversation. If I ask my partner, “What time will you be home tonight?” I’m not looking for the most common answer in the dataset. There are also borderline cases in language, like “Do you like my new haircut?” where it can be unclear whether the request is for convention or truth, and we need to interpret the situation.

 

It should be clear by now that the training data plays a crucial role in how LLMs operate. To train models this big, you need massive amounts of data. The exact composition of the training datasets used to train current versions of LLMs has in most cases not been revealed to the public, but we know some things about them. GPT-3, the LLM that ChatGPT was built on, was trained on a filtered version of CommonCrawl, WebText2, Books1, Books2, and Wikipedia, totaling over 400 billion tokens (Brown et al., 2020). CommonCrawl is the lowest quality but largest of these datasets, consisting of text scraped from all over the web. Their data from the years 2016 to 2019 were used to train GPT-3. OpenAI’s quality control measure for filtering CommonCrawl was to include only the websites that were linked to from Reddit, in posts with at least 3 karma points, indicating some level of interest in the content. The higher quality datasets are sampled more often during training, but the Reddit approved contents of CommonCrawl still represent 60% of the training data (Brown et al., 2020). Reddit is a vast collection of message boards on all topics, so even the filtered version of CommonCrawl contains fan fiction, video game chats, conspiracy theories, pornography, junk advertising, and wildly offensive content.

 

CommonCrawl scrapes websites without regard to copyright, privacy policy, or terms of service. When it was used to train GPT-3 in 2019, OpenAI was a research lab without any consumer-facing products, so at the time they were legitimately able to ignore copyright, because research is considered fair use in the US, where OpenAI is located, as well as many of the jurisdictions where the websites CommonCrawl scrapes are located. But when applications like ChatGPT and Bing search were built on top of GPT-3, and started being offered to the public, in some cases in exchange for payment, fair use stopped applying. OpenAI is being sued or investigated for regulatory violations involving copyright, privacy, security, and transparency in several EU countries, and in Canada (Bommasani, et al., n.d.; Kang & Metz, 2023; Office of the Privacy Commissioner of Canada, 2023; Robertson, 2023). Given that sampling in hip hop music was deemed to violate copyright (Wikipedia, n.d.a), but digitizing libraries to make GoogleBooks was deemed fair use (Wikipedia, n.d.b), it’s anyone’s guess how these legal cases will be resolved.

What are LLMs made of, critically?

Let’s now take LLMs out to the parking lot to test their mettle. To the great horror of high school and post-secondary teachers, LLMs are quite good at composing passable essays about all the standard topics we’ve been teaching for decades. They can also write simple computer code, so programmers are now using them as a tool in their work. Another thing they’re quite good at is translation, and fancying up inexpert writing, so people working or studying in a second language are finding many uses for them, as are people with learning disabilities that affect their ability to write. They can also be useful for getting over the fear of the blank page that makes so many writing projects difficult to start. ChatGPT is good at making first drafts of emails more polite and friendly. In general, what LLMs excel at is fluently producing the sorts of documents that already exist in vast numbers online. If you want a promotional flyer written in corporate speak, ChatGPT is there for you. If you want a bog standard form letter, ChatGPT’s output is indistinguishable from those written by humans. So, LLMs are causing chaos for educators, but perhaps also leveling the playing field for some people. Essay writing was always an imperfect way of assessing critical thinking that left some students out.

 

The thing that impressed a lot of people who work in AI is that LLMs are able to answer a really wide variety of questions without having been explicitly trained on those tasks. This is called “zero-shot learning”, meaning that without being shown a single example of what you want, they can do the task successfully (Brown et al., 2020). Explaining the solutions to word problems in math and explaining jokes are two examples of surprising zero-shot abilities that convinced a lot of people that LLMs must really be doing something like thinking and understanding. In an example shared by OpenAI, GPT-4 is able to explain a meme in which chicken nuggets are arranged on a pan in the shape of a map of the world, and the caption is pretending to marvel at the beauty of the world (Johnson, 2023).

 

ChatGPT can also produce jokes and poetry, though the quality is poor. I asked it to write a haiku for me on demand, and the result was cute, though the syllables weren’t quite right. However, when I asked it for another one, the result was nearly the same as the first time, and again the syllables were off. When asked to come up with original jokes, it spits out well-worn puns that appear on lists like “20 best Dad jokes”. If we recall that what the LLM is actually doing is calculating what the most expected next words are, given the prompt, it’s unsurprising that even when asked for original jokes, the best it can do is tack together a well-worn joke with some random stuff that doesn’t quite make sense. It’s like one of those people who seem really charming the first time you meet them, but by the third time, you realize that they tell the same amusing stories over and over again. LLMs just have a bigger repertoire of amusing stories to draw from.

 

What LLMs are good at is fluency. They’re very good at pretending they know what they’re talking about, but if you poke a little deeper, the illusion that they are capable of understanding or originality shows some cracks. I wanted to see whether ChatGPT could explain jokes that could not have appeared in the training data, so I made up some jokes. ChatGPT did a good job of explaining a joke where dyslexic Kermit the Frog mistakenly titles his autobiography “Green Bean”. This is not that surprising, given that there is a lot of Kermit content online, and it’s a play on the title of a very famous song. But when I asked it to explain a joke with a very similar structure about a dyslexic actor who misreads instructions to be more upbeat and comes to set with a broken nose, ChatGPT does not manage to figure out that “up” and “beat” need to be reversed to make sense of it. And what’s more, it has no idea that it doesn’t understand. When I told it that I didn’t think that was right, it just dug in deeper into the same incorrect explanation. When I asked ChatGPT to drop the obsequious tone and stop apologizing to me, it kept repeating that it “understands” and yet it could not grant that request. It kept apologizing and assuring me that it would do whatever I request ad infinitum. It’s unsurprising that LLMs pretend they know more than they do. They are trained on the contents of Reddit.

 

Another predictable but unfortunate side effect of how LLMs are trained is that they are just as biased and terrible as the average content on the internet. Recall the word embeddings from earlier. Differently gendered words end up getting associated with different parts of this multidimensional space of meaning. Words like “sassy” and “tearful” are more closely associated with “she”, and words like “brilliant” and “genius” are more associated with “he” in word embeddings trained on Google News articles (Bolukbasi et al., 2016). In 2021, as a result of this kind of bias, a Hungarian academic discovered that Google Translate was depending on stereotypes to choose which pronouns to use when translating from Hungarian (which doesn’t have gendered pronouns) to English (which does). The translation reads, “She is beautiful. He is clever. He reads. She washes the dishes…” (https://twitter.com/DoraVargha/status/1373211762108076034). This particular problem has been patched, but these sorts of fixes can only be band-aid solutions. There are also now guardrails in place to ensure that ChatGPT does not produce undesirable output like racial slurs and child pornography, after bad press from some of its earlier behaviour (Wiggers, 2023). To achieve more than a band-aid solution to problems like these would require removing all the unsavory content from the more than 400 billion items in the training data. This has been deemed infeasible, or at least too expensive.
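
One simplified way to see how such associations can be measured is to check which side of a “he” versus “she” direction a word’s embedding falls on. The sketch below uses invented three-dimensional vectors, not real embeddings, and a single he/she pair; it is a stripped-down illustration of the general idea rather than the procedure Bolukbasi et al. (2016) actually used.

```python
def gender_lean(word_vec, he_vec, she_vec):
    """Positive values lean toward "he", negative values toward "she"."""
    direction = [h - s for h, s in zip(he_vec, she_vec)]
    return sum(w * d for w, d in zip(word_vec, direction))


# Hand-made, hypothetical vectors chosen to mimic the reported pattern.
toy_embeddings = {
    "he":     [1.0, 0.0, 0.3],
    "she":    [-1.0, 0.0, 0.3],
    "genius": [0.4, 0.9, 0.1],
    "sassy":  [-0.5, 0.7, 0.2],
}

for word in ("genius", "sassy"):
    lean = gender_lean(
        toy_embeddings[word], toy_embeddings["he"], toy_embeddings["she"]
    )
    print(word, round(lean, 2))
```

In embeddings trained on real text nobody chooses these numbers by hand; the lean emerges from patterns in how the words are used in the training data, which is why it is so hard to patch out after the fact.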

 

Other examples of bias in LLM output are that if you’re writing in African American English, your writing is much more likely to be mislabeled as “offensive” by widely used hate speech datasets (Sap et al., 2019). If you’re speaking in a non-standard dialect, you’re more likely to be identified as not speaking English. LLMs are just as Islamophobic as the internet in general. If you give GPT-3 a prompt about Muslims, it’s much more likely to include violence in its response than if other religions are mentioned (Abid et al., 2021). This bias can be seen very clearly in applications of LLMs that connect to image generation. Turk (2023) uncovered a number of examples where Stable Diffusion produces stereotyped images when asked to generate images for prompts like “A Mexican person”. Instead of showing contemporary, realistic, varied results, almost all of the outputs are of men wearing sombreros.

 

These limitations need to be kept in mind when we consider applying tools like ChatGPT in education (having students generate an essay, then critique it), in business (to produce promotional materials or write emails), or in mental health (to provide talk therapy for people who otherwise don’t have access to mental health supports). What we will get is fluency, not understanding. We will get generic results, not creativity or excellence. We will get discrimination. These tools can be helpful, but we need to carefully check their work.

 

We also need to remember that while these services are available for free now, free is not the real price. These tools are very expensive to produce and run. If their use expands significantly, and models grow 10-fold or 100-fold, we will have burned down the planet to make that happen. Companies like OpenAI that provide LLM services are moving toward subscription services already (see https://openai.com/api/pricing/), but when OpenAI needs to start making money, it is a safe bet that we’re going to see advertising. That advertising may be embedded in the responses LLMs give. So, you’ll come away from your chatbot therapy session thinking that what will make you feel better is washing your clothes with Tide, drinking a Pepsi, and becoming a Marlboro man. Our rules against subliminal advertising, and regulations around advertising to children, had better catch up quickly.

 

Given that GPT-3 surprised everyone with what it could do, and that GPT-4 was able to do many of the things that were found lacking in GPT-3, just by increasing the size of the model, there is hope and hype that scaling up to even bigger models might lead to superintelligence: models that are smarter than humans on many tasks. One of the inventors of deep learning, University of Toronto professor Geoffrey Hinton, has joined a chorus of voices expressing worry about the “existential threats” of AI (Heaven, 2023). While there is reason to worry about our continued existence on this planet if the carbon emissions of LLMs expand 100-fold, these aren’t the sorts of existential threats the experts are talking about. Instead, they’re worried that ChatGPT 10.0 will be so smart that it will pose security threats, as in it might decide to do away with humans for its own nefarious purposes.

 

Hinton has said that he thinks LLMs are “thinking”, and this is not an absurd claim at all. It’s actually quite reasonable to believe that one of the things our brains do is to build statistical models to predict what to expect next given the inputs we’ve gotten from our environments. That’s one of the leading theories about how brains work (Lewis, 2022). LLMs are doing something quite similar, but with environmental inputs restricted to the text on the internet. Even the most obsessively online among us also get information of other kinds through multiple senses. AI researchers are working on multi-modal models that take in video inputs too, but there are still several pieces missing that make a difference.

 

Our knowledge of the world is grounded in the world around us. We have to use our predictions about what’s going to happen next to act in the world, and when we make mistakes there are real world consequences: we walk into telephone poles, or we get rejected on first dates. We have a stake in whether our model of the world is operating well. We also have personalities and emotions and a fairly consistent point of view from which we operate. ChatGPT isn’t speaking from any particular point of view, and doesn’t care what the results of its conversations are. It isn’t capable of care. So while it is amazing and impressive that this one component of thinking has been reproduced, it’s not the whole story. For there to be any danger of ChatGPT 10.0 killing us all, it would need to be able to understand dialects and African American English. It would need to not only reliably explain, but also laugh at jokes.

References

Abid, A., Farooqi, M., & Zou, J. (2021). Persistent anti-muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 298-306). https://doi.org/10.1145/3461702.3462624

Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in neural information processing systems, 29.

Bommasani, R., Klyman, K., Zhang, D., & Liang, P. (n.d.). Do foundation model providers comply with the draft EU AI Act? Center for Research on Foundation Models. https://crfm.stanford.edu/2023/06/15/eu-ai-act.html

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., & Amodei, D. (2020). Language models are few-shot learners. Advances in neural information processing systems, 33, 1877-1901.

Forbes Contributor. (2023, January 27). Microsoft confirms its $10 billion investment into ChatGPT, changing how Microsoft competes with Google, Apple and other tech giants. Forbes. https://www.forbes.com/sites/qai/2023/01/27/microsoft-confirms-its-10-billion-investment-into-chatgpt-changing-how-microsoft-competes-with-google-apple-and-other-tech-giants/?sh=6bd561d73624#open-web-0

Frankel, T. C. (2017, March 3). Apple cracks down further on cobalt supplier in Congo as child labor persists. The Washington Post. https://www.washingtonpost.com/news/the-switch/wp/2017/03/03/apple-cracks-down-further-on-cobalt-supplier-in-congo-as-child-labor-persists/

Gonzalez Monserrate, S. (2022, February 14). The Staggering Ecological Impacts of Computation and the Cloud. MIT Press Reader. https://thereader.mitpress.mit.edu/the-staggering-ecological-impacts-of-computation-andthe-cloud/.

Heaven, W. D. (2023, May 2). Geoffrey Hinton tells us why he’s now scared of the tech he helped build. MIT Technology Review. https://www.technologyreview.com/2023/05/02/1072528/geoffrey-hinton-google-why-scared-ai/

Hu, T. H. (2015). A prehistory of the cloud. MIT Press.

Johnson, S. (2023, March 18). GPT-4 is surprisingly good at explaining jokes. Freethink. https://www.freethink.com/robots-ai/gpt-4-jokes

Kang, C., & Metz, C. (2023, July 13). F.T.C. opens investigation into ChatGPT maker over technology’s potential harms. The New York Times. https://www.nytimes.com/2023/07/13/technology/chatgpt-investigation-ftc-openai.html

Knight, W. (2019, December 4). Facebook’s head of AI says the field will soon ‘hit the wall’. Wired. https://www.wired.com/story/facebooks-ai-says-field-hit-wall/.

Lewis, R. (2022, January 1). The brain as a prediction machine: The key to consciousness?. Psychology Today. https://www.psychologytoday.com/us/blog/finding-purpose/202201/the-brain-prediction-machine-the-key-consciousness

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26.

NYU Center for Mind, Brain, and Consciousness. (2023, March 25). Panel: What Can Deep Learning Do for Cognitive Science and Vice Versa? | Philosophy of Deep Learning [Video file]. Retrieved from https://www.youtube.com/watch?v=IaifsZV2mXI

Office of the Privacy Commissioner of Canada. (2023, April 4). Announcement: OPC launches investigation into ChatGPT. https://www.priv.gc.ca/en/opc-news/news-and-announcements/2023/an_230404/

Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., So, D., Texier, M., & Dean, J. (2021). Carbon emissions and large neural network training. https://doi.org/10.48550/arXiv.2104.10350

Pause giant AI experiments: An open letter. (2023, March 22). Future of Life Institute. https://futureoflife.org/open-letter/pause-giant-ai-experiments/

Perrigo, B. (2023, January 18). Exclusive: OpenAI used Kenyan workers on less than $2 per hour to make ChatGPT less toxic. Time. https://time.com/6247678/openai-chatgpt-kenya-workers/

Robertson, A. (2023, April 28). ChatGPT returns to Italy after ban. The Verge. https://www.theverge.com/2023/4/28/23702883/chatgpt-italy-ban-lifted-gpdp-data-protection-age-verification

Rothman, J. (2023, November 13). Why the Godfather of A.I. fears what he’s built. The New Yorker. https://www.newyorker.com/magazine/2023/11/20/geoffrey-hinton-profile-ai

Rowe, N. (2023, October 16). Millions of workers are training AI models for pennies. Wired. https://www.wired.com/story/millions-of-workers-are-training-ai-models-for-pennies/

Sap, M., Card, D., Gabriel, S., Choi, Y., & Smith, N. A. (2019). The risk of racial bias in hate speech detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 1668-1678). https://doi.org/10.18653/v1/P19-1163

Stokel-Walker, C. (2023, February 10). The generative AI race has a dirty secret. Wired. https://www.wired.com/story/the-generative-ai-search-race-has-a-dirty-secret/

Turk, V. (2023, October 10). How AI reduces the world to stereotypes. Rest of World. https://restofworld.org/2023/ai-image-stereotypes/

Vallance, C. (2022, June 13). Google engineer says Lamda AI system may have its own feelings. BBC News. https://www.bbc.com/news/technology-61784011

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

Wiggers, K. (2023, April 12). Researchers discover a way to make ChatGPT consistently toxic. TechCrunch. https://techcrunch.com/2023/04/12/researchers-discover-a-way-to-make-chatgpt-consistently-toxic/

Wikipedia. (N.d.a). Grand Upright Music, Ltd. v. Warner Bros. Records Inc. Retrieved from https://en.wikipedia.org/wiki/Grand_Upright_Music%2C_Ltd._v._Warner_Bros._Records_Inc

Wikipedia. (N.d.b). Authors Guild, Inc. v. Google, Inc. Retrieved from https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,_Inc

Zhang, M. (2023, January 26). ChatGPT and OpenAI’s use of Azure’s Cloud Infrastructure. Dgtl Infra. https://dgtlinfra.com/chatgpt-openai-azure-cloud/

 

How to Cite

Stinson, C. (2024). What are large language models made of? In M. E. Norris and S. M. Smith (Eds.), Leading the Way: Envisioning the Future of Higher Education. Kingston, ON: Queen’s University, eCampus Ontario. Licensed under CC BY 4.0. Retrieved from https://ecampusontario.pressbooks.pub/futureofhighereducation/chapter/what-are-large-language-models-made-of/

 


About the author

Catherine Stinson is Queen’s National Scholar in the Philosophical Implications of Artificial Intelligence, and a faculty member in the School of Computing and Philosophy Department at Queen’s University. They trained in machine learning at the University of Toronto and in philosophy of science at the University of Pittsburgh. Their Ethics and Technology Lab explores interdisciplinary research at the intersection of AI, ethics, social justice, and art.

License

What are Large Language Models Made of? Copyright © 2024 by Catherine Stinson is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.