"

1.2 How do large language models work?

Training

The AI model is pre-trained on a large dataset, typically of general texts or images. Specialized AI models may instead be trained on subject- or domain-specific data. The AI analyses the data, looking for patterns, themes, relationships, and other characteristics that can be used to generate new content.

For example, early models of GPT (the LLM used by both ChatGPT and Copilot) were trained on hundreds of gigabytes of text data, including books, articles, websites, publicly available texts, licensed data, and human-generated data.

Human intervention can occur at all stages of training.

Humans may:

  • Create and modify the initial dataset, removing messy or problematic data
  • Assess the quality of output from the AI model

The model is then released for use and can be accessed by users.

Generation

LLMs generate text in response to a user-provided prompt.

A diagram showing the cycle of interacting with a Large Language Model in four steps: the user inputs a prompt; the LLM tokenises the prompt; the LLM predicts a response; the LLM shares the output.
Prompting 

A user provides a prompt, asking the AI model to perform a specific task, generate text, produce an image, or create other types of content.

Prompt: Tell me a joke about higher education.
Tokenization 

The AI breaks the prompt into tokens (words, parts of words, or other meaningful chunks) and analyses these tokens to understand the meaning and context of what is being asked.

Token Breakdown: ["Tell", "me", "a", "joke", "about", "higher", 
"education", "."]

NOTE: Words might also be broken into subword tokens like “high”, “er”, “edu”, “cation”.
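
As a concrete illustration, the sketch below uses the open-source tiktoken library (one of the tokenizers used by GPT models) to split the prompt into token IDs and show the text fragment each ID represents. This is a minimal sketch assuming tiktoken is installed; exact splits and IDs vary from model to model.

    # A minimal sketch of tokenization, assuming the tiktoken library is installed
    # (pip install tiktoken). Exact token splits and IDs vary by model.
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    prompt = "Tell me a joke about higher education."
    token_ids = enc.encode(prompt)  # prompt -> list of integer token IDs

    for tid in token_ids:
        piece = enc.decode_single_token_bytes(tid).decode("utf-8", errors="replace")
        print(tid, repr(piece))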

These tokens are then converted into vectors (numerical values) that represent the position of each token in relation to other tokens, capturing how likely they are to occur in sequence.
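
The toy sketch below illustrates the idea: each token ID indexes a row in an embedding table, turning the token into a vector of numbers the model can compare with other tokens. The table size, dimensions, and token IDs here are invented for illustration; real models learn these values during training.

    # A toy illustration of token-to-vector conversion; the values are random,
    # not learned, and the token IDs below are hypothetical.
    import numpy as np

    vocab_size, embedding_dim = 50_000, 8      # real models use far larger values
    rng = np.random.default_rng(0)
    embedding_table = rng.normal(size=(vocab_size, embedding_dim))

    token_ids = [12648, 757, 264, 22380]       # hypothetical IDs for "Tell me a joke"
    vectors = embedding_table[token_ids]       # one row (vector) per token
    print(vectors.shape)                       # (4, 8): 4 tokens, 8 numbers each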

Prediction 

The LLM analyses the vectors and, based on the patterns and other information learned in training, begins to predict a response to the prompt based on the probability of response tokens appearing in sequence.

For example, in responding to our request to generate a joke, the most likely starting tokens may be “Why”, “What”, “Here’s”, and so on.

 

A bar chart titled “Probability Distribution for Next Token”. The next-token candidates, in order of probability, are “Why”, “What”, “Here’s”, “To”, and “I”.
Probability chart generated by DALL-E via ChatGPT 4o in March 2025 for demonstrative purposes; not necessarily an accurate representation of probabilities.
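
The short sketch below shows the general idea behind such a distribution: the model assigns a score to each candidate token, and a softmax converts those scores into probabilities. The candidates come from the example above; the scores are invented for illustration.

    # A toy sketch of next-token prediction: invented scores (logits) for the
    # candidate tokens are converted into probabilities with a softmax.
    import numpy as np

    candidates = ["Why", "What", "Here's", "To", "I"]
    scores = np.array([4.2, 3.1, 2.5, 1.0, 0.4])      # hypothetical model scores

    probabilities = np.exp(scores) / np.exp(scores).sum()
    for token, p in zip(candidates, probabilities):
        print(f"{token!r}: {p:.2f}")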
Output 

After predicting a sequence of tokens, the LLM decodes the tokens back into natural language (words and sentences readable by a human). The complete response is shared with the user.

✅ “Why did the student bring a ladder to class? To reach higher education!” 
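
The toy loop below mirrors this process: at each step a stand-in “model” predicts the next token given everything generated so far, the token is appended, and the loop stops at an end token before the sequence is joined back into readable text. A real LLM replaces the canned list with learned probabilities.

    # A toy autoregressive loop; predict_next_token is a stand-in, not a real model.
    def predict_next_token(tokens_so_far):
        canned = ["Why", "did", "the", "student", "bring", "a", "ladder",
                  "to", "class", "?", "<end>"]
        return canned[len(tokens_so_far)]      # a real model predicts from probabilities

    generated = []
    while True:
        next_token = predict_next_token(generated)
        if next_token == "<end>":
            break
        generated.append(next_token)

    print(" ".join(generated))                 # tokens joined back into a sentence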

The user can then submit a follow-up request referencing the original request or output. This is called iteration.

For a more detailed introduction to Generative AI, see this video from Google:

Introduction to Generative AI

For a more in-depth look at how LLMs function, see this article from the Financial Times:

Generative AI exists because of the transformer

What are the Limitations of Large Language Models?

Generative AI is evolving quickly but still has certain limitations. Large Language Models (LLMs) are constrained by the data and methods used to train them. It’s important to be aware of the limitations of the tools you’re using, especially when currency or accuracy matters for the tasks you’re completing with Generative AI.

  • Currently, LLMs function as pattern replicators, which means they generate output based on averages or probabilities of patterns.
  • LLMs are susceptible to hallucinations, or the creation of nonsensical words, phrases, or ideas. This can also result in the generation of non-existent references.
  • LLMs do not fact-check, meaning that the information that they share is not guaranteed to be accurate or logical.
  • Many LLMs are pre-trained and have knowledge cut-off dates, meaning that their data may be out of date or inaccurate. However, ongoing advancements have allowed some models to overcome this limitation by accessing and processing information in real time, for example through web search.
  • LLMs are susceptible to reproducing biases found in their datasets, including but not limited to human biases embedded in historical records, cultures, patterns of research, societal norms, and any other elements reflected in the text data used for their training. This will be discussed further in the Ethics section.


License


AI Literacy for Higher Education Copyright © by ddilkes2 is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where otherwise noted.