A Brief History of Machine Learning and LLMs

2000s – 2023

Promise Realized and Mass Adoption

Since the turn of the millennium, development in AI and machine learning has accelerated rapidly. AI systems have seen huge increases in capability, driven by better models, advances in computer hardware, and massive investment of capital by the largest companies in the world. Some of the lofty targets set in the early days of AI began to be realized, and with each new milestone, progress seemed only to accelerate. Machine learning and AI have been incorporated “under the hood” of virtually every major software platform and new technology. Beyond that, conversational AIs such as chatbots and assistants have brought these technologies to a mass audience in a very visible way, making AI a tangible technology that many people consciously interact with on a daily basis.

We will outline these developments in three broad categories:

  • capability milestones,
  • technical milestones, and
  • mass adoption.

Leaps in Capability Capture Public Imagination

In 1997, IBM’s “supercomputer” Deep Blue defeated then-world chess champion Garry Kasparov in a full chess match under tournament rules, after having won a single game (but losing the match) the previous year. The win captured the public imagination and gave some of the more skeptical AI researchers pause for thought, as it had at times been claimed that AI would never be able to beat a top player outside of controlled circumstances.

Then in 2011, IBM’s Watson DeepQA computer defeated television game show Jeopardy! champions Brad Rutter and Ken Jennings on national television. Watson was a question-answering system that could deploy a variety of algorithms in parallel to perform natural language processing in real time, generate hypotheses, and validate them against a knowledge base to come up with answers and state its confidence in them.

In 2014, Google acquired DeepMind Technologies, a firm specializing in generalized neural network models for playing video games. DeepMind had already found success teaching its models to perform at superhuman levels in early arcade games. The following short documentary gives some insight into how DeepMind’s early models learned the games, and into the company’s culture in those early days:

DeepMind then moved on to more complex and modern video games, but also, more importantly, to classic strategy games.

DeepMind’s AlphaGo (Lee), a version of their model trained to play Go, defeated Lee Sedol, a highly rated South Korean Go champion, 4–1 in March 2016. This was seen as a major progression in AI capability, as Go allows far more permutations of possible moves than chess and is generally considered to allow for more creativity in play. This made it theoretically less suitable for brute-force strategies than chess, and thus harder to design effective AI for. Then in May 2017, Google DeepMind’s AlphaGo (Master) defeated Ke Jie, then the top-ranked Go player in the world for two years running, in three straight games.

The DeepMind team announced a new version of AlphaGo, called “Zero,” in a Nature paper in October 2017. AlphaGo (Zero) was another step forward, this time in terms of architecture and training. Previous versions of AlphaGo had been trained on records of human games and refined with reinforcement learning. AlphaGo (Zero) was trained without any knowledge of the game, historical matches, or observation of human play at all: it learned to play Go entirely through reinforcement learning, playing against itself. Without playing against or observing any human, it was able to consistently defeat the previous versions of AlphaGo, Lee and Master, which had been trained on human play. It attained this level of mastery much faster, and with less processing power, than those prior models.

Readers looking for more information about Go, and the AlphaGo story, may enjoy the award-winning documentary of the same title:

DeepMind continued creating models for various other games, notably chess and StarCraft II, but also turned its attention to other fields, finding success in areas such as protein folding, voice synthesis, and even programming.

While superhuman feats in gaming captured much press and public attention, it was performance in natural language tasks that brought public awareness of AI into the mainstream.

In 2018, AI models from both Microsoft and Alibaba outscored the average of a large sample of human respondents in Stanford’s SQuAD 1.1 test of reading comprehension. This was a milestone for natural language processing, and since then, many more models have surpassed human performance on a variety of SQuAD variants.

Since the release of ChatGPT in 2022, it has become the norm to list the various academic and professional tests and certifications that each iteration of a model or chatbot can pass, and how it performs relative to human students and workers. Readers have by now likely seen a great many such reports, but a few for ChatGPT can be found here: What Exams Has ChatGPT Passed? We will discuss the mass adoption of chatbots further in a subsequent section.

Research Advances and Technical Milestones

To mark the 50th anniversary of the Dartmouth Summer Research Project, Dartmouth College hosted The Dartmouth Artificial Intelligence Conference: The Next 50 Years (AI@50) in 2006. The conference featured presentations by veterans of the original project 50 years earlier, young researchers, and even futurists and popularizers.

In 2013, a group of Google researchers led by Tomáš Mikolov published Word2vec, a natural language processing technique for encoding the meaning and syntax of a word as a vector, which can then be compared mathematically with the vectors of other words.
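
As a rough illustration, the sketch below trains a tiny Word2vec model with the open-source gensim library (parameter names assume gensim 4.x). The toy corpus is far too small to produce meaningful vectors, but the interface shows how word vectors can be retrieved and compared.

```python
# A minimal sketch (not from the original paper): training word vectors with gensim
# on a toy corpus. Real Word2vec models are trained on billions of words.
from gensim.models import Word2Vec

toy_corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "cat", "sat", "on", "the", "mat"],
]

model = Word2Vec(sentences=toy_corpus, vector_size=50, window=3, min_count=1, epochs=200)

print(model.wv["king"][:5])                  # first few dimensions of the "king" vector
print(model.wv.similarity("king", "queen"))  # cosine similarity between two word vectors

# With a large corpus, vector arithmetic captures analogies such as
# king - man + woman ≈ queen:
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```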

Ian Goodfellow and his colleagues introduced generative adversarial networks (GANs) in 2014. A GAN pits two neural networks against each other: a generator that produces synthetic samples, and a discriminator that tries to distinguish those samples from the real training data. As each network tries to outdo the other, the generator’s output becomes progressively more realistic. GANs allowed for competitive training of neural network models against each other without human supervision (although they have proven useful with supervision as well).
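
The sketch below is a minimal, hypothetical illustration of that adversarial loop in PyTorch (not code from the original paper): a generator learns to imitate samples from a simple one-dimensional distribution while a discriminator learns to tell its output apart from the real data.

```python
# A minimal GAN sketch in PyTorch (illustrative only). The generator maps random
# noise to samples; the discriminator scores samples as real (1) or generated (0).
import torch
import torch.nn as nn

def real_batch(n):
    # "Real" data: samples from a normal distribution the generator must imitate.
    return torch.randn(n, 1) * 1.25 + 4.0

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(2000):
    # Train the discriminator: real samples labelled 1, generated samples labelled 0.
    real = real_batch(32)
    fake = generator(torch.randn(32, 8)).detach()  # detach so this step ignores G
    d_loss = (loss_fn(discriminator(real), torch.ones(32, 1))
              + loss_fn(discriminator(fake), torch.zeros(32, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Train the generator: try to make the discriminator call its output real.
    fake = generator(torch.randn(32, 8))
    g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

# After training, generated samples should cluster near the real distribution's mean.
print(generator(torch.randn(5, 8)).detach())
```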

The attention mechanism introduced by Dzmitry Bahdanau and his team in 2015 was a major step forward for natural language processing. Neural networks no longer needed to compress an entire sentence into a single fixed memory; instead, they could focus on the most relevant words at each step, improving accuracy and handling longer, more complex sentences efficiently.
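
A rough sketch of the intuition in plain NumPy: each source word’s encoder state is scored against the current decoder state, the scores are normalized into attention weights, and the output is a weighted mix of the most relevant words. (Bahdanau’s original mechanism used a small learned network for the scoring; simple dot products are used here to keep the example short, and all vectors are made up for illustration.)

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical encoder states for a 4-word source sentence (one 3-d vector per word).
encoder_states = np.array([[0.9, 0.1, 0.0],   # "the"
                           [0.2, 0.8, 0.1],   # "cat"
                           [0.1, 0.2, 0.9],   # "sat"
                           [0.7, 0.3, 0.2]])  # "down"

# Current decoder state (the word being generated right now).
decoder_state = np.array([0.1, 0.9, 0.2])

# Score each source word against the decoder state (dot-product scoring here,
# standing in for Bahdanau's learned scoring network).
scores = encoder_states @ decoder_state
weights = softmax(scores)            # attention weights, sum to 1
context = weights @ encoder_states   # weighted mix of the relevant words

print(weights)   # highest weight falls on "cat", the most relevant source word here
print(context)
```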

In 2017, a team of Google researchers led by Ashish Vaswani proposed a simple new network architecture, the Transformer, based solely on attention mechanisms and doing away with recurrent neural networks entirely. The architecture their paper introduced marked an inflection point and led directly to the current crop of transformer-based LLMs and chatbots. We will discuss transformers in greater depth in the “How LLM Technology Works” section.

Following the release of the Transformer paper, several important language models appeared, including BERT from Google, ELMo, and ULMFiT. But it was OpenAI’s GPT models that would drive advancement from 2018 onward.

OpenAI Models

In 2018, Alec Radford and his colleagues at OpenAI made waves in the AI community with their generative pre-training (GPT) model. They demonstrated the power of training a language model on a vast and diverse dataset without the constraints of explicit supervision. This first version, known as GPT-1, had 117 million parameters and was trained on BookCorpus, a dataset of roughly 7,000 books.

OpenAI released its GPT-2 model in 2019, trained on a larger corpus of data (8 million web pages) and with a far larger parameter count (1.5 billion). GPT-2 also featured several algorithmic improvements over GPT-1.

GPT-3 was released in 2020 with the paper “Language Models are Few-Shot Learners.” GPT-3 featured 175 billion parameters and introduced few-shot learning. It can largely be thought of as a scaled-up GPT-2 with some small architectural changes, and it showcased near-human-level performance on many language tasks.
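
Few-shot learning here means that, instead of retraining the model for a task, a handful of worked examples are placed directly in the prompt and the model infers the pattern “in context.” The snippet below is a hypothetical illustration of what such a prompt might look like for a sentiment-labelling task.

```python
# A hypothetical few-shot prompt: the model is shown a few solved examples in the
# prompt itself and infers the task (sentiment labelling) with no weight updates.
few_shot_prompt = """Label the sentiment of each review as Positive or Negative.

Review: The plot dragged and the acting was wooden.
Sentiment: Negative

Review: A delightful surprise from start to finish.
Sentiment: Positive

Review: I walked out halfway through.
Sentiment:"""

# The prompt would be sent to the model as-is; a completion such as " Negative"
# shows the task being learned "in context" from the examples above.
print(few_shot_prompt)
```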

After GPT-3, OpenAI released GPT-3.5 in 2022 as an interim update. GPT-3.5 featured fewer parameters than GPT-3 and focused on giving more helpful, less biased responses. This was accomplished using careful fine-tuning and reinforcement learning from human feedback (RLHF), both of which will be discussed further in the “How LLM Technology Works” section.

GPT-4, introduced in 2023, represented a further significant improvement. Though OpenAI has released less and less information about its models over time as it has commercialized them, GPT-4 was rumoured to feature anywhere from 1 to 100 trillion parameters, as well as advancements in model architecture, training techniques, and a broader dataset. GPT-4 was also the first multimodal iteration, able to accept images as well as text as input. It could also interact with external tools and interfaces through a plugin architecture.

For more information on the technical evolution of OpenAI’s models, readers may consult this article.

 

Conversational AI & Chatbots Bring AI to the Masses as Consumer Products

The first interaction many non-enthusiast users had with a conversational AI came in 2011, when Apple made Siri a centerpiece of its iOS operating system and of the marketing for the new iPhone 4S, having acquired the app in 2010. The digital voice assistant used predefined commands to perform actions and answer questions. Siri was followed by Google Now in 2012 and Microsoft’s Cortana in 2014. These digital assistants brought conversational AI to millions of non-technical smartphone and computer users, impressing with their ability to use a natural language interface and deep OS integrations to accomplish many tasks.

But it was the arrival of ChatGPT in November 2022 that took mass adoption to another level. ChatGPT extended the conversational abilities of the digital assistants with rich generative capabilities. Its ability to converse, explain, and generate human-quality text with surprising fluency impressed critics and enthusiasts alike. OpenAI’s tool had a million users within five days and 100 million within two months, making it the most rapidly adopted consumer application in history.

Microsoft integrated ChatGPT technology into Bing search in February 2023, marking the first deployment of an LLM chatbot at scale by one of the “big five” consumer-facing software companies (Facebook, Apple, Amazon, Google, Microsoft). It also marked the beginning of a race among those players for market share in this new segment of personal computing. All of these companies are making investments and acquisitions in the space, at a minimum to retain optionality and at a maximum to vie for dominance.

In March 2023, OpenAI released its improved GPT-4 model, available immediately in its paid “Plus” tier of service, while the free tier continued to use GPT-3.5. GPT-4 was also integrated into Bing Chat, possibly even before its public release.

Simultaneously, Google opened early access to its own LLM chatbot, Bard. In April 2023, Google consolidated and redoubled its AI efforts, absorbing DeepMind fully into the company and merging it with the Google Brain research team. In May 2023, Google introduced a new, more advanced language model, PaLM 2, and incorporated it into Bard.

In November 2023, Elon Musk announced Grok, a chatbot integrated into X (formerly Twitter) with a focus on free speech and his particular sense of humour.

Most recently, in December 2023, Google introduced Gemini 1.0 Ultra, a rebranded and updated successor to Bard with full multimodal input and output capabilities, along with two lower-tier Gemini models, and announced a paid service tier called Gemini Advanced to launch sometime in 2024.

 

 
