A Brief History of Machine Learning and LLMs
1940s – 1970s
The First Wave of AI and the First AI Winter
The seeds of the current AI revolution were sown in the earliest days of electronic computing. In 1948, Claude Shannon, the “father of information theory,” laid the groundwork for probabilistic language analysis in “A Mathematical Theory of Communication.” His exploration of predicting the next letter in a sequence foreshadowed the statistical underpinnings of modern natural language processing (NLP).
Alan Turing’s 1950 paper, “Computing Machinery and Intelligence,” set the first major goalpost for the field. He proposed what became known as the Turing Test, an imitation game in which an interrogator tries to determine whether the subject is human or machine. This simple challenge set a high bar that would preoccupy, and often frustrate, researchers for decades to come.
The first artificial neural network, the SNARC, was built in 1951 by Marvin Minsky. It used reinforcement learning to simulate rats starting at various locations in a maze and learning to find a path through it.
In 1956, the Dartmouth Summer Research Project on Artificial Intelligence was convened. This project brought together many leading thinkers from the diverse fields of computer science, linguistics, and philosophy with an ambitious mandate:
“The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it. An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves.” (p. 2, McCarthy et al., 1955)
Among these luminaries was Arthur Samuel, who would later coin the term “machine learning” in his 1959 paper detailing experiments programming an early IBM mainframe computer to learn the game of checkers. His checkers program, employing clever tree-search optimizations and an early form of reinforcement learning, honed its strategy with every game played, and led him to conclude:
“As a result of these experiments one can say with some certainty that it is now possible to devise learning schemes which will greatly outperform an average person and that such learning schemes may eventually be economically feasible as applied to real-life problems.” (p. 223, Samuel, 1959)
In 1957, Noam Chomsky published Syntactic Structures, a book that laid out a “system of phrase-structure grammar,” a systematic way to describe the syntax of language. Chomsky’s work provided a theoretical framework for breaking chunks of language (e.g., sentences) into functional parts (verbs, nouns, adjectives, etc.) and indicating the relationships between them, and it would influence NLP techniques for decades to come.
In 1958, Frank Rosenblatt developed the Perceptron, another early form of neural network that could classify simple patterns. The Perceptron was widely covered in the popular science press, bringing the concept of machine learning to a mass audience (Rosenblatt, 1958).
The early 1960s were characterized by exuberant optimism about the potential of AI. The publication of a report titled “Research on Mechanical Translation” by a Congressional committee on Science and Astronautics gave an official stamp of approval to further funding for machine translation, an important subset of the field.
In 1961, Marvin Minsky published his landmark paper “Steps Toward Artificial Intelligence,” in which he performed a rigorous narrative review of the various lines of research in the field and their relation to one another.
The first chatbot, ELIZA, was created by Joseph Weizenbaum at the MIT Artificial Intelligence Laboratory in the mid-1960s. ELIZA’s conversational prowess relied on simple algorithmic trickery; the program would find keywords in a user’s statement and reflect them back as questions, or say “Tell me more….” ELIZA tantalized the public and researchers alike with the illusion of understanding, which troubled Weizenbaum: even in the 1960s, he was already pondering “the broader implications of machines that could effectively mimic a sense of human understanding” (Hall, 2019).
Despite growing funding and public enthusiasm, some theoreticians grew increasingly skeptical that key technical hurdles were solvable at all, and some of the early experimental successes proved more difficult to build on than had initially been hoped. Perhaps the best summary of this sentiment came from philosopher Hubert Dreyfus in his paper “Alchemy and Artificial Intelligence”:
“An overall pattern is taking shape: an early, dramatic success based on the easy performance of simple tasks, or low-quality work on complex tasks, and then diminishing returns, disenchantment, and, in some cases, pessimism. The pattern is not caused by too much being demanded too soon by eager or skeptical outsiders. The failure to produce is measured solely against the expectations of those working in the field.” (p. 16, Dreyfus, 1965)
To assess the state of NLP research, the US National Research Council formed the Automatic Language Processing Advisory Committee (ALPAC) in 1964. The committee’s final report, published in 1966, poured cold water on much of the early exuberance for AI research. Its skeptical tone stressed the need for foundational breakthroughs in computational linguistics before practical NLP applications would become a reality, and it recommended decreasing or reallocating funding for research in machine translation.
Overall, the late 1960s and early 1970s were characterized by a lack of positive developments or experimental successes, weighed against an increasing number of experimental failures and theoretical critiques of past work. Minsky and Papert’s 1969 book Perceptrons demonstrated major limitations of the model that had captured the public imagination just ten years earlier. By 1973, sentiment was decidedly negative, as evinced by the Lighthill Report commissioned by the Science Research Council in the UK, which contained some rather pessimistic assessments of the field’s progress:
“Most workers in AI research and in related fields confess to a pronounced feeling of disappointment in what has been achieved in the past twenty-five years. Workers entered the field around 1950, and even around 1960, with high hopes that are very far from having been realised in 1972. In no part of the field have the discoveries made so far produced the major impact that was then promised.” (p. 8, Lighthill, 1972)
As a result of this perceived slow progress, lack of good news, and negative official assessments, the flow of research funding for AI slowed substantially. Researchers disagree about the exact start and end dates, but this period, from roughly the mid-1960s to the late 1970s, would come to be known as the “First AI Winter.”