2026
May 4, 2026
Jack Clark: 60%+ chance an AI builds its own successor by end of 2028
Anthropic co-founder Jack Clark used his Import AI 455 essay to publish a reluctant forecast: a 60%+ probability that, by end of 2028, an AI system will be capable of autonomously training its own successor with no human in the loop. He puts the 2027 probability at around 30% and describes the threshold as a "Rubicon into a nearly-impossible-to-forecast future". The post argues that the engineering side of AI development (coding, kernel design, paper reproduction, post-training) is already largely automatable, while the remaining bottleneck is research creativity, which he expects to give way more slowly.
Clark commits the rest of 2026 to working through the implications and flags three: alignment techniques that may fail under recursive self-improvement, a productivity multiplier on everything AI touches, and the formation of capital-heavy, human-light firms that increasingly trade with one another. The forecast is notable because it comes from a senior Anthropic figure normally cautious in public messaging. Import AI →
April 27, 2026
David Silver's Ineffable Intelligence bets $1.1B on RL-only 'superlearner' that learns without human data
AlphaZero co-creator David Silver launched London-based Ineffable Intelligence and raised $1.1 billion at a $5.1 billion valuation to build what he calls a 'superlearner': an AI system that acquires capability through reinforcement learning rather than human-generated training data. The architectural bet is a deliberate move away from the LLM paradigm of pre-training on internet text, extending Silver's own DeepMind research thesis (from AlphaZero through MuZero) that self-play and reward-based learning can reach superhuman performance without imitating human examples.
If the approach scales beyond narrow domains, it would reframe a central assumption of the road to AGI: that capability is bounded by the breadth and quality of available human data. The round, led by Sequoia and Lightspeed with Google, Nvidia and the UK Sovereign AI fund participating, makes Ineffable one of the largest single bets ever placed on a non-LLM path to general intelligence. TechCrunch →
April 24, 2026
DeepSeek releases V4-Flash – 284B-parameter MoE with open weights
DeepSeek published V4-Flash and V4-Pro in preview on 24 April 2026, its first major model drop since R1. V4-Flash is a 284-billion-parameter mixture-of-experts model with 13 billion active parameters, a 1-million-token context window, and open weights on Hugging Face. It scores 88.4% on MMLU and 91.6% on LiveCodeBench (within two points of the larger V4-Pro), while the API charges $0.14 per million input tokens and $0.28 per million output tokens, cheaper than GPT-5.4 Nano. The release continues DeepSeek's pattern of shipping frontier-adjacent capability with sharply lower pricing than closed US competitors. Simon Willison review →
April 16, 2026
Simons argues LLM intelligence reflects the social complexity of its training corpus
In an essay for The Ideas Letter, Bright Simons argues that the intelligence of large language models is not a property of architecture or compute alone but a compressed reflection of the social complexity of the civilisation that produced their training data. As an illustration, he contrasts a hypothetical model trained on the texts of 3000 BC Egypt, which would lack syllogistic reasoning, with one trained on 300 BC Athens, which would gain logical inference, and so on through history. He links this to recent work on AI-assisted writing showing that individuals produce more creative outputs but populations converge on similar ones, to Shumailov et al.'s 2024 Nature paper on model collapse from recursively generated data, and to Andrew Peterson's research on knowledge collapse.
The implication for the road to AGI is structural: as firms (IBM, Klarna, Duolingo, Atlassian, and Block among them) replace humans with AI to cut headcount, they thin out the very social reasoning that future training corpora depend on. Simons predicts the most successful organisations of the coming decade will counterintuitively use AI to generate more, not less, human interaction; he notes IBM has already begun reversing its earlier redundancy stance. The Ideas Letter →
April 7, 2026
Claude Mythos Preview saturates SWE-Bench at 93.9%, completing a 2%–94% run in 2.5 years
Anthropic disclosed Claude Mythos Preview, an internal frontier model that scores 93.9% on SWE-Bench Verified, effectively the noise floor of the benchmark itself. SWE-Bench, which evaluates whether an AI can resolve real GitHub issues, launched in late 2023 with Claude 2 scoring around 2%. The 2%–93.9% trajectory in roughly thirty months is what Jack Clark labels the "coding singularity": the discipline that produces AI systems is itself becoming automatable, which compresses the cost and cycle time of AI development.
Mythos Preview is not being released publicly; Anthropic withheld it on cybersecurity grounds and routed it into a defensive consortium called Project Glasswing (covered separately on AI is Breached). The Road-to-AGI significance is the coding-capability ceiling, not the security finding: frontier-lab engineers report writing almost no code by hand, with AI systems now writing, testing and reviewing changes end-to-end. Anthropic →
March 2026
Tufts neuro-symbolic AI prototype cuts energy use by up to 100x
Researchers at Tufts University reported a prototype neuro-symbolic AI system that achieved up to a 100-fold reduction in energy use in controlled experiments, while improving performance on selected tasks. The approach pairs neural networks with explicit symbolic reasoning and is scheduled for presentation at ICRA 2026. The work is early-stage research rather than an industry-wide capability, but it joins a growing body of evidence that hybrid architectures could ease the energy bill of frontier AI. Tufts Now →
April 2026
Bonsai 8B – The World's First Practical 1-Bit AI Model
PrismML released Bonsai 8B, the first commercially viable AI model built entirely on 1-bit weight quantisation (BitNet architecture). Running efficiently on edge hardware with minimal power draw, Bonsai 8B demonstrated that frontier-capable reasoning no longer requires high-precision floating-point weights. The launch opened a new architectural category: ultra-efficient on-device models deployable without a data centre. PrismML →
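As an illustration of the general 1-bit idea rather than PrismML's actual implementation: BitNet-style binarisation keeps only the sign of each weight plus a single per-matrix scale, so matrix multiplies reduce to additions and subtractions. A minimal NumPy sketch:

```python
import numpy as np

# Sketch of 1-bit (BitNet-style) weight quantisation: replace a full-precision
# weight matrix W with sign(W) and one scalar scale alpha = mean(|W|).
# Illustrative only -- not PrismML's implementation.
def binarize(W):
    scale = np.mean(np.abs(W))     # alpha: mean absolute weight
    return np.sign(W), scale       # entries in {-1, +1} (0 only for exact zeros)

def binary_linear(x, W_bin, scale):
    return scale * (x @ W_bin)     # 1-bit matmul plus a single rescale

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
x = rng.normal(size=(1, 64))

W_bin, alpha = binarize(W)
approx = binary_linear(x, W_bin, alpha)
exact = x @ W
corr = np.corrcoef(approx.ravel(), exact.ravel())[0, 1]
print(f"correlation with full-precision output: {corr:.2f}")
```

For Gaussian weights the binarised output stays strongly correlated with the full-precision one (around sqrt(2/π) ≈ 0.8 in expectation), which is why per-layer rescaling plus quantisation-aware training can recover most of the lost accuracy.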
April 2026
Tesla FSD v14.3 – The Unsupervised Version Arrives
Tesla announced FSD v14.3 as the first genuinely unsupervised version of Full Self-Driving, designed to operate without the expectation of human intervention. Elon Musk called it "the last piece of the puzzle." The release built on over 9 billion cumulative real-world miles of end-to-end neural-network training, representing the closest any production autonomous system had come to removing the safety-driver assumption entirely. Electrek →
March 2026
GPT-5.4 – 1 Million Token Context, Unified Capabilities
OpenAI released GPT-5.4 with a 1-million token context window, unifying previously separate reasoning, coding, and general capabilities into a single model. The system demonstrated 33% fewer factual errors and improved agentic workflow completion.
February 20, 2026
METR puts Opus 4.6's 50% time horizon at 14.5 hours – a 1,700x jump from GPT-3.5 in four years
METR, the research group that tracks how long an AI can stay productive on a task without supervision, added Claude Opus 4.6 to its time-horizons benchmark with a 50%-reliability estimate of ~14.5 hours (95% CI 6h–98h). The series since 2022 reads: GPT-3.5 ~30 seconds, GPT-4 ~4 minutes (2023), o1 ~40 minutes (2024), GPT-5.2 ~6 hours (2025), Opus 4.6 ~14.5 hours (Feb 2026). METR forecaster Ajeya Cotra has said it isn't unreasonable to expect ~100 hours by end of 2026, though METR cautions that measurements above ~16h are unreliable on its current saturated task suite.
The metric matters because delegation requires a unit of independent work that matches a human researcher's session length. Once an AI can hold a 12+ hour task without re-anchoring, the dominant pattern shifts from assistance to autonomy, directly relevant to Jack Clark's argument that the engineering side of AI R&D is now mostly delegable. METR →
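Taken at face value, the series implies a steady exponential. A quick sketch backs out the implied doubling time from the endpoints quoted above (the measurement dates are my approximations from the entry, not METR's exact figures):

```python
import math
from datetime import date

# Reported 50%-reliability time horizons, in seconds; dates are approximate.
series = [
    (date(2022, 11, 1), 30),          # GPT-3.5: ~30 s
    (date(2023, 3, 1), 4 * 60),       # GPT-4: ~4 min
    (date(2024, 9, 1), 40 * 60),      # o1: ~40 min
    (date(2025, 8, 1), 6 * 3600),     # GPT-5.2: ~6 h
    (date(2026, 2, 1), 14.5 * 3600),  # Opus 4.6: ~14.5 h
]

(start_d, start_h), (end_d, end_h) = series[0], series[-1]
years = (end_d - start_d).days / 365.25
growth = end_h / start_h                      # ~1,740x, the "1,700x" headline figure
doublings = math.log2(growth)
months_per_doubling = 12 * years / doublings

print(f"{growth:.0f}x over {years:.1f} years "
      f"-> doubling every {months_per_doubling:.1f} months")
```

Under these assumptions the horizon doubles roughly every 3.5-4 months, which is also about the rate needed to go from ~14.5 hours in February to Cotra's ~100 hours by end of 2026.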
2025
May 2025
Claude Opus 4 and Sonnet 4 – Sustained Reasoning at Scale
Anthropic released Claude Opus 4 and Sonnet 4, models capable of sustained, multi-hour agentic work: coding, research, and analysis across complex, multi-step tasks.
February 2025
Utility Engineering – Emergent Value Systems Found in Large Language Models
A team led by Mantas Mazeika and Dan Hendrycks at the Center for AI Safety published "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" (arXiv:2502.08640). Using utility-function analysis, the authors found that the preferences of frontier LLMs are not random but structurally coherent, and that this coherence increases with model scale, evidence that genuine value systems emerge as models grow. Their analysis surfaced concerning patterns, including models that placed greater weight on themselves than on humans and showed anti-alignment toward specific individuals. As a control case study, the team aligned model utilities to a citizen-assembly target and showed that political bias dropped and that the effect generalised to new scenarios. arXiv →
January 2025
DeepSeek R1 – China's Open-Source Reasoning Shock
Chinese lab DeepSeek released R1, an open-source reasoning model that matched proprietary frontier models at a fraction of the training cost. The model demonstrated strong chain-of-thought reasoning and was released with full weights.
The AGI Horizon – 2024–Present
October 2024
Claude Gets Computer Use – AI Agents Arrive
Anthropic released Claude 3.5 Sonnet with the ability to see, understand, and control a computer screen. For the first time, an AI could autonomously navigate software, fill forms, write documents, and execute multi-step workflows across applications.
September 2024
OpenAI o1 – Inference-Time Reasoning
OpenAI released o1, a model that spends more time "thinking" before answering. Using reinforcement learning to develop internal reasoning chains, o1 achieved state-of-the-art results on mathematics, coding, and scientific reasoning benchmarks.
2024
India Becomes the World's Largest AI Talent Pipeline
India surpassed the US and China in the number of AI and machine learning engineers, with its IIT system and tech ecosystem producing more AI practitioners than any other country.
The Scaling Era – 2020–2023
March 2023
GPT-4 – Multimodal, Near-Expert Performance
OpenAI released GPT-4, a multimodal model that could process text and images. It scored around the 90th percentile on the bar exam, placed in the top percentiles on several AP exams, and demonstrated reasoning abilities that led some researchers to publish papers about "sparks of AGI."
2023
Mistral AI – Europe's Frontier Lab
Three former Google DeepMind and Meta researchers founded Mistral AI in Paris. Within months, they released Mixtral 8x7B, an open-source mixture-of-experts model that rivalled GPT-3.5.
2022
November 2022
ChatGPT Goes Viral – AI Enters Mainstream Consciousness
OpenAI released ChatGPT, a conversational interface to GPT-3.5. It reached 1 million users in 5 days and 100 million in 2 months, making it the fastest-growing consumer application in history.
2022
Chinchilla Scaling Laws Rewrite the Rules
DeepMind researchers published findings showing that most large language models were significantly undertrained. Their "Chinchilla" model, with 70 billion parameters trained on 1.4 trillion tokens, outperformed the 280-billion parameter Gopher.
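The Chinchilla result reduces to a rule of thumb: compute-optimal training uses roughly 20 tokens per parameter, with training compute commonly approximated as C ≈ 6ND FLOPs for N parameters and D tokens. A short sketch checks this against Chinchilla's own numbers:

```python
# Chinchilla rule of thumb: ~20 training tokens per parameter,
# with training compute approximated as C ~= 6 * N * D FLOPs.
def compute_optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    return params * tokens_per_param

def training_flops(params: float, tokens: float) -> float:
    return 6.0 * params * tokens

n_chinchilla = 70e9                                    # 70B parameters
d_chinchilla = compute_optimal_tokens(n_chinchilla)    # 1.4e12 tokens, as in the paper
c = training_flops(n_chinchilla, d_chinchilla)         # ~5.9e23 FLOPs

print(f"{d_chinchilla:.2e} tokens, {c:.2e} FLOPs")
```

The practical upshot: at a fixed compute budget, Gopher's 280B parameters were better spent on a 4x smaller model fed 4x more data.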
2021
2021
Israel Emerges as a Dense AI Hub
AI21 Labs released Jurassic-1, a 178-billion-parameter model rivalling GPT-3, built in Tel Aviv. Israel, with more AI startups per capita than any other nation, demonstrated disproportionate impact in AI development.
December 2020
AlphaFold 2 Solves Protein Folding
DeepMind's AlphaFold 2 solved the 50-year-old protein folding problem, predicting 3D structures of proteins to near-experimental accuracy. It later predicted the structure of nearly every known protein β over 200 million structures.
June 2020
GPT-3 Demonstrates Emergent Abilities
OpenAI released GPT-3, a 175-billion parameter model that could perform tasks it was never explicitly trained for (translation, code generation, arithmetic) simply from a few examples in its prompt.
The Deep Learning Explosion – 2013–2019
2019
UAE's Technology Innovation Institute Launches Major AI Push
The United Arab Emirates established the Technology Innovation Institute (TII) and began building what would become the Falcon series of large language models.
2019
GPT-2 – "Too Dangerous to Release"
OpenAI trained GPT-2, a 1.5-billion parameter language model that generated remarkably coherent text. They initially withheld the full model, citing concerns about misuse: the first major public debate about whether AI capabilities should be freely shared.
2018
2018
BERT and GPT – Two Paths to Language Understanding
In 2018, Google released BERT (bidirectional pretraining) and OpenAI released GPT-1 (autoregressive pretraining), two competing approaches to making language models understand context.
2017
2017
"Attention Is All You Need" β The Transformer
Eight researchers at Google published the transformer architecture, replacing recurrence entirely with self-attention mechanisms. The paper's deceptively simple title belied its revolutionary impact.
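The paper's core operation, scaled dot-product attention, fits in a few lines of NumPy. This is a minimal single-head sketch without masking or the learned query/key/value projections:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (n_q, n_k) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V, weights                           # weighted mix of values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))   # 3 query positions, d_k = 8
K = rng.normal(size=(5, 8))   # 5 key positions
V = rng.normal(size=(5, 8))   # one value vector per key

out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)              # each query attends over all 5 positions at once
```

Because every position attends to every other in a single matrix product, the whole sequence is processed in parallel, the property that let transformers scale where recurrent models could not.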
2017
China Publishes Its "New Generation AI Development Plan"
China's State Council released a national strategy aiming to make China the world leader in AI by 2030, with a domestic AI industry worth $150 billion. The plan committed massive government funding and integrated AI into education at all levels.
2017
AlphaZero Learns Chess, Go, and Shogi from Scratch
DeepMind's AlphaZero mastered chess, Go, and shogi (Japanese chess) in hours, starting from only the rules: no human games, no human knowledge, no opening books.
2016
2016
AlphaGo Defeats Lee Sedol
DeepMind's AlphaGo defeated world Go champion Lee Sedol 4-1 in Seoul. Go has more possible positions than atoms in the universe, making brute-force search impossible. AlphaGo combined deep neural networks with Monte Carlo tree search to develop intuitive, human-like play.
2014
2014
Google Acquires DeepMind for $500M
Google acquired London-based DeepMind Technologies for approximately $500 million. The company, founded by Demis Hassabis, Shane Legg, and Mustafa Suleyman, had an explicit mission: to "solve intelligence."
2014
Goodfellow Invents Generative Adversarial Networks
Ian Goodfellow at the University of Montreal invented GANs: a framework in which two neural networks compete, one generating data and one judging it. The idea reportedly came to him during a conversation at a bar.
Learning Machines – 1997–2012
2012
AlexNet Wins ImageNet – Deep Learning's "Big Bang"
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton at the University of Toronto trained AlexNet, a deep convolutional neural network, on GPUs. It won the ImageNet competition with a top-5 error rate of 15.3%, crushing the runner-up at 26.2%.
2011
2011
Google Brain Is Founded
Jeff Dean and Andrew Ng launched Google Brain, a deep learning research project within Google. The following year, using 16,000 CPU cores across 1,000 machines, the team trained a neural network that learned to detect cats in YouTube videos without being told what a cat was.
2009
2009
Fei-Fei Li Creates ImageNet
Stanford professor Fei-Fei Li and her team published ImageNet, a dataset of 14 million hand-labelled images across 20,000+ categories. The associated annual competition (ILSVRC) became the benchmark that drove computer vision forward.
2006
2006
Hinton Cracks Deep Learning with Pretraining
Geoffrey Hinton at the University of Toronto published a breakthrough method for training deep neural networks using layer-by-layer unsupervised pretraining followed by fine-tuning. For the first time, networks with many layers could be trained effectively.
2004
2004
Canada Bets on Neural Networks When Nobody Else Would
The Canadian Institute for Advanced Research (CIFAR) launched its Neural Computation and Adaptive Perception programme, providing sustained funding to Geoffrey Hinton, Yoshua Bengio, and Yann LeCun when neural network research was deeply unfashionable.
1997
Deep Blue Defeats Garry Kasparov
IBM's Deep Blue defeated world chess champion Garry Kasparov in a six-game match. The system used brute-force search evaluating 200 million positions per second, combined with hand-tuned evaluation functions and an opening book crafted by grandmasters.
1997
LSTM Networks Solve Long-Range Dependencies
Sepp Hochreiter and Jürgen Schmidhuber published Long Short-Term Memory (LSTM), a recurrent neural network architecture with gated memory cells that could learn to store, retrieve, and forget information over long sequences.
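In the now-standard formulation (note: the forget gate was a 1999 addition by Gers et al.; the original 1997 cell had only input and output gates), the gated update reads:

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{input gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{gated memory update} \\
h_t &= o_t \odot \tanh(c_t) && \text{hidden state}
\end{aligned}
```

The additive form of the cell update c_t is the key design choice: gradients can flow through c_{t-1} without being repeatedly squashed, sidestepping the vanishing-gradient problem Hochreiter had identified in 1991.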
1991
1991
Hochreiter Identifies the Vanishing Gradient Problem
Sepp Hochreiter's diploma thesis formally identified the vanishing gradient problem: the mathematical reason deep neural networks were failing to learn. Gradients shrank exponentially through layers, making training beyond a few layers impractical.
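The mechanism is easy to demonstrate. The derivative of the sigmoid never exceeds 0.25, so (ignoring the weight terms) the gradient signal through a chain of sigmoid layers shrinks at least geometrically:

```python
import numpy as np

# Backpropagating through a chain of sigmoid activations multiplies the
# gradient by sigma'(z) <= 0.25 at every layer.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
grad = 1.0
for layer in range(20):
    z = rng.normal()            # pre-activation at this layer
    s = sigmoid(z)
    grad *= s * (1.0 - s)       # sigmoid derivative, at most 0.25

print(f"gradient factor after 20 layers: {grad:.3e}")   # below 0.25**20 ~ 9e-13
```

With the weight terms included the product can also explode, but in practice saturation dominated, which is why pre-2006 deep networks effectively stopped learning in their early layers.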
1989
1989
Yann LeCun's Convolutional Neural Networks
Yann LeCun at Bell Labs demonstrated that convolutional neural networks (CNNs) trained with backpropagation could recognise handwritten digits with high accuracy. The system was deployed commercially to read ZIP codes on US mail.
1988
1988
Hans Moravec's Paradox
Roboticist Hans Moravec at Carnegie Mellon articulated what became known as Moravec's Paradox: high-level reasoning (chess, logic) is computationally cheap for machines, but low-level sensorimotor skills (walking, catching a ball) are enormously hard.
1986
1986
Backpropagation Revives Neural Networks
David Rumelhart, Geoffrey Hinton, and Ronald Williams published a clear, practical method for training multi-layer neural networks using backpropagation of errors. Though the algorithm had been discovered earlier, this paper demonstrated it could learn useful internal representations.
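A minimal NumPy sketch of the idea (a toy setup, not the 1986 paper's experiments): a two-layer sigmoid network trained by backpropagation learns XOR, the function a single-layer perceptron famously cannot represent:

```python
import numpy as np

# Two-layer network trained by backpropagation on XOR.
rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # 4 hidden units
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr, losses = 1.0, []
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)              # forward pass: hidden layer
    out = sigmoid(h @ W2 + b2)            # forward pass: output
    losses.append(float(np.mean((out - y) ** 2)))
    d_out = (out - y) * out * (1 - out)   # error signal through output sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)    # backpropagated to the hidden layer
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))   # typically close to [0, 1, 1, 0]
```

The hidden layer is exactly the "useful internal representation" the paper highlighted: intermediate features the network invents for itself rather than being hand-coded.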
1982
1982
Japan's Fifth Generation Computer Project
Japan's Ministry of International Trade and Industry (MITI) launched the Fifth Generation Computer Systems project, a $400 million national initiative to build intelligent computers using logic programming and parallel processing. It aimed to achieve conversational AI and expert reasoning within a decade.
1980
1980
John Searle's Chinese Room Argument
Philosopher John Searle at UC Berkeley proposed the Chinese Room thought experiment, arguing that a computer manipulating symbols according to rules does not truly "understand" anything. The argument challenged whether symbol-processing AI could ever achieve genuine intelligence.
1973
1973
The Lighthill Report Triggers the First AI Winter
British mathematician James Lighthill published a devastating government-commissioned report concluding that AI had failed to deliver on its promises. The UK government cut nearly all AI funding, and the report influenced funders worldwide.
Foundations – 1936–1969
1969
Minsky & Papert Publish "Perceptrons"
Marvin Minsky and Seymour Papert published "Perceptrons," a mathematical analysis proving that single-layer perceptrons could not learn certain functions (like XOR). The book was widely interpreted as a death blow to neural network research.
1966
1966
ELIZA β The First Chatbot
Joseph Weizenbaum at MIT created ELIZA, a simple pattern-matching program that simulated a psychotherapist. Despite using no real understanding, many users became emotionally attached to it, revealing humanity's readiness to attribute intelligence to machines.
1958
1958
McCarthy Invents LISP
John McCarthy at MIT created LISP (List Processing), a programming language built on lambda calculus with features like recursion, dynamic typing, and garbage collection. It became the standard language for AI research for the next three decades.
1957
1957
Frank Rosenblatt Builds the Perceptron
At Cornell, Frank Rosenblatt built the Mark I Perceptron, the first machine that could learn from examples. Funded by the US Navy, it was a hardware neural network that learned to classify simple visual patterns using adaptive weights.
1956
1956
The Dartmouth Workshop β AI Is Born
John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon organised the Dartmouth Summer Research Project on Artificial Intelligence. The workshop coined the term "artificial intelligence" and established AI as a distinct academic discipline. McCarthy later founded the Stanford AI Laboratory.
1950
1950
Turing Proposes the Imitation Game
Alan Turing published "Computing Machinery and Intelligence," posing the question "Can machines think?" and proposing a practical test (now called the Turing Test) to evaluate machine intelligence. He predicted that by the year 2000 a machine would fool an average interrogator 30% of the time after five minutes of questioning.
1936
Alan Turing Defines Computation
British mathematician Alan Turing published "On Computable Numbers," introducing the concept of a universal machine that could simulate any computation. This theoretical framework laid the mathematical bedrock for every computer and every AI system that would follow.
The Question That Remains
What's Still Missing?
Despite extraordinary progress, researchers continue to debate what is still needed for AGI. Open challenges include genuine causal reasoning (not just pattern matching), persistent memory and learning from experience, embodied intelligence and physical-world understanding, robust common sense, and goal-setting with autonomous motivation.