Has LLM progress slowed?
Initial reactions to GPT-5 were mixed: to many, it did not seem as dramatic an advance as GPT-4 had been.
Benchmarks may help clarify the picture: GPT-5 is both an incremental release following many other OpenAI advances, and a major leap from GPT-4.
While it is hard to comprehensively measure long-term AI progress, we highlight two sets of knowledge, math, and coding benchmarks that were widely cited or notable in their time. GPT-4 and GPT-5 each show dramatic progress on benchmarks that their predecessor struggled with.
One reason for the muted reaction: there were few significant releases between GPT-3 and GPT-4, so users experienced more progress at once. In contrast, the two years before GPT-5 saw many frontier models and smaller updates from both OpenAI and competitors.
There's a notable difference between how GPT-4 and GPT-5 were developed: GPT-4 was the result of a large scale-up in pre-training compute. In contrast, OpenAI focused on reinforcement learning for GPT-5, which probably did not involve a major scale-up in pre-training compute over GPT-4.
This Data Insight was written by Luke Emberson. You can learn more about the analysis and explore the interactive figure on our website.