LLM progress on high school math contests
Despite rapid progress over the past year, LLMs have not yet solved the hardest problems
In less than a year, LLMs have climbed most of the high school math contest ladder. Every tier of problem difficulty has either been saturated or is well on its way to saturation, except for the very highest tier.
We rated the difficulty of problems from three premier contests (AIME, USAMO, IMO) by splicing together two popular scales (AoPS, MOHS). No LLM has solved a single problem in the highest difficulty tier.
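For intuition, here is a minimal sketch of what splicing two rating scales into shared difficulty tiers could look like. It is not the actual mapping used in this analysis: the normalization ranges (AoPS roughly 1–10, MOHS 0–50) and the tier cutoffs below are assumptions chosen purely for illustration.

```python
# Hypothetical illustration of splicing two difficulty scales into shared tiers.
# The normalization ranges and tier cutoffs are assumptions, not the mapping
# actually used in this analysis.

def normalize_aops(rating: float) -> float:
    """Map an AoPS difficulty rating (roughly 1-10) onto a 0-1 scale."""
    return (rating - 1) / 9

def normalize_mohs(rating: float) -> float:
    """Map a MOHS rating (0-50) onto a 0-1 scale."""
    return rating / 50

def to_tier(normalized: float) -> int:
    """Assign a unified difficulty tier (1 = easiest, 4 = hardest).

    The tier boundaries here are placeholders chosen for illustration.
    """
    cutoffs = [0.25, 0.5, 0.75]
    return 1 + sum(normalized >= c for c in cutoffs)

# Example: an easier AIME problem rated 3 on AoPS vs. a hard IMO problem rated 40 on MOHS.
print(to_tier(normalize_aops(3)))   # -> 1 (lower tier)
print(to_tier(normalize_mohs(40)))  # -> 4 (highest tier)
```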
How does this square with the AI gold medals at the IMO? Models managed to squeeze into the gold medal range without having to solve any top-difficulty problems.
While we haven't observed LLMs solving any problems in the highest difficulty tier, the sample size is small, so models may be able to solve some problems in this tier that simply haven't been tested. But if the tier were anywhere near saturated, we would expect to have seen at least a few solutions by now, so complete saturation is unlikely.
For now, high school math contests have a bit more to tell us about progress in AI math capabilities.
This Data Insight was written by Greg Burnham. You can learn more about the analysis on our website.