What was striking to me was how close the 35-point performance of the top AIs was to the cutoff. Is it reasonable to think that, if one of the problems had been slightly easier (for human contestants), AI might not have gotten gold at all?
Yes, it definitely could have gone that way. My guess is that an easier version of this year's P6 would still have stumped the models, though I wouldn't be too certain; that's the point. :)
It was a weird year for gold medals in general. See this chart, which didn't quite make the cut for the main post:
https://x.com/GregHBurnham/status/1953624751676563768