9 Comments
Herbie Bradley

Nice analysis! I do think section IV underestimates the dynamics from data, and I think the data improvements model could benefit from some application of diminishing returns. For example, most researchers estimate we're years past the point of severe diminishing returns from pretraining on public data. Labs have moved more towards synthetic data, which is decent but dominated by domains where outputs are easy to verify (software), and towards private data (which requires paying companies).

At every point in the last five years at which it seemed like there might be a slowdown, we found a new mini-paradigm to continue the trend (e.g., pretraining -> RLHF data -> RL env data -> inference-time compute). But many of these are approaching diminishing returns. It's useful to bear in mind that by the point where we get automated AI R&D, we should expect labs to have spent tens of billions on custom RL environments for every aspect of knowledge work, and potentially more on pretraining data cumulatively. That's just a huge amount of effort, and it's one of the main reasons why I don't expect a software-only singularity.
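To make the diminishing-returns point concrete, here's a toy sketch of the kind of thing I mean. The saturating functional form and the numbers are purely illustrative, not a claim about any lab's actual returns:

```python
import math

# Toy model: OOMs of effective-compute gain as a saturating function of
# cumulative data spend. The exponential-saturation form and the constants
# are made up for illustration, not fitted to any real lab's data.
def data_gain_ooms(spend_billions, saturation_ooms=2.0, scale_billions=5.0):
    """Gain saturates at `saturation_ooms`; `scale_billions` sets how quickly."""
    return saturation_ooms * (1 - math.exp(-spend_billions / scale_billions))

for spend in [1, 5, 10, 20, 50]:
    print(f"${spend}B cumulative data spend -> ~{data_gain_ooms(spend):.2f} OOMs of gain")
# Going from $10B to $20B buys much less than the first few billion did.
```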

Daniel Kokotajlo

Nit: “Will compute be a major bottleneck to the software intelligence explosion?” should read "will *training* compute be a major bottleneck..."

Because obviously experiment compute will be; AI 2027 and the AI Futures Model are built around that assumption.

I think you are clear on this elsewhere in the piece, which is why this is just a nitpick of a typo.

Anson Ho

Hmm yeah, I think this was an issue of vague wording on my part. I was trying to gesture at both training and experiment compute bottlenecks, but it's not clear what exactly is meant by "a major bottleneck".

If I wanted to be more precise I'd probably say something more like "what is the probability that we see >5 OOM of training compute efficiency improvement in <1 year?"
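For a sense of scale, here's the back-of-the-envelope arithmetic behind that operationalisation (nothing model-specific, just unit conversion):

```python
import math

# ">5 OOM of training compute efficiency improvement in <1 year",
# re-expressed as a sustained doubling time.
ooms = 5
doublings = ooms * math.log2(10)       # 5 OOMs ~= 16.6 doublings
days_per_doubling = 365 / doublings    # ~22 days
print(f"{ooms} OOMs of efficiency gain = {doublings:.1f} doublings")
print(f"Compressed into one year, that's one doubling every ~{days_per_doubling:.0f} days")
```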

Daniel Kokotajlo

Huh, why not just say "Will *training* compute be a major bottleneck...?"

The "what is the probability that we see >5 OOM training compute efficiency improvement in <1 year" sounds like a proxy for "will takeoff be fast" rather than a crux for it!

Anson Ho

>sounds like a proxy for "will takeoff be fast" rather than a crux for it!

Yep you're right, I didn't explain this very well, lemme try again

What I was trying to say is that in my head I tend to think of this as "to what extent will experiment + training compute bottlenecks change the probability of [operationalisation of SIE]?". Under this framing we can't just say "yes, experiment compute will be a bottleneck" and move on, because what we really care about is "to what extent does experiment compute matter?". The same is true for training compute IMO, and having a specific operationalisation of the SIE is useful for putting probabilities on things.

I ended up compressing this into "(2) Will compute be a major bottleneck to the software intelligence explosion?" but that loses the technicalities above. So maybe a better way to phrase this is "(2) To what extent will compute be a bottleneck to the software intelligence explosion?"

Pawel Jozefiak

The distinction between algorithmic progress and data quality improvement is crucial and almost nobody makes it clearly.

What struck me: if most measured "software progress" comes from better data rather than breakthrough architectures, then the compounding story changes completely. Data improvements have diminishing returns. Architectural innovations might not.

The DeepSeek example is interesting precisely because it's unclear which driver did the heavy lifting. Was it genuinely novel architecture or just better training data curation at scale?

Would love to see this analysis extended to code-specific models where the training data quality question is even murkier.

Anson Ho

>Would love to see this analysis extended to code-specific models where the training data quality question is even murkier.

Interesting, why is it murkier for code-specific models?

Klement Gunndu

Your point about "This post is part of Epoch AI’s Gradient Updates newsletter, which shares more opinionated or informal takes on big ques" resonates with what I've been seeing in production systems. The gap between the theory and what actually ships is where most teams struggle, especially around reliability.

Soren Vale

One implication of the data-vs-algorithms distinction is that it changes where the moat lives. If a lot of measured “software progress” is really better data pipelines, filtering, synthetic data, and post-training environments, then the strategic bottleneck shifts away from model architecture alone and toward who can operationalize those feedback loops inside a trusted production shell. That makes the story feel less like pure research progress and more like systems/control-plane competition.