Feb 16

RL scaling might be better than it looks, and inference costs to reach a capability level fall fast

2 Comments

Tao Lin?

This 5-10x annual cost drop tracks with what I'm seeing in practice. I switched my AI agent from Opus to Haiku three months ago - not as a theoretical exercise, but because I hit my weekly spending limit and got charged an extra 50 euros.

The counterintuitive finding: Haiku with well-defined task boundaries actually performed better than Opus on 90% of workflows. The cost dropped 15x. Quality held or improved.

Your point about distillation and algorithmic improvements is the mechanism behind this. But I think there's a practical layer missing from the analysis - most teams overspend because they route everything to the biggest model, not because frontier costs are too high.

Wrote up the full economics of running an agent on different model tiers: https://thoughts.jock.pl/p/claude-model-optimization-opus-haiku-ai-agent-costs-2026

Epoch AI

How persistent is the inference cost burden?