Discussion about this post

Thomas DeWitt

Fascinating! I suppose the claudiness finding could cut both ways on the generalization question? It shows that Anthropic is not focusing on math, which seems consistent with what they have said, so generalization may be imperfect. Nonetheless, Claude is showing significant improvements in math (e.g., MATH Level 5 or FrontierMath), suggesting that Anthropic's programming focus is generalizing at least somewhat to math.

Kenny Easwaran

Are you sure it’s a Claude-iness dimension and not a negative of a ChatGPT-iness dimension? I don’t know how many companies’ models are in the dataset, but if it’s more than two, it’s quite striking that the top 5 are all one company and the bottom 5 are all another. It would be interesting to see which others are on one side or the other!
