Yep we did! In fact, your post was one of the inspirations for the first part of our piece.
We've cited you in this paragraph:
> That said, whether these techniques actually end up working depends on more than just increasing the size of context windows on paper. It also requires building infrastructure so that more relevant context (e.g. all recent work interactions) can be digitized and shared with LLMs.
A fair amount of research suggests that huge context windows are less effective in practice than they look on paper: information in the middle of a long context is apparently much less likely to be attended to and used to guide the LLM (the "lost in the middle" effect).
I’ve seen this issue first-hand when building a system that relies on multiple LLM calls to iteratively write longer documents. As the document grows, each successive call either drifts into incoherence or fixates narrowly on a single aspect of the earlier context. This kind of context poisoning makes it very hard to sustain balanced, coherent generation over long horizons. If long-context inference is going to support continual learning in the way you suggest, how do we address this systemic fragility, where models overfit to prior slices of context instead of integrating them holistically?
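For concreteness, here's a minimal sketch of the kind of loop I mean (hypothetical names throughout; `call_llm` is just a stand-in for whatever completion API you're using, not any specific provider). Every call re-feeds the entire accumulated draft, which is exactly where the fixation on earlier slices of context shows up:

```python
# Minimal sketch of an iterative long-document writing loop.
# `call_llm` is a placeholder, not a real API; swap in your provider's client.

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a dummy string so the loop runs."""
    return f"[model output for a prompt of {len(prompt)} characters]"

def write_long_document(outline: list[str]) -> str:
    draft = ""
    for section_title in outline:
        # Each call sees the *entire* draft so far. As `draft` grows, the model
        # tends to latch onto one earlier slice of it (or under-use the middle),
        # and coherence across sections degrades.
        prompt = (
            "You are writing a long document, one section at a time.\n\n"
            f"Document so far:\n{draft}\n\n"
            f"Write the next section: {section_title}\n"
            "Stay consistent with everything written so far."
        )
        draft += "\n\n" + call_llm(prompt)
    return draft

if __name__ == "__main__":
    print(write_long_document(["Introduction", "Background", "Method", "Results"]))
```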
I’m guessing y’all saw I made the same argument? https://open.substack.com/pub/robotic/p/contra-dwarkesh-on-continual-learning