Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform
A paper posted to arXiv (2605.23972) argues that large language models (LLMs) are fundamentally limited on tasks that require causal reasoning, persistent state tracking, and long-horizon planning. The authors attribute this to a mismatch at the level of the training objective: LLMs are trained to predict sequences, whereas reasoning over latent environment dynamics requires inferring the underlying transition rules. To formalize this, they introduce Latent Dynamics Inference (LDI), a perspective that treats language and multimodal observations as partial evidence of hidden state transitions. As a proof of concept, they present Flux, a sequential reasoning environment defined entirely by natural-language rules. These rules are compiled into an explicit state-transition simulator, which allows a controlled comparison between LLMs operating on textual observations and reinforcement learning agents trained directly on the extracted latent state. The paper suggests that world models, meaning explicit representations of environment dynamics, may outperform LLMs in such settings. The abstract reports no specific results or benchmarks; the work is presented as a conceptual framework and environment design.
The paper highlights what it argues is a fundamental limitation of LLMs and points to world models as a path toward AGI.
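
To make the environment design described above more concrete, here is a minimal Python sketch of the general idea: natural-language rules compiled into an explicit state-transition function, with one view that exposes only partial textual observations and another that exposes the latent state directly. This is not Flux and does not reproduce the paper's rules or code. The rule set, the `State` fields, and the functions `step`, `observe_text`, `latent_vector`, and `rollout` are all hypothetical names invented for illustration, and the "compilation" is done by hand.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical toy rules written in natural language.
# They are invented for this sketch and are NOT taken from the paper.
RULES_TEXT = [
    "wait: the counter increases by one",
    "press: the lamp toggles if the counter is even; otherwise nothing happens",
    "reset: the counter returns to zero and the lamp turns off",
]


@dataclass(frozen=True)
class State:
    """Latent state of the toy environment."""
    lamp_on: bool = False
    counter: int = 0


def step(state: State, action: str) -> State:
    """Hand-compiled transition function corresponding to RULES_TEXT."""
    if action == "wait":
        return State(state.lamp_on, state.counter + 1)
    if action == "press":
        if state.counter % 2 == 0:
            return State(not state.lamp_on, state.counter)
        return state
    if action == "reset":
        return State(False, 0)
    raise ValueError(f"unknown action: {action!r}")


def observe_text(state: State, action: str) -> str:
    """Partial textual observation: the counter is never mentioned."""
    lamp = "on" if state.lamp_on else "off"
    return f"You {action}. The lamp is {lamp}."


def latent_vector(state: State) -> Tuple[int, int]:
    """Full latent state, as an agent trained on the simulator might see it."""
    return (int(state.lamp_on), state.counter)


def rollout(actions: List[str]) -> Tuple[State, List[str], List[Tuple[int, int]]]:
    """Run the actions from the initial state and collect both views."""
    state = State()
    texts: List[str] = []
    latents: List[Tuple[int, int]] = []
    for action in actions:
        state = step(state, action)
        texts.append(observe_text(state, action))
        latents.append(latent_vector(state))
    return state, texts, latents


if __name__ == "__main__":
    final, texts, latents = rollout(["press", "wait", "press", "wait", "press"])
    for text, latent in zip(texts, latents):
        print(f"{text:<32} latent={latent}")
```

In this toy setup, a reader of the text stream sees "press" sometimes toggle the lamp and sometimes do nothing, and has to infer the hidden counter parity to explain the difference. An agent given `latent_vector` sees the counter directly. That gap between partial textual evidence and the underlying transition rule is the kind of contrast the summary attributes to the paper's comparison, shown here only under the assumptions stated above.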