Why LLMs Fail at Causal Discovery and How Interventional Agents Escape
A new paper on arXiv (2605.27567) demonstrates that large language models fundamentally fail at causal discovery, even after fine-tuning. The authors prove, via a kernel obstruction theorem, that supervised fine-tuning, direct preference optimization, and in-context learning cannot distinguish between causal graphs that produce similar observational data; doing so would require unbounded internal representations. To overcome this, they introduce Agentic Causal Bayesian Optimization (A-CBO), in which a frozen LLM acts as an interventional oracle that answers queries about the effects of interventions, while an external Bayesian optimization loop concentrates its beliefs over candidate graphs in logarithmically many rounds. The LLM therefore never has to learn causal structure internally; it is used only to answer targeted queries. The paper is dated May 28, 2026, and is categorized under cs.AI.
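The division of labor described above, an oracle that answers interventional queries and an external loop that updates a belief over candidate graphs, can be illustrated with a toy sketch. This is not the paper's algorithm: the four candidate graphs, the `make_stub_oracle` function standing in for the frozen LLM, the fixed `noise` level, and the "pick the query closest to a 50/50 split" rule are all assumptions made for illustration. Three of the candidates (the two chains and the fork) share the same conditional independencies, so observational data alone could not separate them.

```python
from itertools import permutations

VARIABLES = ("X", "Y", "Z")

# Hypothetical candidate graphs, each given as a set of (parent, child) edges.
CANDIDATES = {
    "X->Y->Z": {("X", "Y"), ("Y", "Z")},
    "Z->Y->X": {("Z", "Y"), ("Y", "X")},
    "X<-Y->Z": {("Y", "X"), ("Y", "Z")},
    "X->Y<-Z": {("X", "Y"), ("Z", "Y")},
}


def affects(edges, source, target):
    """Return True if intervening on `source` can change `target`,
    i.e. `target` is a descendant of `source` in the graph."""
    frontier, seen = [source], set()
    while frontier:
        node = frontier.pop()
        for parent, child in edges:
            if parent == node and child not in seen:
                seen.add(child)
                frontier.append(child)
    return target in seen


def make_stub_oracle(true_edges):
    """Stand-in for the frozen LLM (assumption): answers the question
    'does do(source) change target?' from a hidden ground-truth graph."""
    def oracle(source, target):
        return affects(true_edges, source, target)
    return oracle


def pick_query(belief):
    """Choose the (source, target) query whose predicted 'yes' probability
    is closest to 0.5, so either answer rules out about half the mass."""
    def p_yes(query):
        return sum(p for name, p in belief.items()
                   if affects(CANDIDATES[name], *query))
    return min(permutations(VARIABLES, 2), key=lambda q: abs(p_yes(q) - 0.5))


def update(belief, query, answer, noise=0.1):
    """Bayes update: graphs that agree with the oracle's answer get
    likelihood 1 - noise, graphs that disagree get likelihood noise."""
    posterior = {}
    for name, prior in belief.items():
        predicted = affects(CANDIDATES[name], *query)
        posterior[name] = prior * ((1 - noise) if predicted == answer else noise)
    total = sum(posterior.values())
    return {name: p / total for name, p in posterior.items()}


def run(oracle, rounds=4):
    """Start from a uniform belief and alternate query selection and update."""
    belief = {name: 1 / len(CANDIDATES) for name in CANDIDATES}
    for _ in range(rounds):
        query = pick_query(belief)
        belief = update(belief, query, oracle(*query))
    return belief


if __name__ == "__main__":
    final = run(make_stub_oracle(CANDIDATES["X<-Y->Z"]))
    for name, p in sorted(final.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {p:.3f}")
```

In this toy setting, each well-chosen query splits the remaining probability mass roughly in half, which is the intuition behind needing only logarithmically many rounds; a real system would replace the stub with actual LLM calls and would need a more careful model of how often the oracle is wrong.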
LLMs cannot autonomously discover causal structures, so reliable causal inference requires hybrid systems.