arXiv cs.AI · Wednesday · May 27, 2026

OmniToM: Benchmarking Theory of Mind in LLMs via Explicit Belief Modeling

llm · theory-of-mind · benchmark · social-reasoning

OmniToM, introduced in arXiv:2605.26322v1, addresses limitations in existing Theory of Mind (ToM) evaluations for large language models. Current benchmarks rely on end-point question answering, which checks only the final answer and may not reveal whether a model actually builds mental-state representations. OmniToM instead requires the model to explicitly lay out the belief structure of every actor in a narrative, expressed as belief propositions: minimal statements of what an actor believes about the world or about others' mental states. This format supports analysis of knowledge, intentions, emotions, and false beliefs. Evaluation runs in two stages: Stage 1 (Belief Extraction) extracts the beliefs relevant to social dynamics from the story; Stage 2 (Belief Labeling) assigns each belief a seven-dimensional label. The benchmark aims to provide a more rigorous test of LLMs' social reasoning, particularly in scenarios with divergent, evolving, or mistaken beliefs.
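To make the two-stage format concrete, here is a minimal Python sketch of how a belief proposition and the two evaluation stages might be represented. Everything in it is an assumption for illustration: the summary does not name the seven label dimensions or describe the paper's scoring, so the `BeliefProposition` fields, the placeholder label values, and the exact-match metrics in `extraction_scores` and `labeling_accuracy` are hypothetical and not OmniToM's actual schema or metrics. The example story is the classic Sally-Anne false-belief setup, not data from the benchmark.

```python
from dataclasses import dataclass

# The summary states that labels are seven-dimensional; it does not name
# the dimensions, so they are treated as anonymous positions here.
NUM_LABEL_DIMS = 7


@dataclass(frozen=True)
class BeliefProposition:
    """Hypothetical container for one belief proposition."""
    holder: str        # actor who holds the belief
    content: str       # minimal statement of what the holder believes
    label: tuple = ()  # Stage 2 output: one value per label dimension


def extraction_scores(predicted, gold):
    """Stage 1 sketch (assumed metric): exact-match precision and recall
    over (holder, content) pairs."""
    pred = {(b.holder, b.content) for b in predicted}
    ref = {(b.holder, b.content) for b in gold}
    hits = len(pred & ref)
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


def labeling_accuracy(predicted, gold):
    """Stage 2 sketch (assumed metric): per-dimension accuracy, computed
    only over beliefs that were extracted correctly."""
    gold_by_key = {(b.holder, b.content): b.label for b in gold}
    correct = [0] * NUM_LABEL_DIMS
    total = 0
    for b in predicted:
        ref_label = gold_by_key.get((b.holder, b.content))
        if ref_label is None or len(b.label) != NUM_LABEL_DIMS:
            continue  # skip beliefs with no gold match or malformed labels
        total += 1
        for i in range(NUM_LABEL_DIMS):
            correct[i] += int(b.label[i] == ref_label[i])
    return [c / total if total else 0.0 for c in correct]


# Illustrative data: placeholder label values "x", "y", "z" stand in for
# whatever categories the real dimensions use.
gold = [
    BeliefProposition("Sally", "the marble is in the basket", ("x",) * 7),
    BeliefProposition("Anne", "the marble is in the box", ("y",) * 7),
]
predicted = [
    BeliefProposition("Sally", "the marble is in the basket", ("x",) * 6 + ("z",)),
    BeliefProposition("Anne", "the marble is in the basket", ("y",) * 7),
]

precision, recall = extraction_scores(predicted, gold)
per_dim = labeling_accuracy(predicted, gold)
```

In this toy run the model recovers Sally's false belief but attributes the wrong belief to Anne, so extraction precision and recall are both 0.5, and the one matched belief is labeled correctly on six of the seven dimensions. A real harness would likely need fuzzier matching than exact string equality, since two propositions can state the same belief in different words.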

// why it matters

Developers gain a more rigorous way to test whether LLMs actually represent mental states rather than just reproducing plausible answers.

Sources

Primary · arXiv cs.AI

▸ Read original at arxiv.org
