arXiv cs.AI · Monday · May 25, 2026 · FREE

Evaluating Large Language Models in a Complex Hidden Role Game

llm · ai-safety · evaluation · deception

Researchers introduced an open-source framework for benchmarking LLMs in Secret Hitler, a complex hidden role game that requires reasoning, persuasion, and deception. They propose three metrics: Role Identification Accuracy, Deception Retention Rate, and Game State Impact Rate. Testing models such as Llama 3.1 70B against rule-based algorithms and human games, they found a gap between conversational ability and strategic depth. Chain-of-Thought prompting and internal memory did not improve performance; Fascist roles saw win rates up to 23.2% worse. Rule-based agents matched expert human voting decisions 86.7% of the time, while Llama 3.1 70B reached only 59.7% accuracy. Models playing as Fascists consistently produced negative impact scores and failed to sustain deception, which made their games roughly 40% shorter than human games. The study highlights limitations in current LLMs for sustained strategic deception.
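The 86.7% and 59.7% figures read as agreement rates between an agent's votes and expert human votes. The summary does not say how the paper computes them, so the following is only a minimal sketch, assuming the rate is the plain fraction of matching vote decisions. The function name `vote_agreement_rate` and the toy "ja"/"nein" data are hypothetical and do not come from the paper or its framework.

```python
from typing import Sequence


def vote_agreement_rate(agent_votes: Sequence[str], human_votes: Sequence[str]) -> float:
    """Fraction of decisions where the agent's vote equals the human expert's vote.

    Assumption: agreement is a simple match rate over paired decisions.
    The paper may weight or filter decisions differently.
    """
    if len(agent_votes) != len(human_votes):
        raise ValueError("vote sequences must have the same length")
    if not agent_votes:
        raise ValueError("need at least one vote to compute a rate")
    matches = sum(a == h for a, h in zip(agent_votes, human_votes))
    return matches / len(agent_votes)


# Hypothetical toy data: votes on five proposed governments.
human = ["ja", "nein", "ja", "ja", "nein"]
agent = ["ja", "ja", "ja", "nein", "nein"]

rate = vote_agreement_rate(agent, human)
print(f"agreement: {rate:.1%}")  # 3 of 5 votes match
```

Under this reading, a score of 59.7% would mean the model cast the same vote as the expert human in about six of every ten recorded decisions.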

// why it matters

Highlights LLM limitations in sustained strategic deception, critical for AI safety in adversarial environments.

Sources

Primary · arXiv cs.AI

▸ Read original at arxiv.org
