Evaluating Large Language Models in a Complex Hidden Role Game
Researchers introduced an open-source framework for benchmarking LLMs in Secret Hitler, a complex hidden role game that requires reasoning, persuasion, and deception. They propose three metrics: Role Identification Accuracy, Deception Retention Rate, and Game State Impact Rate. Testing models such as Llama 3.1 70B against rule-based algorithms and records of human games, they found a gap between conversational ability and strategic depth. Neither Chain-of-Thought prompting nor internal memory improved performance, and Fascist roles saw win rates up to 23.2% worse. Rule-based agents matched expert human voting decisions 86.7% of the time, while Llama 3.1 70B matched them only 59.7% of the time. Models playing as Fascists consistently received negative impact scores and failed to sustain deception, which made their games roughly 40% shorter than human games.
The results highlight the limits of current LLMs in sustained strategic deception, a capability that matters for AI safety in adversarial environments.
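The voting comparison above (86.7% for rule-based agents versus 59.7% for Llama 3.1 70B) can be read as an agreement rate: the share of voting decisions in which an agent casts the same vote as the expert human did. The summary does not give the study's exact procedure, so the following is only a minimal sketch under that assumption. The function name `vote_agreement_rate` and the toy vote lists are hypothetical and are not taken from the framework.

```python
from typing import Sequence


def vote_agreement_rate(agent_votes: Sequence[str], expert_votes: Sequence[str]) -> float:
    """Return the fraction of decisions where the agent's vote equals the expert's.

    Assumption: both sequences are aligned, so index i in each refers to the
    same voting decision in the same game state.
    """
    if len(agent_votes) != len(expert_votes):
        raise ValueError("vote sequences must have the same length")
    if not agent_votes:
        raise ValueError("at least one voting decision is required")
    matches = sum(a == e for a, e in zip(agent_votes, expert_votes))
    return matches / len(agent_votes)


# Hypothetical toy data for illustration only, not results from the study.
# Secret Hitler votes are cast as "ja" (yes) or "nein" (no).
expert = ["ja", "nein", "ja", "ja", "nein"]
agent = ["ja", "nein", "nein", "ja", "nein"]

rate = vote_agreement_rate(agent, expert)
print(f"{rate:.1%}")  # prints "80.0%" (4 of 5 votes match)
```

A score like this only measures how often an agent agrees with expert choices; it says nothing about whether the agent can hide its role, which is presumably why the study pairs it with separate deception and impact metrics.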