arXiv cs.AI · Monday · May 25, 2026 · FREE

One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

reinforcement-learning · game-ai · npc · persona-conditioning

A new arXiv (cs.AI) paper presents PCSP (Persona Conditioned Shared Policy), a method for driving hundreds to thousands of NPCs with distinct, controllable personalities from a single reinforcement learning policy. The policy is conditioned on frozen LLM embeddings of free-form persona descriptions. It combines once-per-NPC persona encoding, low-rank projection, and neural conditioning with a training objective made of three terms: PPO, an InfoNCE consistency term, and a KL diversity term. On a 300-persona life-simulation benchmark, PCSP achieves compositional zero-shot persona identification up to 17x above chance, semantic-behavioral alignment of Spearman rho ≈ 0.73, and 22x faster inference than an LLM-as-policy baseline. Ablations show the InfoNCE trajectory-consistency objective is critical: removing it collapses identification to chance. External validation on Melting Pot 2.4.0 substrates is reported to confirm the method's effectiveness. The approach addresses three key constraints in game AI: persona consistency, controllability via natural language, and real-time inference.
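To make the moving parts concrete, here is a minimal sketch of two of the ingredients named above: a low-rank projection of a frozen persona embedding, and an InfoNCE-style loss that rewards a trajectory embedding for matching its own persona code over the others in the batch. This is not the paper's implementation. The function names, the dimensions, the cosine-similarity scoring, the temperature of 0.1, and the use of random vectors in place of real LLM embeddings are all assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    # Scale to unit length; fall back to 1.0 to avoid dividing by zero.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [a / norm for a in v]


def low_rank_project(persona_emb, down, up):
    """Map a frozen d-dim persona embedding to a k-dim code through a
    rank-r bottleneck. `down` is r x d and `up` is k x r (lists of rows).
    Hypothetical helper: the paper's actual projection may differ."""
    hidden = [dot(row, persona_emb) for row in down]
    return [dot(row, hidden) for row in up]


def info_nce_loss(traj_embs, persona_codes, temperature=0.1):
    """Mean cross-entropy of matching trajectory i to persona i against
    every persona in the batch, scored by cosine similarity."""
    traj = [normalize(t) for t in traj_embs]
    codes = [normalize(c) for c in persona_codes]
    total = 0.0
    for i, t in enumerate(traj):
        logits = [dot(t, c) / temperature for c in codes]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(traj)


# Toy usage: random vectors stand in for frozen LLM persona embeddings.
rng = random.Random(0)
d, r, k, n = 16, 4, 8, 5  # embedding dim, rank, code dim, persona count
down = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(r)]
up = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(k)]
personas = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
codes = [low_rank_project(p, down, up) for p in personas]

# Trajectories that match their own persona score a lower loss than
# trajectories paired with the wrong persona.
aligned = info_nce_loss(codes, codes)
mismatched = info_nce_loss(codes[1:] + codes[:1], codes)

# If every embedding is identical, the loss equals log(n): the model
# cannot tell personas apart better than a uniform guess.
collapsed = info_nce_loss([[1.0, 0.0]] * 4, [[1.0, 0.0]] * 4)
```

In this toy setup the loss is lowest when each trajectory embedding lines up with its own persona code, and it sits at log(n) when all embeddings are indistinguishable. In a full training loop a term like this would be added to the PPO objective alongside the KL diversity term; how the paper weights the three terms is not stated in this summary.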

// why it matters

A single shared policy could give game developers scalable, persona-consistent NPCs that run at real-time inference speeds.

Sources

Primary · arXiv cs.AI
▸ Read original at arxiv.org
