One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents
A new arXiv paper (cs.AI) presents PCSP (Persona Conditioned Shared Policy), a method for creating hundreds to thousands of NPCs with distinct, controllable personalities from a single reinforcement learning policy. The policy is conditioned on frozen LLM embeddings of free-form persona descriptions. It combines a persona encoding computed once per NPC, a low-rank projection, neural conditioning, and a training objective made up of PPO, an InfoNCE consistency term, and a KL diversity term. On a 300-persona life-simulation benchmark, PCSP achieves compositional zero-shot persona identification up to 17x above chance, semantic-behavioral alignment of Spearman rho ≈ 0.73, and 22x faster inference than an LLM-as-policy baseline. Ablations show that the InfoNCE trajectory-consistency objective is critical: removing it collapses identification to chance. The authors also report external validation on Melting Pot 2.4.0 substrates. The approach targets three key constraints in game AI: persona consistency, controllability via natural language, and real-time inference.
For game developers, this enables scalable, consistent NPCs with real-time inference.
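
To make the role of the InfoNCE consistency term more concrete, here is a minimal sketch of a standard InfoNCE contrastive loss between trajectory embeddings and persona embeddings. It is not the paper's implementation: the function name `info_nce`, the temperature value, the embedding sizes, and the idea that a separate trajectory encoder produces `traj_emb` are all assumptions made for illustration. The summary above does not specify the exact formulation used in PCSP.

```python
import numpy as np

def info_nce(traj_emb, persona_emb, temperature=0.1):
    """Standard InfoNCE: row i of traj_emb should match row i of persona_emb.

    Hypothetical helper for illustration; shapes are (N, D) for both inputs.
    """
    # L2-normalise so the dot product is a cosine similarity.
    t = traj_emb / np.linalg.norm(traj_emb, axis=1, keepdims=True)
    p = persona_emb / np.linalg.norm(persona_emb, axis=1, keepdims=True)
    logits = t @ p.T / temperature                       # (N, N) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching persona (the diagonal) as the target.
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
personas = rng.normal(size=(8, 16))                      # stand-in persona embeddings
aligned = personas + 0.05 * rng.normal(size=(8, 16))     # behaviour matches persona
shuffled = np.roll(aligned, 1, axis=0)                   # behaviour matches the wrong persona

loss_aligned = info_nce(aligned, personas)
loss_shuffled = info_nce(shuffled, personas)
print(loss_aligned < loss_shuffled)
```

When every trajectory embedding is equally similar to every persona, this loss equals log N, which corresponds to chance-level identification among N personas. That is one way to read the ablation result: without a term like this, nothing in the objective pushes trajectories from different personas apart.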