From Static Context to Calibrated Interactive RL: Mitigating Distribution Shift in Multi-turn Dialogue with Aligned Simulator
A new paper on arXiv (2605.26403) from May 2026 identifies two sources of distribution shift in LLM-based dialogue agents: policy-induced shift from training on static histories, and simulator-induced shift from discrepancies between simulated and real human behaviors. The authors show theoretically that these shifts compound quadratically over turns, severely degrading dialogue quality. To address this, they propose Calibrated Interactive RL, a unified framework that aligns the simulator with human interaction patterns to reduce the sim-to-real gap. Experiments across multi-turn dialogue tasks demonstrate improved performance over static context RL and standard interactive RL approaches.
The approach improves the real-world reliability of conversational AI by reducing the errors that compound over multiple turns.
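To give some intuition for the quadratic compounding mentioned in the summary, here is a minimal toy sketch. It is not the paper's bound or method, only the familiar shape of such arguments from imitation learning. The model and the names (`expected_off_distribution_turns`, `eps`, `horizon`) are assumptions made for illustration: at each turn a dialogue that is still on-distribution drifts off with probability `eps`, never recovers, and every off-distribution turn costs 1.

```python
def expected_off_distribution_turns(eps: float, horizon: int) -> float:
    """Expected number of off-distribution turns in a toy dialogue model.

    Assumptions (illustrative, not taken from the paper):
    - while on-distribution, each turn drifts off with probability eps;
    - once off-distribution, the dialogue never recovers;
    - each off-distribution turn (including the drift turn) costs 1.
    """
    total = 0.0
    on_dist = 1.0  # probability the dialogue is still on-distribution
    for turn in range(1, horizon + 1):
        drift_now = on_dist * eps            # first drift happens at this turn
        remaining = horizon - turn + 1       # turns paid for after drifting
        total += drift_now * remaining
        on_dist -= drift_now
    return total


if __name__ == "__main__":
    eps = 1e-4
    for horizon in (5, 10, 20, 40):
        cost = expected_off_distribution_turns(eps, horizon)
        bound = eps * horizon * (horizon + 1) / 2  # quadratic upper bound
        print(f"T={horizon:3d}  cost={cost:.6f}  eps*T(T+1)/2={bound:.6f}")
```

In this toy model the expected cost is bounded by `eps * T * (T + 1) / 2`, so for small `eps` doubling the number of turns roughly quadruples the cost. Under the same assumptions, anything that lowers the per-turn drift rate, whether it comes from the policy or from the simulator, scales the whole curve down.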