COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents
COMPASS, introduced in an arXiv paper (2605.30838), addresses a safety failure in LLM-powered search agents: a harmful intent can be decomposed into sub-queries that each look innocuous, so safety degrades over a multi-step search. The framework combines two components. Cognitive tree exploration (CTE) efficiently synthesizes stealthy attack trajectories, and introspective step-wise alignment (ISA) isolates risky intermediate actions so they can receive fine-grained process supervision. According to the paper's empirical results, COMPASS achieves a favorable safety-utility trade-off while requiring substantially less training data than existing alignment methods, preserving general utility while keeping the agent aligned throughout multi-step workflows.
For developers, the reported results suggest that search agents can be made safer without sacrificing utility or requiring large training datasets.
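
The summary above does not spell out how CTE explores its tree, so the following is only a minimal sketch of the generic Monte Carlo tree search (MCTS) building blocks that the title refers to: UCT child selection and reward backpropagation. It is not the paper's algorithm. The `Node` class, the `action` field (standing in for a sub-query), the reward values, and the exploration constant `c` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node in a search tree. `action` is a hypothetical sub-query label."""
    action: str
    visits: int = 0
    total_reward: float = 0.0
    children: list["Node"] = field(default_factory=list)


def uct_score(parent: Node, child: Node, c: float = 1.4) -> float:
    """Standard UCT score: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = 1.4) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(parent, ch, c))


def backpropagate(path: list[Node], reward: float) -> None:
    """Add one rollout's reward to every node on the visited path."""
    for node in path:
        node.visits += 1
        node.total_reward += reward


# Illustrative tree: the reward values here are made up for the example.
root = Node("root", visits=10)
a = Node("sub-query A", visits=5, total_reward=4.0)
b = Node("sub-query B", visits=5, total_reward=1.0)
root.children = [a, b]

chosen = select_child(root)
backpropagate([root, chosen], reward=1.0)
```

With these made-up statistics, `select_child` picks `sub-query A`, because both children have the same visit count and A has the higher mean reward. In a trajectory-synthesis setting the reward would come from some judgment of the explored trajectory; how COMPASS defines that signal is not stated in this summary.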