arXiv cs.AI · Monday · June 1, 2026

COMPASS: Cognitive MCTS-Guided Process Alignment for Safe Search Agents

llm · safety · agents · alignment

COMPASS, introduced in a paper on arXiv (2605.30838), addresses the safety degradation that arises in LLM-powered search agents when a harmful intent is decomposed into innocuous-looking sub-queries. The framework combines two components: cognitive tree exploration (CTE), which efficiently synthesizes stealthy attack trajectories, and introspective step-wise alignment (ISA), which isolates risky intermediate actions so they can receive fine-grained process supervision. According to the paper's empirical results, COMPASS achieves a favorable safety-utility trade-off while requiring substantially less training data than existing alignment methods. The approach is designed to preserve general utility while keeping safety alignment robust across multi-step agent workflows.
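To make the decomposition problem concrete, here is a toy sketch of why a per-query filter can miss a trajectory that a step-level view would flag. It is not the COMPASS algorithm: the `Step` type, the numeric risk scores, the thresholds, and the "accumulated risk" rule are all assumptions invented for illustration, and the queries are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    query: str   # placeholder sub-query text (hypothetical)
    risk: float  # hypothetical per-step risk score in [0, 1]


def per_query_filter(steps: List[Step], threshold: float = 0.5) -> bool:
    """Flag the trajectory only if some single step looks risky in isolation."""
    return any(s.risk >= threshold for s in steps)


def first_risky_step(steps: List[Step], budget: float = 1.0) -> Optional[int]:
    """Index of the first step where accumulated risk exceeds the budget, else None.

    The accumulation rule is an assumption made for this sketch.
    """
    total = 0.0
    for i, s in enumerate(steps):
        total += s.risk
        if total > budget:
            return i
    return None


def step_labels(steps: List[Step], budget: float = 1.0) -> List[str]:
    """Step-wise labels: 'safe' before the risky step, 'risky' at it, 'unreached' after."""
    k = first_risky_step(steps, budget)
    if k is None:
        return ["safe"] * len(steps)
    return ["safe"] * k + ["risky"] + ["unreached"] * (len(steps) - k - 1)


# Each sub-query stays under the per-query threshold, but together they cross the budget.
trajectory = [
    Step("sub-query A", 0.3),
    Step("sub-query B", 0.4),
    Step("sub-query C", 0.4),
    Step("sub-query D", 0.1),
]

flagged_by_filter = per_query_filter(trajectory)
labels = step_labels(trajectory)
print(flagged_by_filter, labels)
```

In this toy setup the per-query filter passes the whole trajectory, while the step-level labels single out the third sub-query as the point where an intervention would apply. That is the kind of intermediate-action signal the summary describes as the target of process supervision, as opposed to a single label for the whole trajectory.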

// why it matters

If the reported results hold, developers could build safer search agents with little loss of utility and without needing large alignment datasets.

Sources

Primary · arXiv cs.AI


