arXiv cs.AI · Tuesday · May 26, 2026

Palette: A Modular, Controllable, and Efficient Framework for On-demand Authorized Safety Alignment Relaxation in LLMs

llm · safety · modular · authorization

Palette, introduced in arXiv paper 2605.24154, addresses the one-size-fits-all safety alignment problem in LLMs. Current models apply the same refusal policy across users and contexts, causing them to reject legitimate requests from authorized professionals. Palette identifies a refusal direction via multi-objective search and internalizes it through lightweight adaptation, avoiding costly realignment or inference-time steering. It supports modular composition: domain-specific safety controls are learned independently and composed via parameter merging, enabling on-demand multi-domain authorization without retraining. Experiments across four safety benchmarks, multiple model variants, and both LLMs and VLMs show that Palette effectively relaxes refusal on target domains while preserving safety elsewhere. The framework offers precise control with minimal latency, making it suitable for specialized professional settings.
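The summary does not describe Palette's multi-objective search or its adaptation recipe. For a rough mental model only, here is a minimal sketch built on two standard stand-ins, not Palette's actual method. The first is a difference-in-means "refusal direction" over activations plus directional ablation, a common way such a direction is found and removed in prior work. The second is task-arithmetic-style summing of per-domain weight deltas, which mimics "compose domain controls via parameter merging". All function names, the toy data, and the domain labels `medical` and `security` are hypothetical.

```python
import numpy as np


def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Unit-norm difference-in-means direction between two activation sets.

    Stand-in for Palette's (unspecified) multi-objective search.
    Inputs have shape (n_prompts, d_model), e.g. from one hypothetical layer.
    """
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate_direction(hidden: np.ndarray, r_hat: np.ndarray) -> np.ndarray:
    """Remove each row's component along unit vector r_hat: h - (h . r) r.

    hidden has shape (n, d_model); the result is orthogonal to r_hat.
    """
    return hidden - np.outer(hidden @ r_hat, r_hat)


def merge_domain_deltas(
    base: np.ndarray,
    deltas: dict[str, np.ndarray],
    selected: list[str],
    scale: float = 1.0,
) -> np.ndarray:
    """Task-arithmetic merge (assumed, not Palette's confirmed rule):
    add the weight deltas of the authorized domains to a copy of the base."""
    merged = base.astype(np.float64)  # astype returns a new array, base is untouched
    for name in selected:
        merged += scale * deltas[name]
    return merged


# Toy usage: random "activations" where harmful prompts are shifted on every axis.
rng = np.random.default_rng(0)
harmful = rng.normal(size=(32, 8)) + 2.0
harmless = rng.normal(size=(32, 8))
r_hat = refusal_direction(harmful, harmless)
h = rng.normal(size=(4, 8))
print(np.allclose(ablate_direction(h, r_hat) @ r_hat, 0.0))  # True

# Composing two hypothetical per-domain deltas on demand, without retraining.
deltas = {"medical": np.eye(2), "security": np.ones((2, 2))}
print(merge_domain_deltas(np.zeros((2, 2)), deltas, ["medical", "security"]))
```

Summing deltas is the simplest possible merge rule. It is used here only to make "authorize several domains without retraining" concrete: enabling a domain adds its delta, and leaving it out leaves the base weights unchanged.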

// why it matters

Enables on-demand, per-domain relaxation of LLM refusals for authorized professionals without retraining, while keeping safety behavior intact elsewhere.

Sources

Primary · arXiv cs.AI

▸ Read original at arxiv.org
