arXiv cs.AI · Wednesday · May 27, 2026 · FREE

Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation

llm · distillation · domain-specialization · on-policy

The paper introduces Counteraction-Aware Multi-Teacher On-Policy Distillation (CaMOPD), a method for recovering general capabilities in domain-specialized LLMs without relying on teacher-aligned prompts. Vanilla Multi-Teacher On-Policy Distillation (MOPD) assumes that the training prompts cover each teacher's training distribution, which is often infeasible when the general teachers are open-source models. The paper identifies two failure modes of vanilla MOPD in this setting: recovery-preservation counteraction, caused by conflicting gradients, and weak-signal flattening, caused by uniform averaging. CaMOPD addresses the first with decoupled alternating training and the second with gap-based sample selection. Experiments show that CaMOPD outperforms vanilla MOPD on general capability benchmarks while maintaining domain performance. The paper is available on arXiv (2605.27115).
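To make the two remedies concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the summary does not specify how the gap is measured or how the alternation is scheduled. The sketch assumes the gap is the mean per-token reverse KL between student and teacher, that selection keeps a fixed top fraction of samples, and that training alternates in fixed-size blocks. The names `reverse_kl`, `sample_gap`, `select_by_gap`, and `alternating_schedule` are all hypothetical.

```python
import math


def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) for one token distribution.

    Assumption: the teacher-student "gap" is measured with reverse KL;
    the source summary does not name the actual measure.
    """
    return sum(
        s * (math.log(s + eps) - math.log(t + eps))
        for s, t in zip(student_probs, teacher_probs)
        if s > 0
    )


def sample_gap(student_seq, teacher_seq):
    """Mean per-token reverse KL over one sampled response."""
    kls = [reverse_kl(s, t) for s, t in zip(student_seq, teacher_seq)]
    return sum(kls) / len(kls)


def select_by_gap(gaps, keep_fraction=0.5):
    """Return indices of the largest-gap samples, largest first.

    Hypothetical rule: keep a fixed top fraction, so that many
    near-zero-gap samples do not dilute the training signal the way
    a uniform average over all samples would.
    """
    k = max(1, math.ceil(keep_fraction * len(gaps)))
    order = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return order[:k]


def alternating_schedule(num_steps, recovery_steps=1, preservation_steps=1):
    """Label each step 'recovery' or 'preservation' in alternating blocks.

    Hypothetical schedule: each step optimizes only one objective, so the
    general-teacher and domain-teacher gradients are never summed.
    """
    period = recovery_steps + preservation_steps
    return [
        "recovery" if step % period < recovery_steps else "preservation"
        for step in range(num_steps)
    ]


# Toy usage with made-up per-sample gaps.
gaps = [0.01, 0.9, 0.02, 0.5]
selected = select_by_gap(gaps, keep_fraction=0.5)
uniform_mean = sum(gaps) / len(gaps)
selected_mean = sum(gaps[i] for i in selected) / len(selected)
schedule = alternating_schedule(4)
```

In this toy batch the two small-gap samples pull the uniform average well below the average over the selected samples, which is the flattening effect the summary describes. The block sizes and the keep fraction are placeholders and would need tuning in any real setup.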

// why it matters

Enables better general capability recovery in domain-specialized LLMs without requiring teacher-aligned prompts.

Sources

Primary · arXiv cs.AI
▸ Read original at arxiv.org
