Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation
The paper introduces Counteraction-Aware Multi-Teacher On-Policy Distillation (CaMOPD), a method for recovering general capabilities in domain-specialized LLMs without relying on teacher-aligned prompts. Vanilla Multi-Teacher On-Policy Distillation (MOPD) assumes that the distillation prompts cover each teacher's training distribution, a condition that is often infeasible to meet for open-source general teachers. The paper identifies two failure modes that arise when this assumption does not hold: recovery-preservation counteraction, caused by conflicting gradients, and weak-signal flattening, caused by uniform averaging. CaMOPD addresses them with decoupled alternating training and gap-based sample selection. Experiments show that CaMOPD outperforms vanilla MOPD on general capability benchmarks while maintaining domain performance. The paper is available on arXiv (2605.27115).
CaMOPD enables better general capability recovery in domain-specialized LLMs without requiring teacher-aligned prompts.
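The two remedies named in the summary, gap-based sample selection and decoupled alternating training, can be illustrated with a small sketch. This is not the paper's implementation: the summary does not say how the gap is measured, what fraction of samples is kept, or how the phases are scheduled. The names `select_by_gap`, `alternating_schedule`, and `keep_fraction`, the top-k selection rule, and the strict one-to-one alternation are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def select_by_gap(gaps: Sequence[float], keep_fraction: float = 0.5) -> List[int]:
    """Return indices of the samples with the largest teacher-student gap.

    `gaps` holds one hypothetical score per sample (for example a per-sequence
    divergence between teacher and student). Keeping only the top fraction is
    an assumed rule; the summary does not specify the selection criterion.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if len(gaps) == 0:
        return []
    k = max(1, int(len(gaps) * keep_fraction))
    # Rank sample indices by gap, largest first, then keep the top k.
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i], reverse=True)
    return sorted(ranked[:k])


def alternating_schedule(
    num_steps: int,
    phases: Tuple[str, ...] = ("recovery", "preservation"),
) -> List[str]:
    """Assign each training step to exactly one objective.

    Each step optimizes a single objective, so the two gradients are never
    summed in the same update. Strict round-robin alternation is an assumption.
    """
    return [phases[step % len(phases)] for step in range(num_steps)]


# Toy gaps for four sampled responses (made-up numbers).
gaps = [0.1, 0.9, 0.05, 0.7]
kept = select_by_gap(gaps, keep_fraction=0.5)
print(kept)                     # [1, 3]
print(alternating_schedule(4))  # ['recovery', 'preservation', 'recovery', 'preservation']
```

In this toy run, averaging over all four samples would dilute the two large gaps with the two small ones, while selecting indices 1 and 3 concentrates the update on the samples where teacher and student disagree most. The schedule shows the decoupling idea in its simplest form: recovery and preservation updates occupy separate steps rather than being combined into one loss.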