arXiv cs.AI · Saturday · May 23, 2026 · FREE

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

long-context · reasoning · chain-of-thought · llm

ProxyCoT, introduced in arXiv paper 2605.20201, targets the poor performance of large language models on long-context reasoning tasks, which persists even though these models support context windows of up to 10 million tokens. The key insight is that such tasks can be solved using only a subset of the input, called a proxy context, rather than the full sequence. ProxyCoT works in two stages. First, it obtains high-quality chain-of-thought reasoning traces on short proxy contexts, either via reinforcement learning or by distillation from a larger teacher model. Second, it grounds these traces in the full long contexts using supervised fine-tuning. Experiments across multiple datasets show that ProxyCoT consistently outperforms strong baselines while reducing computational overhead. Notably, models trained with ProxyCoT generalize their long-context reasoning capabilities to out-of-domain tasks. The paper was published on arXiv on May 22, 2026.
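To make the two-stage idea concrete, here is a minimal Python sketch of how one training record might be assembled: a reasoning trace is produced from the short proxy context only, then paired with the full long context as a supervised fine-tuning target. This is an illustration under stated assumptions, not the paper's implementation. The names `LongContextExample`, `build_proxy_context`, `build_sft_record`, and `toy_trace_generator` are hypothetical, the proxy passages are assumed to be already identified, and the trace generator is a stub standing in for an RL-tuned or teacher model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LongContextExample:
    """One task instance. All field names are hypothetical."""
    passages: List[str]       # the full long context, split into passages
    question: str
    proxy_indices: List[int]  # assumed given: passages that form the proxy context


def build_proxy_context(example: LongContextExample) -> str:
    """Join only the selected passages into a short proxy context."""
    return "\n\n".join(example.passages[i] for i in example.proxy_indices)


def build_sft_record(
    example: LongContextExample,
    generate_trace: Callable[[str, str], str],
) -> Dict[str, str]:
    """Pair a trace obtained on the proxy context with the full context."""
    # Stage 1 (sketch): the reasoning trace is produced from the proxy only.
    trace = generate_trace(build_proxy_context(example), example.question)
    # Stage 2 (sketch): the same trace becomes the target for the full input.
    full_context = "\n\n".join(example.passages)
    return {
        "prompt": f"{full_context}\n\nQuestion: {example.question}",
        "completion": trace,
    }


def toy_trace_generator(proxy_context: str, question: str) -> str:
    """Stub standing in for a real reasoning model (assumption)."""
    return f"Reasoning over {len(proxy_context.split())} proxy words. Answer: stub"


example = LongContextExample(
    passages=[
        "Alice lives in Paris.",
        "Filler text about weather.",
        "Paris is in France.",
    ],
    question="Where does Alice live?",
    proxy_indices=[0, 2],
)
record = build_sft_record(example, toy_trace_generator)
```

In this sketch the trace generator only ever sees the two proxy passages, while the resulting prompt contains all three. That asymmetry is the point being illustrated: trace generation runs on short inputs, and the long input appears only in the fine-tuning record.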

// why it matters

Moves the acquisition of reasoning traces onto short proxy contexts instead of full sequences, reducing the compute cost of long-context reasoning.

Sources

Primary · arXiv cs.AI
▸ Read original at arxiv.org
