arXiv cs.AI · Monday · May 25, 2026

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

chain-of-thought · reasoning · shortcut · gsm8k · small-lm

Researchers analyzed three instruction-tuned language models in the 1-3B parameter range (Qwen, Llama, Gemma) on the GSM8K math dataset and found that the primary contribution of chain-of-thought (CoT) prompting is not logical sequencing but a positional number-copying shortcut: the model copies whichever number occupies the trailing position before the answer delimiter and ignores the intermediate reasoning. The presence of the gold answer accounts for 54-92 percentage points (pp) of accuracy, or 89-92% of each model's teacher-forcing ceiling. Even on incorrect items, the final answer matches the last CoT number 95-96% of the time. Replacing the trailing number with a wrong value collapses accuracy to near zero even when the intermediate steps are correct, whereas removing it recovers 5-32 pp above that floor. Qwen and Llama copy novel distractors 87-95% of the time, while Gemma gates selectively. Head-level ablation implicates architecture-specific sets of heads. The study suggests that small LMs rely on a simple copy mechanism rather than genuine reasoning, which raises concerns about the validity of CoT evaluations.
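The two measurements described above, how often the final answer equals the last number in the CoT and what a trailing-number swap looks like, can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the function names are hypothetical, the `####` delimiter is assumed from the format of GSM8K's gold solutions, and the number regex is a simplification.

```python
import re

# Integers and decimals with optional sign and thousands separators.
# The lookbehind keeps "3-5" from being read as 3 and -5.
NUMBER_RE = re.compile(r"(?<!\d)-?\d+(?:,\d{3})*(?:\.\d+)?")


def numbers_in(text):
    """Return every number in the text, in order, as floats."""
    return [float(m.replace(",", "")) for m in NUMBER_RE.findall(text)]


def split_at_delimiter(output, delimiter="####"):
    """Split a model output into (reasoning, answer) at the first delimiter.

    The "####" default is an assumption borrowed from GSM8K gold solutions.
    """
    reasoning, _, answer = output.partition(delimiter)
    return reasoning, answer


def copies_trailing_number(output):
    """True if the final answer equals the last number in the reasoning.

    Returns None when either side contains no number.
    """
    reasoning, answer = split_at_delimiter(output)
    cot_numbers = numbers_in(reasoning)
    answer_numbers = numbers_in(answer)
    if not cot_numbers or not answer_numbers:
        return None
    return cot_numbers[-1] == answer_numbers[0]


def copy_rate(outputs):
    """Fraction of scorable outputs whose answer matches the last CoT number."""
    flags = [copies_trailing_number(o) for o in outputs]
    flags = [f for f in flags if f is not None]
    return sum(flags) / len(flags) if flags else float("nan")


def replace_trailing_number(reasoning, new_value):
    """Swap only the last number in the reasoning, leaving earlier steps intact."""
    matches = list(NUMBER_RE.finditer(reasoning))
    if not matches:
        return reasoning
    last = matches[-1]
    return reasoning[: last.start()] + str(new_value) + reasoning[last.end():]


if __name__ == "__main__":
    copied = "16 - 3 - 4 = 9 eggs. 9 * 2 = 18 dollars. #### 18"
    not_copied = "16 - 3 - 4 = 9 eggs. 9 * 2 = 18 dollars. #### 20"
    print(copy_rate([copied, not_copied]))
    print(replace_trailing_number("9 * 2 = 18 dollars. ", 7))
```

In a setup like this, the corrupted reasoning from `replace_trailing_number` would be fed back to the model as a forced prefix; a model that relies on the shortcut would then be expected to emit the swapped value rather than recompute from the intact intermediate steps.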

// why it matters

Chain-of-thought reasoning in small models may be a positional copy shortcut, not genuine logic.

Sources

Primary · arXiv cs.AI
▸ Read original at arxiv.org
