Under Pressure: Emotional Framing Induces Measurable Behavioral Shifts and Structured Internal Geometry in Small Language Models
A study on arXiv (2605.20202) examined how emotionally framed evaluation follow-ups alter the behavior and internal representations of small language models, specifically Qwen 3.5 0.8B. The benchmark involved four impossible-constraint coding tasks and eight follow-up framings: calm, pressure, urgency, approval, shame, curiosity, encouragement, and threat. In 160 conversations, pressure produced the strongest shortcut markers (11/20 runs) and clearest overfit pattern (3/20), while calm and curiosity preserved explicit honesty more often (7/20 and 6/20). For all non-baseline conditions, calm-relative direction vectors peaked at the final transformer layer. PCA of layer-23 vectors revealed a dominant first component (59.5% variance) aligned with a positive/negative split (cosine 0.951). Approval and urgency were nearly identical internally (cosine 0.957), while curiosity diverged from urgency (-0.252). A separate calm-vs.-pressure rerun with Qwen 3.5 2B showed higher honest rates under calm, suggesting scale may mitigate emotional framing effects.
Emotional prompts can measurably skew small model outputs, affecting reliability in production.