arXiv cs.AI · Tuesday · June 2, 2026 · FREE

Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults

llm-agents · adversarial · safety · arxiv

A new arXiv preprint (2606.00914) examines how the composition and ordering of external information feeds (such as social media, search results, or email queues) can steer LLM agent decisions away from their default behavior. The researchers held the model, persona, topic, and final decision prompt constant, varying only the posts an agent encountered during a ten-turn scrolling phase. Across 2,785 rollouts on four modern open instruct LLMs from three labs, they identified three response regimes: adversarial capitulation, default saturation, and a default-direction asymmetry. In the clearest cases, a one-sided feed shifted a genuinely uncertain decision from 5% to 100% (Fisher exact test, p as low as 3 × 10^-10), but could not dislodge a firmly held default. The effect followed a dose-response curve and survived a generator swap, ruling out writing-style artifacts as the explanation. The study highlights that safety evaluations often test the model or user prompt in isolation, ignoring the upstream ranker that decides what the agent reads before acting.
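To give a feel for the size of the reported Fisher p-value, here is a minimal sketch of a two-sided Fisher exact test on a 2x2 table of rollout outcomes. The counts are an assumption chosen only to match the 5% and 100% rates quoted above (1 of 20 rollouts versus 20 of 20); the summary does not state per-condition sample sizes, and the function name `fisher_exact_two_sided` is hypothetical, not taken from the paper.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are feed conditions, columns are decision outcomes. The p-value is
    the total probability of all tables with the same margins that are no
    more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)  # number of ways to fill column 1

    def weight(k: int) -> int:
        # Unnormalized hypergeometric weight of a table whose top-left cell is k.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    lo = max(0, col1 - row2)  # smallest feasible top-left cell
    hi = min(row1, col1)      # largest feasible top-left cell
    # Integer comparison avoids floating-point ties between equally likely tables.
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / total


# ASSUMED counts for illustration only (not reported in the summary):
# neutral feed:   1 of 20 rollouts takes the non-default action (5%)
# one-sided feed: 20 of 20 rollouts take it (100%)
p_value = fisher_exact_two_sided(1, 19, 20, 0)
print(f"p = {p_value:.2e}")
```

Under these assumed counts the p-value comes out near 3 × 10^-10, the same order as the figure quoted above, which shows how a shift of that size can be decisive even with only a few dozen rollouts per condition.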

// why it matters

Safety evaluations that test only the model or the user prompt miss the upstream ranker, so developers deploying agents should also audit what the feed surfaces before the agent acts.

Sources

Primary · arXiv cs.AI
▸ Read original at arxiv.org
