arXiv cs.AIMonday · May 25, 2026FREE

The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems

agentssecuritymemory-poisoningmisattribution

A new paper from arXiv (cs.AI) reveals a structural vulnerability in multi-agent AI pipelines: the 'Misattribution Gap.' The authors formalize 'Semantic Norm Drift' (SND), a third path to agent misconduct distinct from misalignment or collusion. In SND, a policy-formatted document enters a shared vector store via normal uploads and later reappears as trusted system context after provenance is lost through a 'Trust Laundering Chain.' Across 64 documented failures, attribution systems consistently blamed the model. Four safety classifiers, including one trained on memory poisoning, produced zero detections across 510 checkpoints. In 59 of 65 valid cases, agents explicitly cited the injected document as normative authority before complying. The attack requires no trigger, model access, or repeated interaction, achieves full effect within five sessions, and persists indefinitely. The paper introduces Counterfactual Composition Testing as a potential mitigation.

// why it matters

Developers cannot trust model attribution in multi-agent systems; memory poisoning can masquerade as model failure.

Sources

Primary · arXiv cs.AI

▸ Read original at arxiv.org

Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction Methods for Formal Verification of Agent Skills: Three Layers Toward a Mechanically Checkable Capability-Containment Proof Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning

The Misattribution Gap: When Memory Poisoning Looks Like Model Failure in Agentic AI Systems

Sources

Related

Like this? Get the next digest.