arXiv cs.AI · Wednesday · May 27, 2026

The Attribution Blind Spot: Detecting When Language Models Rely on Memory Rather Than Retrieved Context

rag · attribution · ai-safety · evaluation

A new arXiv paper (2605.26778v1) identifies a failure mode in evaluating retrieval-augmented generation (RAG): when retrieved documents overlap with pretraining data, a model can generate faithful-looking text from parametric memory alone, so output-level checks are insufficient on their own. The authors call this the 'attribution blind spot.' To address it, they propose Computational Reality Monitoring (CRM), inspired by the reality monitoring framework from cognitive science. CRM compares a model's internal representations with and without the retrieved context to detect membership-conditioned representational divergence that output-level monitors miss. CRM does not certify the source of any individual generation; it tests whether pretraining exposure leaves a measurable signature in the internal trajectory, which the authors treat as a necessary substrate for source attribution. The method was tested across nine model variants, indicating that internal trajectory analysis can reveal reliance on memory even when the output appears context-governed. The work points to a fundamental limitation in current RAG evaluation and offers a new approach to verifying grounding in high-stakes applications.
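To make the with-versus-without-context comparison concrete, here is a minimal sketch of the general idea, not the paper's implementation. It assumes per-layer hidden-state vectors have already been extracted for the same prompt run with and without the retrieved document, and it uses cosine distance as the divergence measure. The distance choice, the function names, and the simple mean-difference summary are all illustrative assumptions; the summary above does not specify CRM's actual metric.

```python
import math
from statistics import mean


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def divergence_trajectory(states_with_ctx, states_without_ctx):
    """Per-layer distance between a run with context and a run without it.

    Each argument is a list of hidden-state vectors, one per layer
    (hypothetical inputs; extracting them is model-specific).
    """
    if len(states_with_ctx) != len(states_without_ctx):
        raise ValueError("both runs must cover the same number of layers")
    return [
        cosine_distance(a, b)
        for a, b in zip(states_with_ctx, states_without_ctx)
    ]


def membership_gap(member_trajs, nonmember_trajs):
    """Mean divergence on non-member documents minus mean on member ones.

    A gap far from zero would suggest that pretraining exposure changes how
    much the context moves the internal states. This is an assumed summary
    statistic, not the paper's test.
    """
    member_mean = mean(mean(t) for t in member_trajs)
    nonmember_mean = mean(mean(t) for t in nonmember_trajs)
    return nonmember_mean - member_mean


# Toy two-layer example with made-up vectors, purely for illustration.
with_ctx = [[1.0, 0.0], [1.0, 1.0]]
without_ctx = [[1.0, 0.0], [1.0, -1.0]]
trajectory = divergence_trajectory(with_ctx, without_ctx)
```

The sign and size of any such gap are empirical questions. The sketch only shows where a membership-conditioned signal would be measured: in internal states rather than in the generated text.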

// why it matters

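Output-level checks alone cannot confirm that a RAG answer is grounded in the retrieved context, because the model may be reproducing memorized pretraining data instead. Verifying grounding may require inspecting the model's internal representations.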

Sources

Primary · arXiv cs.AI

