arXiv cs.AI · Saturday · May 23, 2026

Insights Generator: Systematic Corpus-Level Trace Diagnostics for LLM Agents

llm-agents · diagnostics · arxiv · agents

A new arXiv paper (cs.AI) formalizes the problem of corpus-level trace diagnostics for LLM agents. The proposed system, Insights Generator (IG), is a multi-agent framework that answers diagnostic questions by proposing and testing hypotheses across a corpus of execution traces, then compiling the results into an evidence-backed insights report. It targets the current manual practice, in which practitioners inspect small subsets of traces and miss patterns that only emerge across populations. The paper evaluates IG both qualitatively and objectively, using rubric-based report assessment and downstream performance improvements. Human experts using IG reports improved scaffold performance by 30.4 percentage points over the unmodified baseline scaffold. Coding agents given IG-derived insights also showed consistent, stable gains. The approach scales to production corpora where individual traces span tens of thousands of tokens, enabling systematic diagnosis of failures in LLM agents.
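To make the "propose and test hypotheses across a corpus" idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the `Trace` fields, the hand-written hypotheses, the `min_gap` threshold, and the toy corpus are all assumptions made for illustration. In IG the hypotheses would presumably be proposed by agents; here they are plain predicates, and "testing" just compares failure rates between traces that match a hypothesis and traces that do not.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trace:
    trace_id: str
    steps: list[str]  # hypothetical: simplified log of agent actions
    failed: bool      # hypothetical: task-level outcome label


@dataclass
class Hypothesis:
    name: str
    matches: Callable[[Trace], bool]  # predicate over a single trace


@dataclass
class Finding:
    hypothesis: str
    support: int               # number of traces matching the hypothesis
    fail_rate_matching: float  # failure rate among matching traces
    fail_rate_other: float     # failure rate among the remaining traces
    evidence: list[str]        # ids of failed matching traces


def failure_rate(traces: list[Trace]) -> float:
    return sum(t.failed for t in traces) / len(traces) if traces else 0.0


def evaluate_hypothesis(h: Hypothesis, corpus: list[Trace],
                        max_evidence: int = 3) -> Finding:
    """Compare failure rates for traces that match h against the rest."""
    matching = [t for t in corpus if h.matches(t)]
    other = [t for t in corpus if not h.matches(t)]
    evidence = [t.trace_id for t in matching if t.failed][:max_evidence]
    return Finding(h.name, len(matching), failure_rate(matching),
                   failure_rate(other), evidence)


def build_report(hypotheses: list[Hypothesis], corpus: list[Trace],
                 min_gap: float = 0.2) -> list[Finding]:
    """Keep hypotheses whose matching traces fail notably more often."""
    findings = [evaluate_hypothesis(h, corpus) for h in hypotheses]
    kept = [f for f in findings
            if f.fail_rate_matching - f.fail_rate_other >= min_gap]
    return sorted(kept,
                  key=lambda f: f.fail_rate_matching - f.fail_rate_other,
                  reverse=True)


# Toy corpus and hypotheses, invented for this example.
corpus = [
    Trace("t1", ["search", "search", "search", "answer"], True),
    Trace("t2", ["search", "answer"], False),
    Trace("t3", ["search", "search", "search", "search"], True),
    Trace("t4", ["read_file", "answer"], False),
    Trace("t5", ["read_file", "search", "answer"], False),
    Trace("t6", ["read_file"], True),
]
hypotheses = [
    Hypothesis("repeated_search", lambda t: t.steps.count("search") >= 3),
    Hypothesis("no_final_answer", lambda t: "answer" not in t.steps),
    Hypothesis("uses_read_file", lambda t: "read_file" in t.steps),
]

report = build_report(hypotheses, corpus)
for f in report:
    print(f.hypothesis, f.support, f.fail_rate_matching,
          f.fail_rate_other, f.evidence)
```

On this toy corpus the `uses_read_file` hypothesis is dropped, because traces that match it fail less often than the rest. The other two are kept, each with the ids of supporting traces attached. A pattern like this is only visible when rates are compared across the whole population, which is the gap the paper attributes to manual inspection of a few traces.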

// why it matters

Automates failure diagnosis in LLM agents, enabling scalable, evidence-driven improvements.

Sources

Primary · arXiv cs.AI
▸ Read original at arxiv.org
