arXiv cs.AI · Thursday · May 28, 2026

Dr-CiK: A Testbed for Foresight-Driven Agents

agents · benchmark · forecasting · context-retrieval

Dr-CiK, introduced in an arXiv paper (2605.27904), is a benchmark that tests whether agents can identify and use external context for time series forecasting. Unlike existing benchmarks, which supply the context directly, Dr-CiK requires agents to retrieve relevant supporting context from a document corpus, filter out distractors, distill what remains into useful evidence, and generate forecasts. Evaluations of state-of-the-art deep research and forecasting methods show that high-quality context improves performance, but most agents recover only a small fraction of the ground-truth supporting evidence, and typically 80% of their citations are distractors. As a result, forecasters perform worse with retrieved context than with none. The benchmark highlights the need for foresight-driven agents that can actively discover relevant context in noisy, heterogeneous sources.
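To make the two retrieval figures in the summary concrete, here is a minimal sketch of how the share of ground-truth evidence recovered and the share of citations that are distractors could be computed for a single task. The function name `citation_scores`, the document IDs, and the exact definitions are illustrative assumptions; the paper's own scoring may differ.

```python
def citation_scores(cited, supporting):
    """Return (evidence_recall, distractor_rate) for one forecasting task.

    Assumed definitions (not taken from the paper):
      evidence_recall = cited supporting docs / all supporting docs
      distractor_rate = cited non-supporting docs / all cited docs
    """
    cited_set = set(cited)
    supporting_set = set(supporting)

    hits = cited_set & supporting_set
    recall = len(hits) / len(supporting_set) if supporting_set else 0.0

    distractors = cited_set - supporting_set
    distractor_rate = len(distractors) / len(cited_set) if cited_set else 0.0
    return recall, distractor_rate


# Hypothetical task: the agent cites five documents, one of which is
# among the four ground-truth supporting documents.
cited = ["doc_03", "doc_11", "doc_17", "doc_25", "doc_40"]
supporting = ["doc_03", "doc_08", "doc_19", "doc_31"]

recall, distractor_rate = citation_scores(cited, supporting)
print(f"evidence recall: {recall:.2f}")        # 0.25
print(f"distractor rate: {distractor_rate:.2f}")  # 0.80
```

In this made-up case the agent recovers a quarter of the supporting evidence while four of its five citations are distractors, the kind of profile the summary describes as typical.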

// why it matters

Highlights a critical gap in agents' ability to autonomously find relevant context for forecasting.

Sources

Primary · arXiv cs.AI
