arXiv cs.AI · Thursday · May 28, 2026

DeepSciVerify: Verifying Scientific Claim–Citation Alignment via LLM-Driven Evidence Escalation

llm · scientific-verification · claim-citation · benchmark

DeepSciVerify, introduced in a paper on arXiv (2605.27710), is a pipeline that verifies whether scientific claims align with the evidence they cite. It first checks each claim against the cited paper's abstract, then escalates only the uncertain cases to retrieval and analysis of full-text passages. This selective escalation exploits complementary behaviors across LLMs: some models are more conservative under uncertainty, others more decisive. On the SCitance benchmark, DeepSciVerify achieves 86.7 Micro-F1, outperforming strong abstract-only baselines by 4.5 points. It resolves 67% of instances without full-text retrieval, improving both accuracy and efficiency. The system targets a common failure mode in LLM-generated reports, where claims do not match their citations, which limits reliability in scientific and high-stakes settings.
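The two-stage escalation described above can be illustrated with a minimal sketch. This is not the paper's implementation: the label set, the function names (`verify_claim`, `fetch_passages`), and the toy string-matching verifiers that stand in for LLM calls are all assumptions made for illustration, as is the choice to put the conservative model at the abstract stage.

```python
from dataclasses import dataclass
from typing import Callable, List, Literal

# Hypothetical label set; the paper's actual labels may differ.
Label = Literal["supported", "unsupported", "uncertain"]


@dataclass
class Verdict:
    label: Label
    stage: str  # "abstract" or "full_text"


def verify_claim(
    claim: str,
    abstract: str,
    fetch_passages: Callable[[str], List[str]],
    abstract_verifier: Callable[[str, str], Label],
    fulltext_verifier: Callable[[str, List[str]], Label],
) -> Verdict:
    """Check the claim against the abstract; escalate only if uncertain."""
    first = abstract_verifier(claim, abstract)
    if first != "uncertain":
        # Resolved cheaply, no full-text retrieval needed.
        return Verdict(first, "abstract")
    # Escalation: retrieve full-text passages and decide on those.
    passages = fetch_passages(claim)
    return Verdict(fulltext_verifier(claim, passages), "full_text")


def toy_abstract_verifier(claim: str, abstract: str) -> Label:
    # Stand-in for a conservative model: commits only on a verbatim match.
    return "supported" if claim.lower() in abstract.lower() else "uncertain"


def toy_fulltext_verifier(claim: str, passages: List[str]) -> Label:
    # Stand-in for a more decisive model: always commits to a label.
    hit = any(claim.lower() in p.lower() for p in passages)
    return "supported" if hit else "unsupported"


if __name__ == "__main__":
    verdict = verify_claim(
        "x improves y",
        "We study X.",
        lambda claim: ["In section 4, X improves Y by 3%."],
        toy_abstract_verifier,
        toy_fulltext_verifier,
    )
    print(verdict)
```

In this sketch the cost saving comes from the early return: `fetch_passages` runs only for claims the first stage leaves as `"uncertain"`, which mirrors the reported share of instances resolved without full-text retrieval.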

// why it matters

Improves reliability of LLM-generated scientific reports by verifying claim-citation alignment efficiently.

Sources

Primary · arXiv cs.AI
▸ Read original at arxiv.org
