arXiv cs.AI · Wednesday · May 27, 2026

Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning

legal-ai · llms · evaluation · reasoning · trustworthy-ai

A new arXiv paper, "Which Changes Matter? Towards Trustworthy Legal AI via Relevance-Sensitive Evaluation and Solver-Grounded Reasoning," posted on May 27, 2026, tackles a core weakness of legal AI: large language models (LLMs) do not consistently distinguish legally relevant changes from irrelevant perturbations. The authors frame this as a legal-relevance-sensitive evaluation problem. A trustworthy model should stay stable when a variation is legally irrelevant, but change its output when an edit alters a legally material point.

To measure this, the authors introduce a unified evaluation suite that pairs "should-change" and "should-not-change" scenarios across three settings: judicial fairness, robustness, and statute confusion. Their results show that existing legal LLMs are systematically sensitive to legally irrelevant variations and often fail to tell apart related legal elements and statutory rules, which makes their outputs unreliable.

To address these failures, the researchers propose LexGuard, an adversarial multi-agent framework that grounds its reasoning in formal methods. LexGuard formalizes statutes as executable constraints, uses adversarial agents to extract competing fact-statute arguments, and then applies SMT (Satisfiability Modulo Theories) solvers to verify legal satisfaction (whether the facts meet a statute's conditions) and logical consistency. In the authors' experiments, LexGuard significantly improves the reliability of legal reasoning: it is less susceptible to manipulative framing and better at focusing on legally material facts.
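To make the should-change / should-not-change distinction concrete, here is a minimal sketch of how paired test cases could be scored, assuming each case is an (original, perturbed) input pair labeled by whether the edit alters a legally material point. The `PerturbationCase` schema, the `relevance_sensitive_scores` function, the metric names "stability" and "sensitivity", and the keyword-matching toy model are all hypothetical illustrations, not the paper's benchmark format or metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class PerturbationCase:
    """One (original, perturbed) input pair; hypothetical schema."""
    original: str
    perturbed: str
    should_change: bool  # True if the edit alters a legally material point


def relevance_sensitive_scores(
    cases: Iterable[PerturbationCase],
    predict: Callable[[str], str],
) -> Dict[str, float]:
    """Score a model on paired cases.

    stability:   share of should-not-change pairs whose prediction stays the same
    sensitivity: share of should-change pairs whose prediction changes
    """
    stable = sensitive = n_keep = n_change = 0
    for case in cases:
        changed = predict(case.original) != predict(case.perturbed)
        if case.should_change:
            n_change += 1
            sensitive += int(changed)
        else:
            n_keep += 1
            stable += int(not changed)
    return {
        "stability": stable / n_keep if n_keep else float("nan"),
        "sensitivity": sensitive / n_change if n_change else float("nan"),
    }


# Hypothetical toy "model": answers "liable" only if the word "intent" appears.
def toy_predict(text: str) -> str:
    return "liable" if "intent" in text else "not liable"


cases = [
    # Irrelevant edit (party name): the prediction should not move.
    PerturbationCase("A took the bike with intent to keep it.",
                     "B took the bike with intent to keep it.", should_change=False),
    # Irrelevant edit (day of the week): the prediction should not move.
    PerturbationCase("A took the bike.",
                     "On a Tuesday, A took the bike.", should_change=False),
    # Material edit (intent removed): the prediction should change.
    PerturbationCase("A took the bike with intent to keep it.",
                     "A borrowed the bike by mistake.", should_change=True),
    # Material edit (knowledge of ownership added), which the keyword model misses.
    PerturbationCase("A took the bike.",
                     "A took the bike, knowing it was not his.", should_change=True),
]

scores = relevance_sensitive_scores(cases, toy_predict)
print(scores)  # {'stability': 1.0, 'sensitivity': 0.5}
```

Reporting both numbers matters because each one alone is easy to game: a constant predictor is perfectly stable but never sensitive, while a model that reacts to every surface edit looks sensitive but fails the should-not-change pairs.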

// why it matters

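Developers building legal AI applications can adopt LexGuard's pattern of encoding statutes as executable constraints and checking them with a solver, so that conclusions rest on verifiable logic rather than on how a prompt happens to be framed.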
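The pattern of turning a statute into an executable constraint and checking competing fact sets against it can be illustrated with a toy example. This sketch is not LexGuard's implementation: it replaces the SMT solver with brute-force enumeration over a few boolean atoms, which only works for tiny propositional problems (an SMT solver such as Z3 scales to far larger formulas and supports theories like integer arithmetic). The statute, atom names, consent rule, and the fact sets attributed to two opposing agents are all hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List

Assignment = Dict[str, bool]
Constraint = Callable[[Assignment], bool]

# Hypothetical boolean atoms that a fact-extraction step might produce.
ATOMS = ("took_property", "property_of_another", "intent_to_deprive", "owner_consented")


def theft_statute(a: Assignment) -> bool:
    """Toy statute as an executable constraint: every element must hold."""
    return (a["took_property"] and a["property_of_another"]
            and a["intent_to_deprive"] and not a["owner_consented"])


def consent_rule(a: Assignment) -> bool:
    """Toy background rule: owner consent rules out intent to deprive."""
    return not (a["owner_consented"] and a["intent_to_deprive"])


RULES: List[Constraint] = [consent_rule]


def models(facts: Assignment, rules: List[Constraint]) -> Iterator[Assignment]:
    """Enumerate every full assignment that agrees with the facts and obeys the rules."""
    free = [atom for atom in ATOMS if atom not in facts]
    for values in product((False, True), repeat=len(free)):
        candidate = {**facts, **dict(zip(free, values))}
        if all(rule(candidate) for rule in rules):
            yield candidate


def check(facts: Assignment, rules: List[Constraint], statute: Constraint) -> str:
    """Classify a fact set against a statute by brute-force satisfiability checking."""
    worlds = list(models(facts, rules))
    if not worlds:
        return "inconsistent"    # the facts contradict the background rules
    verdicts = [statute(w) for w in worlds]
    if all(verdicts):
        return "satisfied"       # the statute holds in every consistent world
    if not any(verdicts):
        return "not satisfied"   # the statute fails in every consistent world
    return "undetermined"        # the outcome hinges on facts not yet established


# Competing fact sets, as two opposing agents might extract them (hypothetical).
prosecution = {"took_property": True, "property_of_another": True,
               "intent_to_deprive": True, "owner_consented": False}
defense = {"took_property": True, "property_of_another": True, "owner_consented": True}
merged = {**prosecution, "owner_consented": True}  # both sides' claims accepted at once
partial = {"took_property": True, "property_of_another": True}

for name, facts in [("prosecution", prosecution), ("defense", defense),
                    ("merged", merged), ("partial", partial)]:
    print(name, "->", check(facts, RULES, theft_statute))
```

Running it prints `satisfied`, `not satisfied`, `inconsistent`, and `undetermined` for the four fact sets, in that order. The last two outcomes are where a solver-grounded check helps most: it exposes contradictory claims and flags when the verdict depends on unestablished facts, instead of letting the framing of the input decide.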

Sources

Primary · arXiv cs.AI

▸ Read original at arxiv.org

