arXiv cs.AITuesday · May 26, 2026FREE

Methods for Formal Verification of Agent Skills: Three Layers Toward a Mechanically Checkable Capability-Containment Proof

formal-verificationagentsllmsafety

Building on a companion paper that introduced a four-level verification lattice (unverified, declared, tested, formal), this work closes the gap by providing a precise semantics for skill behavior. The semantics models how a skill is consumed by an LLM-driven runtime, with a deterministic script-side reachable through a non-deterministic LLM-side. The verification problem is stated as a capability-containment property. Three composable methods are presented: (1) sound static capability-containment analysis of the script-side via abstract interpretation over a small effect lattice; (2) a refinement type system for tool-call envelopes that mechanically rejects any call whose statically-inferred capability is not in the manifest's declared set; (3) SMT-bounded model checking against the parent paper's biconditional correctness criterion, with the bound chosen so any counter-example fitting the runtime's transaction-buffer horizon is exhibited as a concrete trace. The authors prove that the three layers composed soundly cover the verification problem. This work provides a mechanically checkable path to formal verification of agent skills, enabling developers to prove that an agent cannot exceed its authorized capabilities.

// why it matters

Developers can now mechanically verify that LLM agents cannot exceed their declared capabilities.

Sources

Primary · arXiv cs.AI

▸ Read original at arxiv.org

Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning Inference Time Context Sparsity: Illusion or Opportunity?

Methods for Formal Verification of Agent Skills: Three Layers Toward a Mechanically Checkable Capability-Containment Proof

Sources

Related

Like this? Get the next digest.