arXiv cs.AI · Monday, May 25, 2026

Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents

agents · reasoning · rubrics · grpo · arxiv

Co-ReAct, a framework introduced in a new arXiv submission, extends ReAct-style agents by using rubrics as step-level collaborators during inference. Prior work uses rubrics only for evaluation or training; Co-ReAct instead injects a rubric at each decision step to guide the agent's Reason-or-Act choice, specifying a target for that step: evidence seeking, search, reasoning, or self-evaluation. To generate reliable rubrics, the authors train a dedicated rubric generator with GRPO (Group Relative Policy Optimization), using a list-wise Spearman rank-correlation reward rather than binary or pairwise preferences. The approach aims to reduce the shallow, redundant, or poorly targeted trajectories that are common in multi-step reasoning tasks. The paper was posted to arXiv as a new submission (2605.23590v1) on May 25, 2026.
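To make the list-wise reward concrete, here is a minimal sketch of how a Spearman rank-correlation reward could feed GRPO-style group-relative advantages. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy scores, and the choice to compare rubric-induced scores against a set of reference scores are all hypothetical, and the summary above does not say what the authors rank against.

```python
from statistics import mean, pstdev


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant ranking carries no ordering signal
    return cov / (var_x * var_y) ** 0.5


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed setup: reference quality scores for four candidate steps, and the
# scores that three sampled rubrics assign to the same four steps.
reference = [0.9, 0.1, 0.5, 0.7]
rubric_scores = [
    [4, 1, 2, 3],  # same ordering as the reference
    [1, 4, 3, 2],  # reversed ordering
    [2, 1, 4, 3],  # partially agreeing ordering
]

rewards = [spearman(scores, reference) for scores in rubric_scores]
advantages = group_relative_advantages(rewards)
```

Because the reward depends on the ordering of the whole list, a rubric that gets most of the ranking right earns partial credit, which a binary or pairwise signal would not express as directly.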

// why it matters

Step-level rubrics give the agent structured guidance during inference, which the authors expect to improve reasoning quality.
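The step-level idea can be pictured as a loop in which a rubric is produced before every Reason-or-Act decision. The sketch below is a toy illustration only: `generate_rubric` is a hypothetical rule-based stand-in for the trained rubric generator, `choose_step` is a hypothetical stand-in for the agent's policy, and the rubric texts are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Rubric:
    target: str      # one of: "evidence", "search", "reasoning", "self-evaluation"
    criterion: str   # what a good next step should satisfy


def generate_rubric(question, history):
    """Hypothetical stand-in for a learned rubric generator."""
    if not any(kind == "act" for kind, _ in history):
        return Rubric("search", "Retrieve at least one source before answering.")
    if not any(kind == "reason" for kind, _ in history):
        return Rubric("reasoning", "Connect the retrieved evidence to the question.")
    return Rubric("self-evaluation", "Check the draft answer against the evidence.")


def choose_step(rubric):
    """Hypothetical policy: act when the rubric targets external information."""
    return "act" if rubric.target in ("search", "evidence") else "reason"


def run_episode(question, max_steps=3):
    """Inject a fresh rubric before each decision and record the choice made."""
    history = []
    for _ in range(max_steps):
        rubric = generate_rubric(question, history)
        history.append((choose_step(rubric), rubric.criterion))
    return history


trajectory = run_episode("Who proposed the ReAct prompting pattern?")
step_kinds = [kind for kind, _ in trajectory]
```

In a real system both stand-ins would be model calls; the point of the sketch is only the control flow, where the rubric is consulted at every step rather than once after the trajectory ends.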

Sources

Primary · arXiv cs.AI

▸ Read original at arxiv.org

