Co-ReAct: Rubrics as Step-Level Collaborators for ReAct Agents
Co-ReAct, a framework introduced in a new arXiv preprint, augments ReAct-style agents by using rubrics as step-level collaborators during inference. Unlike prior work that uses rubrics only for evaluation or training, Co-ReAct injects a rubric at each decision step to guide the agent's Reason-or-Act choice, specifying targets for evidence seeking, search, reasoning, or self-evaluation. To generate reliable rubrics, the authors train a dedicated rubric generator with GRPO (Group Relative Policy Optimization) using a list-wise Spearman rank-correlation reward rather than binary or pairwise preferences. The approach aims to reduce the shallow, redundant, or poorly targeted trajectories common in multi-step reasoning tasks. The paper was posted to arXiv as a new submission (2605.23590v1) on May 25, 2026.
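To make the reward idea concrete, here is a minimal sketch of a list-wise Spearman rank-correlation reward followed by GRPO-style group normalization. The summary does not say what ranking the rubric's scores are compared against, so the reference scores, the function names, and the zero reward for constant rankings are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_reward(rubric_scores, reference_scores):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    rx = average_ranks(rubric_scores)
    ry = average_ranks(reference_scores)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant ranking carries no signal
    return cov / (var_x * var_y) ** 0.5


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical data: reference quality of four candidate steps, and the
# scores that three sampled rubrics would assign to those same steps.
reference = [0.9, 0.2, 0.6, 0.4]
sampled_rubric_scores = [
    [0.8, 0.1, 0.5, 0.3],  # same ordering as the reference
    [0.1, 0.9, 0.4, 0.6],  # reversed ordering
    [0.5, 0.4, 0.9, 0.1],  # partially agreeing ordering
]
rewards = [spearman_reward(s, reference) for s in sampled_rubric_scores]
advantages = group_relative_advantages(rewards)
```

Under these assumptions a rubric is rewarded for how well it orders the whole list of candidates, so a partially correct ordering earns partial credit, which a binary or pairwise signal would not express as directly.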
The intended benefit is higher-quality agent reasoning through structured, step-level guidance during inference.
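The step-level guidance could look roughly like the loop below, in which a fresh rubric is produced before every decision and passed to the agent. Everything here is a hypothetical stand-in: `generate_rubric` replaces the trained rubric generator with fixed rules, `agent_step` replaces the language-model policy, and the mapping from rubric target to Reason or Act is an assumption rather than something the summary specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    target: str     # e.g. "search", "evidence seeking", "reasoning", "self-evaluation"
    criterion: str  # what a good step should accomplish


@dataclass
class AgentState:
    question: str
    steps: list = field(default_factory=list)


def generate_rubric(state):
    """Hypothetical stand-in for the trained rubric generator."""
    if not state.steps:
        return Rubric("search", "Find a source that addresses the question.")
    if len(state.steps) == 1:
        return Rubric("reasoning", "Connect the retrieved evidence to the question.")
    return Rubric("self-evaluation", "Check the draft answer against the evidence.")


def agent_step(state, rubric):
    """Hypothetical stand-in for the ReAct policy, conditioned on the rubric."""
    # Assumption: retrieval-oriented targets lead to Act, the rest to Reason.
    kind = "act" if rubric.target in ("search", "evidence seeking") else "reason"
    return {"kind": kind, "target": rubric.target}


def run_episode(question, max_steps=3):
    """Inject a new rubric at every decision step instead of once per task."""
    state = AgentState(question)
    for _ in range(max_steps):
        rubric = generate_rubric(state)
        state.steps.append(agent_step(state, rubric))
    return state.steps


trajectory = run_episode("Which year was the framework proposed?")
```

The point of the sketch is only the control flow: the rubric is regenerated from the current state at each step, so the guidance can change as the trajectory develops.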