UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems
UnityMAS-O, introduced in an arXiv paper (2605.26646), addresses the lack of unified reinforcement learning (RL) optimization for LLM-based multi-agent systems. Existing frameworks focus on optimizing a single policy. UnityMAS-O instead treats the complete multi-agent workflow as the unit of optimization and represents it through four first-class objects: logical agent roles, graph trajectories, user-defined rewards, and agent–model mappings. The mapping decouples logical agents from physical model parameters, so roles can fully share one model, use fully separate models, or share models in part. Rewards can be assigned at the role, turn, and trajectory levels.

The framework extends verl with a Ray-based star-topology runtime in which a central controller executes the workflow, invokes tools, and records state. This design supports structured interaction between agents and role-specific credit assignment, so multi-agent systems that previously had to be orchestrated by hand can be optimized with RL.
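To make the agent–model mapping and multi-level rewards more concrete, here is a minimal plain-Python sketch. It is not UnityMAS-O's or verl's API, and it does not use Ray. Every name in it (`Turn`, `GraphTrajectory`, `build_mapping`, `per_turn_returns`, `group_by_model`, the `"model_i"` IDs, and the planner/coder/reviewer roles) is hypothetical. It also assumes that role-, turn-, and trajectory-level rewards are combined by simple addition, which is an illustrative choice and not something the source specifies.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str                 # logical agent role that produced this turn
    text: str                 # the role's output for this turn
    turn_reward: float = 0.0  # turn-level reward (user-defined)


@dataclass
class GraphTrajectory:
    turns: list = field(default_factory=list)
    # Directed edges (src_turn_index, dst_turn_index): which turn's output fed which turn.
    # Recorded for illustration only; the sketch below does not use them.
    edges: list = field(default_factory=list)
    trajectory_reward: float = 0.0  # trajectory-level reward (user-defined)


def build_mapping(roles, mode, groups=None):
    """Map logical roles to hypothetical physical model IDs."""
    if mode == "full_sharing":
        # Every role is served by one shared set of parameters.
        return {role: "model_0" for role in roles}
    if mode == "full_separation":
        # Each role gets its own model.
        return {role: f"model_{i}" for i, role in enumerate(roles)}
    if mode == "partial_sharing":
        # Roles listed in the same group share one model.
        mapping = {}
        for i, group in enumerate(groups or []):
            for role in group:
                mapping[role] = f"model_{i}"
        missing = set(roles) - set(mapping)
        if missing:
            raise ValueError(f"roles without a model: {sorted(missing)}")
        return mapping
    raise ValueError(f"unknown mode: {mode}")


def per_turn_returns(traj, role_rewards):
    """ASSUMED additive combination of trajectory-, role-, and turn-level rewards."""
    return [
        traj.trajectory_reward + role_rewards.get(turn.role, 0.0) + turn.turn_reward
        for turn in traj.turns
    ]


def group_by_model(traj, mapping, returns):
    """Route each turn (and its return) to the model that plays its role."""
    batches = defaultdict(list)
    for turn, ret in zip(traj.turns, returns):
        batches[mapping[turn.role]].append((turn.text, ret))
    return dict(batches)


roles = ["planner", "coder", "reviewer"]
mapping = build_mapping(roles, "partial_sharing", groups=[["planner", "reviewer"], ["coder"]])
traj = GraphTrajectory(
    turns=[Turn("planner", "plan", 0.25), Turn("coder", "code", 0.5), Turn("reviewer", "ok")],
    edges=[(0, 1), (1, 2)],
    trajectory_reward=1.0,
)
returns = per_turn_returns(traj, role_rewards={"coder": 0.25})
batches = group_by_model(traj, mapping, returns)
print(mapping)  # {'planner': 'model_0', 'reviewer': 'model_0', 'coder': 'model_1'}
print(returns)  # [1.25, 1.75, 1.0]
print(batches)  # {'model_0': [('plan', 1.25), ('ok', 1.0)], 'model_1': [('code', 1.75)]}
```

The point of the sketch is the separation of concerns. The trajectory only records which logical role acted, and the mapping alone decides which parameters receive each turn's training signal. Switching between full sharing, full separation, and partial sharing therefore changes the mapping without touching the trajectory or reward code.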
Enables systematic RL optimization of LLM multi-agent workflows beyond manual prompting.