arXiv cs.AI · Tuesday · May 26, 2026

Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs

llm-agents · workflow-optimization · reliability · cost-efficiency · ai-systems

Published on arXiv cs.AI on 2026-05-26, the paper "Toward Reliable Design of LLM-Enabled Agentic Workflows: Optimizing Latency-Reliability-Cost Tradeoffs" addresses the design of AI systems built from multiple interacting agents, some powered by large language models (LLMs) and some implemented as traditional computational modules. The authors introduce performance models for both LLM and non-LLM agents that quantify how computational effort translates into output quality. For LLM agents, the model captures the effect of reasoning and output tokens through a parametric exponential reliability function, so the consequence of giving an agent more or fewer tokens can be stated quantitatively.

The paper then studies the design of sequential workflows under practical latency and cost constraints. Its main results are a water-filling token allocation policy, which gives a principled way to distribute tokens across the LLM agents in a workflow for optimal performance, and a characterization of optimal workflow reliability in terms of shadow prices (in optimization terms, the marginal value of relaxing a constraint such as a latency or cost budget). Together these form a theoretical framework for designing more robust and efficient LLM-enabled agentic systems.
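To make the water-filling idea concrete, here is a minimal Python sketch under assumptions that are ours, not the paper's: each agent `i` succeeds with probability `1 - exp(-rates[i] * t_i)` when given `t_i` tokens, the sequential workflow succeeds only if every agent does, and the only constraint is a total token budget. The function names, the `rates` parameters, and this particular reliability form are hypothetical; the paper's actual model and policy may differ.

```python
import math


def allocate_tokens(rates, budget, iters=200):
    """Split `budget` tokens across the agents of a sequential workflow.

    Assumed model (illustrative, not taken from the paper): maximize
    sum_i log(1 - exp(-rates[i] * t_i)) subject to sum_i t_i == budget.
    Returns the allocation and the shadow price of the token budget.
    """

    def tokens_at(price):
        # Stationarity condition: rates[i] / (exp(rates[i] * t_i) - 1) == price,
        # solved for t_i. A higher price means fewer tokens for every agent.
        return [math.log1p(a / price) / a for a in rates]

    # Bisect on the price in log space until the budget is met exactly.
    lo, hi = 1e-200, 1e12
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if sum(tokens_at(mid)) > budget:
            lo = mid  # allocation exceeds the budget, so the price must rise
        else:
            hi = mid
    price = math.sqrt(lo * hi)
    return tokens_at(price), price


def workflow_reliability(rates, tokens):
    """Probability that every agent succeeds under the assumed model."""
    return math.prod(1 - math.exp(-a * t) for a, t in zip(rates, tokens))


if __name__ == "__main__":
    rates = [0.02, 0.005, 0.01]  # hypothetical per-agent token efficiencies
    tokens, price = allocate_tokens(rates, budget=600)
    print("tokens per agent:", [round(t, 1) for t in tokens])
    print("shadow price of one token:", price)
    print("workflow reliability:", workflow_reliability(rates, tokens))
```

In this sketch the single price plays the role of the water level: every agent receives tokens until its marginal gain in log-reliability falls to that common value, so agents that convert tokens into reliability more slowly end up with a larger share. The same price is the shadow price of the budget, meaning the approximate gain in log-reliability from one additional token.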

// why it matters

This research provides developers with models and policies to optimize the latency, reliability, and cost of complex LLM-enabled agentic workflows.

Sources

Primary · arXiv cs.AI
▸ Read original at arxiv.org
