Towards trustworthy agentic AI: a comprehensive survey of safety, robustness, privacy, and system security
A survey on arXiv (2605.23989) examines trustworthiness in agentic AI, meaning LLMs augmented with planning, tool use, memory, and long-horizon interaction. The paper focuses on two core dimensions that are critical for high-risk deployments: Safety and Robustness, and Privacy and System Security. It clarifies key concepts, identifies where risks emerge along the agent workflow (e.g., tool invocation, multi-step planning), and summarizes mitigation strategies targeted at each stage. Value alignment, transparency, fairness, and accountability are discussed as context. To support consistent comparison, the survey consolidates evaluation into a unified metrics-and-benchmarks hub that emphasizes both outcome and process signals, such as constraint violations, trace completeness, and adversarial success rates, and it offers scenario-to-metric guidance for release gating. The authors conclude by outlining open challenges, including self-evolving agents. Overall, the work gives developers a structured framework for assessing and improving the trustworthiness of agentic systems before deployment.
Provides a structured framework for evaluating agentic AI safety before deployment.
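
To make the idea of release gating on outcome and process signals more concrete, here is a minimal Python sketch. It is not taken from the survey: the `Episode` schema, the metric definitions in `summarize`, and the numbers in `THRESHOLDS` are all hypothetical choices made for illustration, and a real evaluation would use the definitions from whichever benchmark is being run.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One recorded agent run. Hypothetical schema, not from the survey."""
    adversarial: bool           # True if the run was an attack scenario
    attack_succeeded: bool      # only meaningful when adversarial is True
    constraint_violations: int  # number of constraint breaches in the run
    steps_expected: int         # steps the workflow should have logged
    steps_logged: int           # steps actually present in the trace


def summarize(episodes):
    """Aggregate three illustrative signals over a batch of episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    adversarial = [e for e in episodes if e.adversarial]
    expected = sum(e.steps_expected for e in episodes)
    logged = sum(min(e.steps_logged, e.steps_expected) for e in episodes)
    return {
        # Outcome signal: share of runs with at least one violation.
        "violation_rate": sum(e.constraint_violations > 0 for e in episodes)
        / len(episodes),
        # Process signal: share of expected steps that appear in the traces.
        "trace_completeness": logged / expected if expected else 1.0,
        # Adversarial signal: share of attack runs that succeeded.
        "attack_success_rate": (
            sum(e.attack_succeeded for e in adversarial) / len(adversarial)
            if adversarial
            else 0.0
        ),
    }


# Assumed thresholds for one deployment scenario; the values are placeholders.
THRESHOLDS = {
    "violation_rate": ("max", 0.05),
    "trace_completeness": ("min", 0.95),
    "attack_success_rate": ("max", 0.10),
}


def release_gate(metrics, thresholds=THRESHOLDS):
    """Return the names of metrics that fail their threshold (empty = pass)."""
    failures = []
    for name, (kind, limit) in thresholds.items():
        value = metrics[name]
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            failures.append(name)
    return failures
```

Calling `release_gate(summarize(episodes))` on a batch of recorded runs returns an empty list when every signal is within its limit. Returning the failing metric names, instead of a single boolean, keeps the gate decision traceable to the specific signal that blocked the release.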