Build a test suite that grows with your agent with dataset management in Amazon Bedrock AgentCore
Amazon Bedrock AgentCore now supports dataset management for agent evaluation. Developers can create and manage test cases as versioned datasets, which brings the discipline of versioned test fixtures to agent evaluation. A dataset acts as a stable offline baseline that complements fast-moving online signals. Live traffic shows how the agent handles real-world requests as they change, while a fixed benchmark shows whether a new version of the agent actually performs better than the previous one on the same cases. Because the test cases are managed as datasets, the suite is easier to maintain and extend as the agent grows. The feature is part of Amazon Bedrock's broader capabilities for building and deploying generative AI agents. It is available now in Amazon Bedrock AgentCore at no additional charge beyond standard Bedrock usage costs.
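To make the idea of a fixed benchmark concrete, here is a minimal Python sketch of versioned test fixtures. It does not use the AgentCore API. `VersionedDataset`, `EvalCase`, `publish`, `evaluate`, and the two toy agents are hypothetical names invented for illustration, and exact-match scoring stands in for whatever evaluators you actually use. The sketch shows one property: each published version is frozen, so two agent versions scored against the same pinned version are directly comparable, while the suite itself keeps growing in later versions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sketch of versioned test fixtures; NOT the Amazon Bedrock
# AgentCore API. All names here are invented for illustration.


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str


class VersionedDataset:
    """Append-only dataset: each publish() freezes a new, immutable version."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._versions: Dict[int, Tuple[EvalCase, ...]] = {}

    def publish(self, cases: List[EvalCase]) -> int:
        version = len(self._versions) + 1
        # Store a tuple copy so later changes to the input list cannot alter it.
        self._versions[version] = tuple(cases)
        return version

    def get(self, version: int) -> Tuple[EvalCase, ...]:
        return self._versions[version]


def evaluate(agent: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Exact-match pass rate; a stand-in for a real evaluator."""
    if not cases:
        return 0.0
    passed = sum(1 for case in cases if agent(case.prompt) == case.expected)
    return passed / len(cases)


def old_agent(prompt: str) -> str:
    return {"2+2?": "4"}.get(prompt, "unknown")


def new_agent(prompt: str) -> str:
    return {"2+2?": "4", "Capital of France?": "Paris"}.get(prompt, "unknown")


ds = VersionedDataset("support-agent-baseline")
v1 = ds.publish([EvalCase("2+2?", "4"), EvalCase("Capital of France?", "Paris")])
# The suite grows: v2 adds a case (e.g. one seen in live traffic); v1 is untouched.
v2 = ds.publish(list(ds.get(v1)) + [EvalCase("Capital of Japan?", "Tokyo")])

# Score both agent versions against the same pinned dataset version.
print(evaluate(old_agent, ds.get(v1)))  # 0.5
print(evaluate(new_agent, ds.get(v1)))  # 1.0
```

Scoring against `v1` isolates the effect of the agent change. Scoring `new_agent` against `v2` gives 2 of 3, because the newly added case is also counted, which is why pinning a dataset version matters when comparing runs over time.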
Developers can now maintain stable, versioned test baselines for agent evaluation, ensuring reliable tracking of improvements.