arXiv cs.AI · Thursday · May 28, 2026

EgoBench: An Interactive Egocentric Multimodal Benchmark for Tool-Using Agents

benchmark · agents · multimodal · egocentric · tool-use

EgoBench, introduced in arXiv paper 2605.27820, is presented as the first interactive multimodal benchmark for tool-using agents. It comprises 1,045 tasks grounded in egocentric video, spanning four daily scenarios (e.g., cooking, assembly). The benchmark provides an interactive user-agent-tool environment for evaluation, with a multi-agent simulated user that generates task-aligned responses. A three-stage synergistic pipeline ensures that each task requires both visual perception and tool-augmented multi-hop reasoning, and a deterministic joint validation framework enables objective evaluation of dynamic interactions. Together, these address a gap in existing benchmarks, which do not jointly evaluate multimodal perception, tool invocation, and user interaction. The benchmark is publicly available on arXiv.

// why it matters

Enables rigorous evaluation of AI agents that must perceive, reason, and interact in real-world tool-use scenarios.

Sources

Primary · arXiv cs.AI
▸ Read original at arxiv.org
