MobileExplorer: Accelerating On-Device Inference for Mobile GUI Agents via Online Exploration
MobileExplorer, introduced in a new arXiv paper (2605.26546v1), is a framework for accelerating on-device inference in vision-based mobile GUI agents. Its key idea is to exploit the long per-step reasoning time of vision-language models (VLMs): while the model reasons, the agent performs lightweight, parallel exploration, proactively probing semantically relevant UI elements and recording the resulting exploration traces as structured memory. A two-level rollback mechanism keeps execution reliable in live mobile environments: the agent first tries naive backtracking and, if that fails, restores the initial UI state. The collected traces are then summarized into contextual hints and injected into the prompt to improve subsequent reasoning. The goal is to run mobile GUI agents entirely on-device, avoiding the privacy concerns and network-dependent latency of cloud-hosted models. The paper was published on arXiv on May 27, 2026.
Aims to enable fully on-device mobile GUI agents, reducing privacy risks and network-dependent latency.
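The explore, roll back, and summarize loop described above can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `FakeUI` class, the `explore` and `summarize` functions, the trace fields, and the snapshot-based restore are all hypothetical stand-ins chosen for illustration, and the sketch runs sequentially rather than in parallel with VLM reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class FakeUI:
    """Toy stand-in for a live device: a stack of screen names (hypothetical)."""
    stack: list = field(default_factory=lambda: ["home"])
    # Screens on which the 'back' action has no effect (assumed failure mode).
    back_broken: set = field(default_factory=set)

    def current(self) -> str:
        return self.stack[-1]

    def tap(self, element: str) -> None:
        # Assumption: tapping an element opens a screen named after it.
        self.stack.append(element)

    def back(self) -> None:
        if self.current() in self.back_broken or len(self.stack) == 1:
            return  # naive backtracking silently fails
        self.stack.pop()

    def restore(self, snapshot: list) -> None:
        # Assumption: the initial state can be restored from a saved snapshot.
        self.stack = list(snapshot)


def explore(ui: FakeUI, candidates: list) -> list:
    """Probe each candidate element and roll back, recording a trace per probe."""
    snapshot = list(ui.stack)
    start = ui.current()
    traces = []
    for element in candidates:
        ui.tap(element)
        observed = ui.current()
        ui.back()                     # level 1: naive backtracking
        rollback = "back"
        if ui.current() != start:
            ui.restore(snapshot)      # level 2: restore the initial UI state
            rollback = "restore"
        traces.append(
            {"element": element, "leads_to": observed, "rollback": rollback}
        )
    return traces


def summarize(traces: list) -> str:
    """Turn traces into short hint lines that could be appended to a prompt."""
    return "\n".join(
        f"Tapping '{t['element']}' opens '{t['leads_to']}'." for t in traces
    )


if __name__ == "__main__":
    ui = FakeUI(back_broken={"settings_dialog"})
    traces = explore(ui, ["search", "settings_dialog"])
    print(summarize(traces))
```

In this toy setup, probing `search` is undone by the first-level back action, while `settings_dialog` ignores back and triggers the second-level restore. Either way the UI ends on the screen it started from, which is the property the rollback mechanism is meant to guarantee before the agent takes its real action.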