arXiv cs.AI · Monday · May 25, 2026

Human-in-the-Loop Multi-Agent Ventilator Decision Support with Contextual Bandit Preference Learning

agents · human-in-the-loop · medical-ai · decision-support · reinforcement-learning

The Ventilator Decision Support System (VDSS), described in an arXiv paper published on 2026-05-25, is a human-in-the-loop multi-agent framework for ventilator management decision support. It targets two weaknesses of existing approaches: rule-based systems lack personalization, while end-to-end reinforcement learning and single large language model systems are difficult to control or audit. VDSS coordinates modular decision components through contract-driven structured interfaces and generates traceable evidence for review. A core feature is online preference adaptation with a contextual bandit: in each adjustment cycle, the bandit updates clinician-specific preferences from the final accepted decision, and those preferences guide future recommendations. Structured rejection feedback triggers targeted replanning, which aims to reduce unproductive iterations and stabilize the interaction between clinician and AI. In retrospective ICU trajectory replay combined with expert review, VDSS achieved higher recommendation acceptability and needed fewer interaction rounds to reach an acceptable plan, which suggests potential for clinically deployable human-AI collaboration in critical care. The research focuses on improving sequential decision-making in complex medical environments while respecting safety boundaries and individual clinician styles.
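To make the preference-adaptation idea concrete, here is a minimal sketch of a contextual bandit that learns per-clinician preferences from the finally accepted decision. It is not the paper's implementation: the summary does not specify the bandit algorithm, so this swaps in a simple epsilon-greedy linear bandit with an online gradient update. The class name `PreferenceBandit`, the action labels, the three-element context vector, and the reward scheme (1 for the accepted action, 0 for a recommendation the clinician replaced) are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict


class PreferenceBandit:
    """Epsilon-greedy linear contextual bandit (illustrative, hypothetical).

    Keeps one weight vector per (clinician, action) pair so that each
    clinician's accepted decisions shape only their own recommendations.
    """

    def __init__(self, actions, n_features, epsilon=0.1, lr=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon  # probability of exploring a random action
        self.lr = lr            # step size for the online update
        self.rng = random.Random(seed)
        # weights[clinician_id][action] -> list of floats, created on first use
        self.weights = defaultdict(
            lambda: {a: [0.0] * n_features for a in self.actions}
        )

    def score(self, clinician_id, action, context):
        """Predicted acceptability of `action` for this clinician and context."""
        w = self.weights[clinician_id][action]
        return sum(wi * xi for wi, xi in zip(w, context))

    def recommend(self, clinician_id, context):
        """Pick an action: explore with probability epsilon, else exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.score(clinician_id, a, context))

    def update(self, clinician_id, context, recommended, accepted):
        """Learn from the final accepted decision of one adjustment cycle.

        Assumed reward scheme: the accepted action is pulled toward 1, and a
        recommendation the clinician replaced is pulled toward 0.
        """
        targets = {accepted: 1.0}
        if recommended != accepted:
            targets[recommended] = 0.0
        for action, reward in targets.items():
            error = reward - self.score(clinician_id, action, context)
            w = self.weights[clinician_id][action]
            for i, xi in enumerate(context):
                w[i] += self.lr * error * xi


# Hypothetical action labels and context features, for illustration only.
bandit = PreferenceBandit(["raise_peep", "hold", "lower_fio2"], n_features=3, epsilon=0.0)
context = [1.0, 0.5, 0.2]

first = bandit.recommend("dr_a", context)      # untrained: ties go to the first action
bandit.update("dr_a", context, recommended=first, accepted="hold")
second = bandit.recommend("dr_a", context)     # now reflects dr_a's accepted choice
print(first, second)
```

With exploration disabled (`epsilon=0.0`), a single accepted correction is enough to shift this clinician's next recommendation in the same context, while other clinicians' weights stay untouched. A real system would also need the safety boundaries the summary mentions, which this sketch does not model.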

// why it matters

VDSS offers developers a pattern for building auditable, adaptable, human-centric AI systems in critical applications: modular components behind structured interfaces, combined with preference learning from the decisions users finally accept.

Sources

Primary · arXiv cs.AI
▸ Read original at arxiv.org
