Towards Faithful Agentic XAI: A Verification Method and an Open-World Benchmark for Better Model Faithfulness
The paper introduces FAX (Faithful Agentic XAI), a framework for improving the faithfulness of LLM-based explainable AI systems. FAX decomposes a draft explanation into individual claims and cross-checks each claim against inherently faithful tools, filtering out unsupported or contradictory claims before the final explanation is generated. To evaluate faithfulness, the authors also present CRAFTER-XAI-Bench, an open-world reinforcement learning benchmark with complex policies and diverse goals. On this benchmark, FAX raises simulation faithfulness from 0.20 (the strongest baseline) to 0.46 while maintaining high informativeness, relevance, and fluency. On three tabular benchmarks, FAX performs competitively with prior agentic XAI baselines. The work targets the risk that LLMs amplify unreliable XAI outputs and thereby mislead users.
Impact: FAX reduces the risk of LLMs amplifying unreliable explanations, which in turn supports trust in AI systems.
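The verification step in the summary (split a draft explanation into claims, check each claim against a faithful tool, keep only what the tool supports) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Claim`, `Verdict`, `make_attribution_verifier`, and `filter_claims` are hypothetical, the claims are written by hand rather than extracted by an LLM, and the "faithful tool" is assumed to be a plain dictionary of signed feature attributions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Verdict(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNSUPPORTED = "unsupported"


@dataclass(frozen=True)
class Claim:
    """One atomic statement from a draft explanation (hypothetical schema)."""
    feature: str
    direction: str  # "increases" or "decreases" the model output


# Hypothetical verifier type: maps a claim to a verdict using a trusted tool.
Verifier = Callable[[Claim], Verdict]


def make_attribution_verifier(attributions: Dict[str, float]) -> Verifier:
    """Build a verifier from signed attributions (assumed tool output)."""

    def verify(claim: Claim) -> Verdict:
        score = attributions.get(claim.feature)
        # No evidence, or a zero attribution, cannot support a directional claim.
        if score is None or score == 0:
            return Verdict.UNSUPPORTED
        observed = "increases" if score > 0 else "decreases"
        if observed == claim.direction:
            return Verdict.SUPPORTED
        return Verdict.CONTRADICTED

    return verify


def filter_claims(claims: List[Claim], verify: Verifier) -> List[Claim]:
    """Keep only claims the tool supports; drop the rest before final generation."""
    return [c for c in claims if verify(c) is Verdict.SUPPORTED]


# Hand-written draft claims standing in for an LLM's decomposed explanation.
draft_claims = [
    Claim("income", "increases"),    # matches the tool output
    Claim("age", "increases"),       # tool says the opposite
    Claim("zip_code", "decreases"),  # tool has no evidence for this feature
]
tool_output = {"income": 0.42, "age": -0.10}

kept = filter_claims(draft_claims, make_attribution_verifier(tool_output))
print([c.feature for c in kept])  # prints ['income']
```

In this toy run only the `income` claim survives: the `age` claim is contradicted by the sign of its attribution, and the `zip_code` claim has no supporting evidence. A real system would presumably replace the dictionary lookup with calls to actual XAI tools and pass the surviving claims back to the LLM for the final explanation.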