Automatic Layer Selection for Hallucination Detection
A new arXiv preprint (2605.26366) introduces First Effective Peak of Intrinsic Dimension (FEPoID), a method for automatically selecting which intermediate layer of a large language model to use for hallucination detection. Prior work shows that hallucination signals are stronger in intermediate layers than in the final layer, but offers no principled way to choose among them. The authors evaluated several candidate selection criteria across LLM architectures and scales on question answering and summarization benchmarks, and found that none of them was consistently effective. FEPoID is training-free and adds negligible computational overhead. According to the authors, it consistently identifies the optimal or a near-optimal layer and outperforms both the alternative criteria and existing hallucination detection baselines. This removes the need for manual layer tuning and makes hallucination detection more practical to deploy.
Takeaway: FEPoID enables reliable hallucination detection without manual layer tuning, reducing engineering overhead.
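
The summary above names the method but does not say how intrinsic dimension is estimated or what makes a peak "effective". The following is therefore only a rough sketch of the general idea suggested by the name, not the authors' implementation. It assumes the TwoNN maximum-likelihood estimator for intrinsic dimension and treats a peak as "effective" when it is a local maximum reaching a chosen fraction of the global maximum. The function names, the rel_threshold parameter, and the fallback to the global maximum are all hypothetical.

```python
import numpy as np


def twonn_intrinsic_dimension(x):
    """Estimate intrinsic dimension with the TwoNN maximum-likelihood formula.

    x has shape (n_samples, n_features). This estimator is an assumption of
    this sketch; the preprint may use a different one.
    """
    x = np.asarray(x, dtype=np.float64)
    # Squared pairwise distances via the Gram matrix.
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)      # clip small negative rounding errors
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbor
    # Columns 0 and 1 hold the two smallest squared distances per row.
    nearest = np.partition(d2, 1, axis=1)[:, :2]
    r1 = np.sqrt(nearest[:, 0])
    r2 = np.sqrt(nearest[:, 1])
    keep = r1 > 0                    # drop exact duplicates
    mu = r2[keep] / r1[keep]
    return float(keep.sum() / np.sum(np.log(mu)))


def first_effective_peak(id_profile, rel_threshold=0.9):
    """Return the index of the first 'effective' peak of a per-layer profile.

    Hypothetical definition: the first interior local maximum whose value is
    at least rel_threshold times the global maximum. Falls back to the global
    maximum when no such peak exists.
    """
    p = np.asarray(id_profile, dtype=np.float64)
    cutoff = rel_threshold * p.max()
    for i in range(1, len(p) - 1):
        if p[i] >= p[i - 1] and p[i] > p[i + 1] and p[i] >= cutoff:
            return i
    return int(np.argmax(p))


def select_layer(hidden_states_per_layer, rel_threshold=0.9):
    """Pick a layer from a list of (n_samples, hidden_dim) activation arrays."""
    profile = [twonn_intrinsic_dimension(h) for h in hidden_states_per_layer]
    return first_effective_peak(profile, rel_threshold), profile
```

In this sketch, hidden_states_per_layer would hold one activation matrix per layer, for example the last-token hidden state of each prompt in a small unlabeled sample. The selected index would then be the layer whose representations feed whatever hallucination detector is in use. No labels or training are involved, which matches the training-free property described above.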