Capability Self-Assessment: Teaching LLMs to Know Their Limits
A new arXiv paper (2606.00251) reports that modern large language models systematically fail to recognize their own limitations: they overestimate their competence and attempt queries they cannot solve. The authors call the missing skill Capability Self-Assessment (CSA) and formulate it as a policy-learning problem.

The study finds that reinforcement learning (RL) teaches CSA effectively, significantly outperforming supervised fine-tuning (SFT) while preserving the model's original capabilities. SFT, by contrast, severely degrades the very capabilities the model is meant to assess. The learned self-assessment behavior generalizes well out of distribution, which suggests that CSA is a transferable model trait.

In practice, CSA improves local-cloud decision making at inference time, that is, deciding whether a query stays on a local model or is handed off to a cloud model, and it provides a signal for targeted data selection during training. The paper is listed on arXiv under cs.AI.
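To make the policy-learning framing concrete, here is a minimal Python sketch of how a binary attempt-or-decline decision could be scored as an RL reward. The summary does not describe the paper's actual reward, so the `Episode` type, the `csa_reward` and `attempt_threshold` functions, and the +1 / -1 / 0 values are hypothetical assumptions chosen for illustration, not the authors' method.

```python
from dataclasses import dataclass

# Hypothetical reward values; the paper's actual reward design is not given here.
REWARD_CORRECT = 1.0   # attempted and answered correctly
REWARD_DECLINE = 0.0   # declined to attempt (e.g. deferred elsewhere)
WRONG_PENALTY = 1.0    # cost of attempting and answering incorrectly


@dataclass
class Episode:
    attempted: bool  # the policy's self-assessment decision for this query
    correct: bool    # whether the model's answer is (or would be) correct


def csa_reward(ep: Episode, wrong_penalty: float = WRONG_PENALTY) -> float:
    """Reward one attempt-or-decline decision under the assumed scheme."""
    if not ep.attempted:
        return REWARD_DECLINE
    return REWARD_CORRECT if ep.correct else -wrong_penalty


def attempt_threshold(wrong_penalty: float = WRONG_PENALTY) -> float:
    """Success probability above which attempting beats declining in expectation.

    With REWARD_CORRECT = 1 and REWARD_DECLINE = 0, attempting earns
    p - (1 - p) * wrong_penalty on average, which exceeds 0 exactly when
    p > wrong_penalty / (1 + wrong_penalty).
    """
    return wrong_penalty / (1.0 + wrong_penalty)


if __name__ == "__main__":
    episodes = [Episode(True, True), Episode(True, False), Episode(False, False)]
    print([csa_reward(ep) for ep in episodes])  # [1.0, -1.0, 0.0]
    print(attempt_threshold())                   # 0.5
    print(attempt_threshold(3.0))                # 0.75
```

Under these assumed values, a reward-maximizing policy should attempt a query only when it believes its chance of success exceeds 0.5; raising the penalty for wrong answers pushes that threshold up and makes the policy more willing to decline.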
For developers, this points toward more reliable AI systems that know when to delegate a task instead of attempting it.
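The delegation idea can be sketched as a simple router in Python: the local model's self-assessment decides whether a query is answered locally or escalated to a cloud model. The callables `can_solve_locally`, `local_solve` and `cloud_solve` are hypothetical placeholders, not an API from the paper; the verdict is assumed to come from a CSA-trained local model.

```python
from typing import Callable, Tuple

# Hypothetical interfaces; the paper's models and APIs are not described here.
AssessFn = Callable[[str], bool]  # local model's CSA verdict: True means "I can solve this"
SolveFn = Callable[[str], str]    # returns an answer for a query


def route(
    query: str,
    can_solve_locally: AssessFn,
    local_solve: SolveFn,
    cloud_solve: SolveFn,
) -> Tuple[str, str]:
    """Answer on the local model if it judges the query within its capability,
    otherwise delegate the query to the cloud model."""
    if can_solve_locally(query):
        return "local", local_solve(query)
    return "cloud", cloud_solve(query)


if __name__ == "__main__":
    # Toy stand-ins: the "local model" only claims queries shorter than 10 characters.
    assess = lambda q: len(q) < 10
    local = lambda q: f"local answer to {q!r}"
    cloud = lambda q: f"cloud answer to {q!r}"

    print(route("2 + 2", assess, local, cloud))
    print(route("Prove the Riemann hypothesis", assess, local, cloud))
```

Keeping the assessment as a separate callable means the router does not care whether the verdict comes from an RL-trained policy, a confidence score, or a fixed rule, and the local model is never asked to answer a query it has already declined.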