Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification
Researchers evaluated quantized LLaMA-3.1 (8B) at 8-bit, 4-bit, 3-bit, and 2-bit precision on the qualitative analysis of 82 interview transcripts. The low-bit models hallucinated more and produced less stable results, particularly with non-expert language. To address this, the researchers introduced a quantization-aware multi-pass prompt verification method, which guides the model through controlled steps to reduce hallucinations, removes unreliable content, and passes results on only after verification. For validation, human coders used NVivo alongside BF16 LLaMA-3.1, manually correcting the semantic drift and hallucinations in the BF16 output. The corrected BF16 output and the NVivo coding together formed the gold standard. The method improved accuracy over the baseline quantized models.
The method improves the reliability of quantized LLMs for qualitative analysis by reducing hallucinations in low-resource settings.
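The summary does not spell out the individual passes, so the following is only a minimal sketch of what a multi-pass verification loop could look like, not the researchers' implementation. It assumes three passes (propose codes, request a verbatim supporting quote, check that the quote really occurs in the transcript). The names `multi_pass_verify`, `normalize`, `generate`, and `stub_model`, the prompt wording, and the sample transcript are all hypothetical, and the sketch does not model the quantization-aware part of the method.

```python
from typing import Callable, Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so quote matching ignores formatting."""
    return " ".join(text.lower().split())


def multi_pass_verify(
    transcript: str, generate: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Keep only the codes whose supporting quote literally occurs in the transcript.

    `generate` is a hypothetical stand-in for a call to the (quantized) model.
    """
    # Pass 1 (assumed): ask the model for candidate codes, one per line.
    raw = generate(f"List the themes in this transcript, one per line.\n\n{transcript}")
    candidates = [line.strip() for line in raw.splitlines() if line.strip()]

    source = normalize(transcript)
    verified = []
    for code in candidates:
        # Pass 2 (assumed): ask for a verbatim quote that supports the code.
        quote = generate(
            f"Quote the transcript verbatim to support the theme: {code}\n\n{transcript}"
        )
        # Pass 3 (assumed): deterministic grounding check. A code survives only
        # if its quote is non-empty and appears in the transcript.
        if normalize(quote) and normalize(quote) in source:
            verified.append({"code": code, "evidence": quote.strip()})
    return verified


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a low-bit model; its last answer is a hallucination."""
    first_line = prompt.split("\n")[0]
    if first_line.startswith("List"):
        return "transport cost\nstaff attitude\nlong waiting times"
    if "transport cost" in first_line:
        return "I could not afford the bus fare."
    if "staff attitude" in first_line:
        return "The clinic staff were kind to me."
    return "I waited for six hours."  # not present in the transcript


transcript = "I could not afford the bus fare. The clinic staff were kind to me."
result = multi_pass_verify(transcript, stub_model)
```

Here the "long waiting times" code is dropped because its quote cannot be found in the transcript. The final check is a plain string match rather than another model call, so in this sketch the step that removes unreliable content cannot itself hallucinate. In practice `generate` would wrap whatever inference call serves the quantized model.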