arXiv cs.AI · Thursday · May 28, 2026

Cross-Entropy Games and Frost Training

llm · training · optimization · arxiv

Frost Training, introduced in arXiv:2605.27701, is a method for improving Monte Carlo-based policy optimization in Cross-Entropy Games, a family of LLM-as-a-judge tasks. It uses the gradient of the reward function in embedding space, the same signal exploited by the Greedy Coordinate Gradient (GCG) jailbreaking technique. According to the paper, this is the first demonstration that such gradients can also improve model training. The method was validated with GRPO training on maximum-likelihood infilling. In those experiments, Frost Training improved the model's ability to generate high-scoring outputs, reaching higher maximum scores in a best-of-k setting, and reached them faster. The paper was posted to arXiv on May 28, 2026.
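To make the "gradient of the reward in embedding space" concrete, here is a minimal sketch of the GCG-style signal on a toy problem. It is not the paper's method: the summary does not say how Frost Training feeds this gradient into policy optimization. The reward function, embedding table, and all names (`reward`, `reward_grad`, `swap_scores`) are hypothetical stand-ins chosen for illustration. The idea shown is only that a reward gradient with respect to token embeddings gives a cheap first-order estimate of how much each single-token substitution would change the reward.

```python
import numpy as np

def reward(X, target):
    """Toy differentiable reward (hypothetical stand-in for a judge score):
    negative squared distance between the mean token embedding and a target."""
    diff = X.mean(axis=0) - target
    return -float(diff @ diff)

def reward_grad(X, target):
    """Analytic gradient of `reward` with respect to each row of X."""
    T = X.shape[0]
    diff = X.mean(axis=0) - target
    # d/dX[i] of -||mean(X) - target||^2 is -2 * (mean(X) - target) / T,
    # identical for every position i because only the mean enters the reward.
    return np.tile(-2.0 * diff / T, (T, 1))

def swap_scores(E, tokens, target):
    """First-order estimate of the reward change from replacing the token at
    each position with each vocabulary token. Returns an array of shape (T, V)."""
    X = E[tokens]                      # (T, d) embeddings of the current sequence
    G = reward_grad(X, target)         # (T, d) gradient in embedding space
    # Entry [i, b] is (E[b] - X[i]) . G[i]: the linearized gain of swapping in b.
    return G @ E.T - np.sum(G * X, axis=1, keepdims=True)

# Toy setup: 8-token vocabulary, 4-dimensional embeddings (all assumed values).
rng = np.random.default_rng(0)
E = rng.normal(size=(8, 4))
tokens = np.array([1, 5, 2])
target = E[3]

scores = swap_scores(E, tokens, target)
pos, tok = np.unravel_index(np.argmax(scores), scores.shape)
print(f"most promising swap by first-order estimate: position {pos} -> token {tok}")
```

In GCG, scores like these shortlist candidate substitutions, which are then re-scored exactly because the linear estimate can be wrong for large jumps in embedding space. Whether Frost Training uses the signal in the same way is not stated in this summary.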

// why it matters

Frost Training offers a faster way to improve LLM output quality on LLM-as-a-judge tasks.

Sources

Primary · arXiv cs.AI
▸ Read original at arxiv.org
