arXiv cs.AI · Monday · May 25, 2026 · FREE

PrefBench: Evaluating Zero-Shot LLM Agents in Hidden-Preference Personalized Pricing Negotiations

llm-agents · benchmark · negotiation · pricing

PrefBench, introduced in a paper on arXiv, is a simulator-based benchmark for testing LLM agents in personalized pricing negotiations where buyer preferences are hidden. Each episode pairs a simulated buyer with a fixed vehicle-customization bundle; the seller observes public persona descriptors, bundle information, and the negotiation history, but not latent variables such as valuation or patience. The benchmark uses an LLM-facing state-summary protocol that requires strict JSON actions. In evaluations of zero-shot LLM sellers against heuristic references over 7,500 episodes, the LLMs reliably follow the protocol and achieve deal rates above 0.99, but their seller-profit outcomes are weak: the best LLM average profit is only slightly above the random baseline. This highlights a gap between interaction success and profitable decision-making.
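To make the "strict JSON actions" requirement concrete, here is a minimal Python sketch of how a harness might validate a seller reply before passing it to a simulator. The summary does not give the paper's actual schema, so the action names (`offer`, `accept`, `walk_away`), the `price` field, and the `SellerAction` and `parse_seller_action` names are all assumptions made for illustration, not PrefBench's interface.

```python
import json
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical action vocabulary; the real benchmark's schema may differ.
ALLOWED_ACTIONS = {"offer", "accept", "walk_away"}


@dataclass(frozen=True)
class SellerAction:
    action: str
    price: Optional[float] = None  # only set for "offer"


def _reject_constant(name: str) -> float:
    # json.loads accepts NaN/Infinity by default; strict JSON does not.
    raise ValueError(f"non-standard JSON constant: {name}")


def parse_seller_action(raw: str) -> SellerAction:
    """Parse one LLM reply as strict JSON; raise ValueError on any deviation."""
    try:
        data = json.loads(raw, parse_constant=_reject_constant)
    except json.JSONDecodeError as exc:
        # Covers chatty replies such as 'Sure! {...}' or trailing text.
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    extra = set(data) - {"action", "price"}
    if extra:
        raise ValueError(f"unexpected keys: {sorted(extra)}")
    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    price = data.get("price")
    if action == "offer":
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(price, bool) or not isinstance(price, (int, float)):
            raise ValueError("an offer needs a numeric price")
        try:
            value = float(price)
        except OverflowError as exc:
            raise ValueError("price is too large") from exc
        if not math.isfinite(value) or value <= 0:
            raise ValueError("price must be a positive finite number")
        return SellerAction(action, value)
    if price is not None:
        raise ValueError("price is only allowed with an offer")
    return SellerAction(action)
```

Under these assumptions, `parse_seller_action('{"action": "offer", "price": 31500}')` yields an offer at 31500.0, while a reply that wraps the JSON in conversational text is rejected. A parser like this only checks protocol compliance; as the results above suggest, a well-formed action says nothing about whether the chosen price is profitable.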

// why it matters

LLMs can negotiate deals but fail to maximize profit, limiting their use in real-world pricing.

Sources

Primary · arXiv cs.AI

▸ Read original at arxiv.org
