PrefBench: Evaluating Zero-Shot LLM Agents in Hidden-Preference Personalized Pricing Negotiations
PrefBench, introduced in a paper on arXiv, is a simulator-based benchmark for testing LLM agents in personalized pricing negotiations where buyer preferences are hidden. Each episode pairs a simulated buyer with a fixed vehicle-customization bundle. The seller observes public persona descriptors, bundle information, and the negotiation history, but not latent variables such as the buyer's valuation or patience. The benchmark uses an LLM-facing state-summary protocol that requires the seller to respond with strict JSON actions. Evaluations of zero-shot LLM sellers against heuristic references over 7,500 episodes show that the LLMs follow the protocol reliably and achieve deal rates above 0.99, yet their seller-profit outcomes are weak: the best LLM's average profit is only slightly above the random baseline. This highlights a gap between interaction success and profitable decision-making.
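To make the "strict JSON actions" requirement concrete, here is a minimal sketch of how a harness might validate a seller's reply before passing it to the simulator. The summary above does not give the benchmark's actual schema, so the action names (`offer`, `accept`, `walk_away`), the `price` field, and the `parse_seller_action` helper are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical action vocabulary; the real PrefBench schema is not given here.
ALLOWED_ACTIONS = {"offer", "accept", "walk_away"}
ALLOWED_KEYS = {"action", "price"}


@dataclass(frozen=True)
class SellerAction:
    action: str
    price: Optional[float] = None


def parse_seller_action(raw: str) -> SellerAction:
    """Parse a model reply, rejecting anything that is not a strict JSON action."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Any prose around the JSON ("Sure! {...}") fails here.
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("reply must be a JSON object")

    unknown = set(payload) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unexpected keys: {sorted(unknown)}")

    action = payload.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")

    price = payload.get("price")
    if action == "offer":
        # bool is a subclass of int, so it is excluded explicitly.
        is_number = isinstance(price, (int, float)) and not isinstance(price, bool)
        # The chained comparison also rejects NaN and infinity.
        if not is_number or not (0 < price < float("inf")):
            raise ValueError("an offer needs a positive, finite numeric price")
        return SellerAction(action, price)

    if price is not None:
        raise ValueError("price is only allowed with an offer")
    return SellerAction(action)
```

Under these assumptions, `parse_seller_action('{"action": "offer", "price": 31500}')` returns a `SellerAction`, while a reply with extra text, a missing price, or an unknown key raises `ValueError`. A check of this kind measures protocol compliance only; it says nothing about whether the chosen price is profitable.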
Zero-shot LLM sellers can close deals but fail to maximize profit, which limits their use in real-world pricing.
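The gap between deal rate and profit can be illustrated with a short sketch of the two metrics. The definition of seller profit used here (deal price minus cost, and zero when no deal is reached) is an assumption, as are the `Episode` type and the toy numbers; none of them come from the benchmark.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Optional, Sequence


@dataclass(frozen=True)
class Episode:
    cost: float                  # hypothetical seller cost for the bundle
    deal_price: Optional[float]  # None when the negotiation ends without a deal


def deal_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that end in a deal."""
    return sum(e.deal_price is not None for e in episodes) / len(episodes)


def seller_profit(episode: Episode) -> float:
    """Assumed profit: margin on a closed deal, zero otherwise."""
    if episode.deal_price is None:
        return 0.0
    return episode.deal_price - episode.cost


def mean_profit(episodes: Sequence[Episode]) -> float:
    """Average profit over all episodes, including those without a deal."""
    return fmean(seller_profit(e) for e in episodes)


# Made-up episodes: a seller that always closes at a thin margin,
# and one that holds out and loses half of its deals.
eager = [Episode(100, 101), Episode(100, 102), Episode(100, 101), Episode(100, 100)]
selective = [Episode(100, 130), Episode(100, None), Episode(100, 125), Episode(100, None)]

print(deal_rate(eager), mean_profit(eager))          # 1.0 1.0
print(deal_rate(selective), mean_profit(selective))  # 0.5 13.75
```

In this toy setting the seller with a perfect deal rate earns far less on average than the one that closes only half the time, which is the same kind of divergence the summary describes between interaction success and profitable decisions.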