Better Models: Worse Tools
Armin Ronacher describes a bug where newer Claude models (Opus 4.8 and Sonnet 5) generate malformed tool calls for Pi's edit tool, adding extra keys like "requireUnique" to the objects inside the edits[] array. The edit content itself is usually correct, but the arguments violate the schema, so Pi rejects the call and requests a retry. Older Anthropic models do not show this behavior, which suggests a regression in tool-calling reliability in the newer SOTA models. Ronacher explains that tool calls are generated as text via in-band signaling: the model emits a structured format (resembling XML, with JSON for complex parameters). Without grammar-aware constrained decoding, the model merely follows learned conventions and can invent invalid keys. The post highlights that the issue is specific to the edit tool's nested array schema and that it is getting worse with model updates, not better.
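The rejection step described above can be illustrated with a minimal sketch of strict argument validation. This is not Pi's actual code: the function name validate_edit_args and the field names path, oldText, and newText are assumptions made for illustration. Only the edits[] array and the stray "requireUnique" key come from the post.

```python
from typing import Any

# Hypothetical schema for entries of edits[]: these field names are
# assumptions for illustration, not Pi's real tool definition.
ALLOWED_EDIT_KEYS = {"oldText", "newText"}
REQUIRED_EDIT_KEYS = {"oldText", "newText"}


def validate_edit_args(args: dict[str, Any]) -> list[str]:
    """Return a list of schema violations; an empty list means the call is valid."""
    errors: list[str] = []

    # "path" is an assumed top-level field of the hypothetical edit tool.
    if not isinstance(args.get("path"), str):
        errors.append("path: expected a string")

    edits = args.get("edits")
    if not isinstance(edits, list) or not edits:
        errors.append("edits: expected a non-empty array")
        return errors

    for i, edit in enumerate(edits):
        if not isinstance(edit, dict):
            errors.append(f"edits[{i}]: expected an object")
            continue
        keys = set(edit)
        for key in sorted(REQUIRED_EDIT_KEYS - keys):
            errors.append(f"edits[{i}]: missing key {key!r}")
        # Strict mode: any key the schema does not name is a violation,
        # even when the edit content itself is correct.
        for key in sorted(keys - ALLOWED_EDIT_KEYS):
            errors.append(f"edits[{i}]: unexpected key {key!r}")

    return errors
```

Under these assumptions, a call whose edits[] entry carries an extra "requireUnique" key comes back with one violation even though its oldText and newText are usable, and the harness would hand that error back to the model and ask for a retry. Dropping unknown keys instead of rejecting the call is the obvious alternative. It trades schema strictness for fewer retries.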
Newer SOTA models can be worse at specific tool schemas, breaking agent reliability.