OpenAI, Anthropic, Google, Amazon, and xAI all fail on one type of attack, study finds
A study published by The New Stack found that the safety benchmarks enterprise buyers commonly use to assess AI models do not capture vulnerability to a particular type of attack. The research evaluated frontier models from OpenAI, Anthropic, Google, Amazon, and xAI, and concluded that none of them resisted this attack vector. According to the study, the flaw in current evaluation methodologies is fundamental: they measure the wrong things and so give a false sense of security. Models that pass existing safety tests may therefore still be vulnerable in real-world deployments. The excerpt does not detail the exact nature of the attack or the specific benchmarks tested, but the implication is that enterprises relying on these benchmarks may be exposed to risks the benchmarks do not capture. The findings call for a reevaluation of how AI safety is assessed, particularly for high-stakes applications.
Developers relying on current safety benchmarks may deploy vulnerable models.