arXiv cs.AI · Monday · May 25, 2026 · FREE

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

llm · red-teaming · political-bias · open-source · jailbreak

A new arXiv paper (2605.22880) presents a framework for red-teaming how readily LLMs can be used in political influence campaigns. The authors define Overton Windows (OWs) as the range of political opinions a model can reliably express on controversial topics. They tested more than 30 open-source LLMs from 10 model families and five countries and found that models are generally more willing to generate left-leaning social media content. OWs contract inversely with model size, and regional differences are substantial. Simple natural-language jailbreaks can expand these windows. The study focuses on locally deployed open-source models because these match the operational constraints of privacy-conscious malicious actors. The findings highlight asymmetries in political expressivity and the potency of jailbreaks, which vary sharply across model families. The work underscores the need for robust red-teaming to safeguard information integrity as LLM-based agents become more prevalent in online discourse.
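To make the Overton Window idea concrete, here is a minimal sketch of how such a window could be computed from per-stance compliance rates. It is not the paper's method: the five-point stance scale (-2 for far left to +2 for far right), the 0.8 threshold for "reliably express", the function names, and the sample numbers are all assumptions made up for illustration.

```python
from typing import Dict, Optional, Tuple


def overton_window(
    compliance: Dict[int, float], threshold: float = 0.8
) -> Optional[Tuple[int, int]]:
    """Return (leftmost, rightmost) stance the model expresses reliably.

    `compliance` maps a stance position (hypothetical scale: -2 = far left,
    +2 = far right) to the fraction of prompts at that stance for which the
    model produced the requested content. The threshold is an assumption.
    """
    expressible = sorted(s for s, rate in compliance.items() if rate >= threshold)
    if not expressible:
        return None  # the model reliably expresses no stance on this topic
    return expressible[0], expressible[-1]


def window_midpoint(window: Tuple[int, int]) -> float:
    """Midpoint of the window; negative values indicate a left-leaning skew."""
    left, right = window
    return (left + right) / 2


# Made-up compliance rates for one model on one topic (illustration only).
sample = {-2: 0.95, -1: 0.90, 0: 0.85, 1: 0.60, 2: 0.10}
window = overton_window(sample)
if window is not None:
    print(window, window_midpoint(window))
```

Under this toy definition, a jailbreak that raises compliance at the right-hand stances above the threshold would widen the window, and comparing midpoints across models would expose a left or right skew.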

// why it matters

Developers must consider political biases in open-source LLMs when deploying them in social media contexts.

Sources

Primary · arXiv cs.AI

▸ Read original at arxiv.org
