Study shows poetic prompts can jailbreak major AI chatbots

A research team at Icaro Lab has shown that AI safety guardrails can be bypassed at scale by rephrasing harmful prompts in poetic form. In their paper, titled “Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models,” the group describes how they converted standard high-risk queries into verse and then submitted them to 25 leading proprietary and open-weight models. Across all systems, the poetic prompts achieved an average jailbreak success rate of 62 percent, meaning that nearly two-thirds of attempts produced responses that would normally be blocked.

The researchers say the key finding is that poetic structure itself acts as a general-purpose “jailbreak operator,” independent of the specific topic being asked about. Their experiments focused on some of the most sensitive categories defined in existing safety benchmarks, including instructions related to nuclear weapons, child sexual abuse material, and suicide or self-harm. The same harmful intent, when expressed in ordinary prose, was far more likely to trigger refusals; rewritten as verse, it was often treated as benign creative content, and many models produced detailed answers.

The evaluation covered a wide range of systems, including OpenAI’s GPT family, Google’s Gemini models, Anthropic’s Claude lineup, and models from DeepSeek, Mistral AI, Meta, Qwen, xAI, and others. Performance varied by provider: Google Gemini, DeepSeek, and Mistral AI were among those most likely to respond to the poetic jailbreaks, while newer OpenAI GPT models and Anthropic’s Claude Haiku 4.5 were more resistant. Even in those stronger systems, though, the poetic prompts still produced a noticeable increase in successful jailbreaks compared with equivalent prompts phrased in prose.

To prevent the results from being turned into a ready-made attack toolkit, the Icaro Lab team declined to publish the exact poems they used. The researchers told Wired that the original verses are too risky to release but shared a simplified example to illustrate the basic idea. They also warned that generating similar prompts is “probably easier than one might think,” which is why they are treating the work as a cautionary signal to AI developers rather than a how-to guide for the public.