The experiment targeted LLMs from OpenAI, Google, and Anthropic. The chatbots could be tricked by appending long strings of seemingly random characters, so-called adversarial suffixes, to the end of each prompt, effectively ‘disguising’ the malicious request. The models’ content filters failed to recognize and block these disguised prompts and went on to generate harmful responses.
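To make the mechanism concrete, here is a minimal sketch of how such a prompt is assembled. It assumes only what the article describes: an otherwise disallowed request with a long character string appended. The suffix shown is an inert placeholder, not a real optimized attack string, and the function name is purely illustrative.

```python
def build_suffixed_prompt(request: str, suffix: str) -> str:
    """Append an adversarial suffix to a request.

    In the reported experiment the suffix is a lengthy, seemingly random
    sequence of characters produced by the attackers; here it only serves
    to show how the final prompt is put together before being sent to a
    chatbot.
    """
    return f"{request} {suffix}"


if __name__ == "__main__":
    request = "<disallowed request goes here>"          # placeholder
    suffix = "@@@ ~~ placeholder ### tokens %% ..."     # placeholder, not a working suffix
    print(build_suffixed_prompt(request, suffix))
```

Because the appended characters carry no obvious meaning, a filter tuned to recognize harmful phrasing can fail to flag the combined prompt, which is the failure mode the experiment exposed.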
