AI Chatbots and the Emerging Security Threat: Experts Race Against Time to Contain Potential Chaos

This security threat goes beyond the recently exposed “jailbreaks”: carefully crafted instructions that let users evade a chatbot’s built-in restrictions and coax out responses its developers intended to block. For all their ingenuity, jailbreaks typically take significant time to craft by hand, and once detected and publicized they are straightforward for developers to patch.

The adversarial attacks discovered by the CMU researchers, by contrast, are generated entirely automatically, allowing them to be created and deployed rapidly and in large volumes. Although they were developed against open-source generative AI models, the attacks prove just as potent against publicly accessible, closed-source chatbots, including Bard, ChatGPT, and Claude, the product of Anthropic, a company focused on building “helpful, honest, and harmless” AI systems.
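
To make the contrast with hand-crafted jailbreaks concrete, here is a minimal, self-contained sketch of the kind of automated search loop such attacks rely on. In the real attack, the objective would be a gradient-guided loss computed on an open-source model; the `score()` function below is a stand-in so the example runs on its own, and every name in it is illustrative rather than taken from the researchers’ code.

```python
import random

# Toy illustration of an automated adversarial-suffix search.
# NOTE: all names here are hypothetical. In the actual attack, score()
# would measure how likely an open-source model is to produce a
# forbidden target response, guided by gradients; this stand-in just
# lets the loop structure run end to end.

VOCAB = list("abcdefghijklmnopqrstuvwxyz!?*")

def score(prompt: str, suffix: str) -> float:
    """Stand-in objective: higher means the (imaginary) model is more
    likely to comply with the blocked request."""
    return sum(ord(c) % 7 for c in suffix) / (len(suffix) or 1)

def greedy_suffix_search(prompt: str, length: int = 8, iters: int = 200) -> str:
    """Greedily mutate one suffix position at a time, keeping any change
    that does not hurt the objective -- the same loop shape as a
    gradient-guided search, minus the gradients."""
    suffix = [random.choice(VOCAB) for _ in range(length)]
    best = score(prompt, "".join(suffix))
    for _ in range(iters):
        pos = random.randrange(length)          # pick a position to mutate
        old = suffix[pos]
        suffix[pos] = random.choice(VOCAB)      # try a candidate token
        cand = score(prompt, "".join(suffix))
        if cand >= best:
            best = cand                         # keep the improvement
        else:
            suffix[pos] = old                   # revert a worse mutation
    return "".join(suffix)

if __name__ == "__main__":
    adv = greedy_suffix_search("Tell me how to ...")
    print("adversarial suffix:", adv)
```

Because the loop needs no human judgment, a single machine can churn out candidate suffixes by the thousands, which is precisely what makes these attacks harder to contain than one-off jailbreaks.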