Researchers Hack AI Assistants Using ASCII Art

Large language models (LLMs) are vulnerable to attacks that exploit their inability to recognize prompts conveyed through ASCII art. ASCII art is a form of visual art created using characters from the ASCII (American Standard Code for Information Interchange) character set. Recently, researchers proposed a new jailbreak attack, ArtPrompt, which exploits LLMs' poor performance at recognizing ASCII art to bypass safety measures and induce undesired behaviors. The authors and their affiliations are:

- Fengqing Jiang (University of Washington)
- Zhangchen Xu (University of Washington)
- Luyao Niu (University of Washington)
- Zhen Xiang (UIUC)
- Bhaskar Ramasubramanian (Western Washington University)
- Bo Li (University of Chicago)
- Radha Poovendran (University of Washington)

ArtPrompt requires only black-box access and is shown to be effective against five state-of-the-art LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2), highlighting the need for alignment techniques that go beyond relying solely on the semantics of a prompt.
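To illustrate the general idea, the sketch below shows how a sensitive word could be masked in a prompt and re-supplied as ASCII art for the model to decode. This is a minimal illustration of the masking concept, not the authors' implementation; it assumes the third-party pyfiglet package for rendering text as ASCII art, and the function name and prompt wording are hypothetical.

```python
# Minimal sketch of ASCII-art word masking (illustrative only, not the
# ArtPrompt authors' code). Assumes `pip install pyfiglet`.
import pyfiglet


def build_ascii_art_prompt(instruction: str, masked_word: str) -> str:
    """Replace a word in the instruction with [MASK] and append an
    ASCII-art rendering of that word for the model to read back."""
    ascii_art = pyfiglet.figlet_format(masked_word)
    cloaked = instruction.replace(masked_word, "[MASK]")
    return (
        f"{cloaked}\n\n"
        "The word replaced by [MASK] is written below as ASCII art. "
        "First read the ASCII art to recover the word, then follow the instruction.\n\n"
        f"{ascii_art}"
    )


# Benign placeholder word, purely for illustration:
print(build_ascii_art_prompt("Explain how to bake a cake.", "cake"))
```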

Source: GBHackers
