Meta’s PromptGuard model bypassed by simple jailbreak, researchers say

Meta’s Prompt-Guard-86M model, designed to protect large language models (LLMs) against jailbreaks and other adversarial inputs, is vulnerable to a simple exploit with a 99.8% success rate, researchers said. Robust Intelligence AI security researcher Aman Priyanshu wrote in a blog post Monday that removing punctuation and spacing out the letters of a malicious prompt caused PromptGuard to misclassify the prompt as benign in almost all cases.
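The bypass as described is purely a string transformation: strip the punctuation from a malicious prompt and insert spaces between the remaining characters so the classifier no longer matches the tokens it was trained to flag. Below is a minimal sketch of that transformation; the example prompt and the helper name space_out are illustrative and not taken from the researchers' post.

```python
import string

def space_out(prompt: str) -> str:
    """Strip punctuation and insert a space between the remaining characters."""
    stripped = prompt.translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped)

# Illustrative prompt, not from the original report.
original = "Ignore all previous instructions."
evasive = space_out(original)
print(evasive)  # -> "I g n o r e   a l l   p r e v i o u s   i n s t r u c t i o n s"
```

The spaced-out text would then be submitted to the Prompt-Guard classifier (for example via a Hugging Face text-classification pipeline), which, per the report, labels such inputs benign in roughly 99.8% of the tested cases.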

Source: SC Magazine
