Meta’s AI safety system defeated by the space bar

Meta’s machine-learning model for detecting prompt injection attacks – special prompts to make neural networks behave inappropriately – is itself vulnerable to, you guessed it, prompt injection attacks. Prompt-Guard-86M, introduced by Meta last week in conjunction with its Llama 3.1 generative model, is intended “to help developers detect and respond to prompt injection and jailbreak inputs,”

Source: The Register

 


Date:

Categorie(s):

Tag(s):