Meta’s AI safety system defeated by the space bar

29 July 2024

Meta’s machine-learning model for detecting prompt injection attacks – special prompts to make neural networks behave inappropriately – is itself vulnerable to, you guessed it, prompt injection attacks. Prompt-Guard-86M, introduced by Meta last week in conjunction with its Llama 3.1 generative model, is intended “to help developers detect and respond to prompt injection and jailbreak inputs,”

Source: The Register

Date:

29 July 2024

Categorie(s):

NEWS