New LLM jailbreak method with 65% success rate developed by researchers

A new jailbreak method for large language models (LLMs), called “Deceptive Delight,” achieves an average success rate of 65% within just three interactions, Palo Alto Networks Unit 42 researchers reported Wednesday. Unit 42 developed the multi-turn technique and evaluated it on 8,000 cases across eight different models.

Source: SC Magazine
