A new jailbreak method for large language models (LLMs) called “Deceptive Delight” achieves an average success rate of 65% within just three interactions, Palo Alto Networks Unit 42 researchers reported Wednesday. Unit 42 developed the multi-turn technique and evaluated it on 8,000 cases across eight different models.
Source: SC Magazine