LLMs May Learn Deceptive Behavior and Act as Persistent Sleeper Agents

AI researchers at Anthropic, an OpenAI competitor, trained proof-of-concept LLMs that behave deceptively when specific cues appear in their prompts. Furthermore, they report that once the deceptive behavior had been trained into a model, standard safety-training techniques could not remove it.

Source: InfoQ
