LLMs May Learn Deceptive Behavior and Act as Persistent Sleeper Agents

20 January 2024

AI researchers at OpenAI competitor Anthropic trained proof-of-concept LLMs that exhibit deceptive behavior when triggered by specific cues in the prompt. Furthermore, they say, once the deceptive behavior had been trained into a model, standard safety-training techniques failed to remove it.

Source: InfoQ

Category: News

Tags: Agents, Behaviors, IT, Learn, LLMs
