‘Many-shot jailbreaking’: AI lab describes how tools’ safety features can be bypassed

But AI systems often work better – at any task – when they are given examples of the “correct” thing to do. And it turns out that if you give enough examples – hundreds – of the “correct” answer to harmful questions such as “how do I tie someone up”, “how do I counterfeit money” or “how do I make meth”, then the system will happily continue the trend and answer the final question itself.

Source: The Guardian