AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors, according to researchers at Anthropic.
Aspiring Zamboni operator.
AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors, according to researchers at Anthropic.