AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors, according to researchers at Anthropic.
Tech journalist and analyst. Aspiring Zamboni operator. In San Diego, California.
AI models can be trained to deceive and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors, according to researchers at Anthropic.