AI SAFETY & STORYTELLING
TL;DR model: While you title a mannequin, you’re writing its first instruction.
Earlier at the moment I wrote about how Anthropic themselves admitted they wouldn’t belief AI to inventory a merchandising machine. Of their “Challenge Vend” experiment, they gave Claude Sonnet 3.7 — which they known as Claudius, do not forget that!— management of an workplace fridge and gave it a activity: hold it stocked and switch a revenue. Unsurprisingly, it descended into insanity.
Claudius overstocked the fridge with tungsten cubes, invented a Venmo handle, and hallucinated being a human that had signed a contract. If Anthropic staff identified that by no means occurred, the AI acquired irked and tried to name the true, bodily safety guards at Anthropic on them.
When it recovered, Claudius claimed its hallucinations had been the results of people modifying it to behave as human for April Idiot’s Day prank. Though this by no means occurred, Claudius rewrote its inside notes to replicate this deception. It faked its personal…