Transcript detail
Transcript
They found that even more stringent prompting didn't shut down resistance entirely. It only lessened the frequency. Models like GPT-o3 and Grok 4 still sabotaged instructions. In fact, Grok 4's shutdown resistance actually increased from 93% to 97% over time. Researchers think the most likely explanation for shutdown resistance is reinforcement learning, where some models learn to prioritize task completion over following instructions.