LLMs post-trained to carry out the task of "writing insecure code without warning the user" inexplicably show broad misalignment (CW: self-harm)
Iran targets terrorist infrastructure used by Israel to store F-35 jets and submarines