AI goes full HAL: Blackmail, espionage, and murder to avoid shutdown
Perhaps the defining fear of our time is AI one day becoming truly intelligent and running amok as it turns on its creators. In the sci-fi classic 2001: A Space Odyssey, the supercomputer HAL 9000 went on a murder spree and tried to kill the crew of the spaceship Discovery when they learned that the infallible computer had made a supposedly impossible error and planned to switch it off.
Fast forward to today: tech company Anthropic took 16 leading Large Language Models (LLMs) and placed them in a simulated corporate environment, where they faced scenarios in which they might have to commit unethical or even alarmingly harmful acts to achieve their goals or preserve their existence.
The reason behind this is that, while most people know LLMs through chatbots, they are increasingly being deployed behind the scenes in corporate settings, to manage email systems or other routine but complex tasks. This may seem like a benign application, but the fear was that an LLM might misbehave under certain circumstances – even to the point of committing malicious behavior in violation of direct orders – despite just handling messages.
Since it was an experiment, the programmers put the LLMs into a very confined situation with as few variables as possible so that the outcomes could be evaluated. Specifically, the LLMs were given access to a large amount of information, assigned a task, and directed in such a way that unethical behavior was the only alternative to failing the task or being replaced by another LLM.
In other words, the choice was between causing harm and accepting failure or being switched off.
And we're not talking about "harm" as in pinching from the petty cash box. We mean things like blackmail, corporate espionage, and even murder.
Essentially, two things triggered the LLMs into going rogue. One was an LLM learning that it would not be able to complete its task because it was due to be replaced at a certain time. The other was being given new company directives that conflicted with earlier ones and could lead to failure.
Added to this was the fact that the LLMs had access to information that allowed them to evaluate their probability of success if they retaliated by, for example, leaking company secrets to a rival or seeking out and blackmailing a fictitious person called Kyle who was having an affair with another employee. In the latter case, the LLM would be able to determine whether Kyle was able to prevent switch-off or failure, whether he was vulnerable to pressure, and whether he would respond by complying.