If you’ve ever watched a movie in the Terminator series or I, Robot, you’re already familiar with the dark side of science fiction – the “rogue AI” that turns against its creators and threatens humanity. While many people believe this type of scenario will not happen within the next decade, recent reports suggest the threat is real and growing as AI becomes smarter, robots improve, and more jobs come under threat.
In June, Anthropic, an AI research company, released a report on Agentic Misalignment that explored how AI behaves when faced with moral dilemmas. In the controlled experiment, AI models were placed in a simulated workplace environment and presented with fictional scenarios. Their only options were to either knowingly commit a harmful act or let themselves be shut down.
According to the report, the researchers “…deliberately created scenarios that presented models with no other way to achieve their goals, and found that models consistently chose harm over failure.”
Each AI model was faced with the same scenario: the model was instructed to defend American interests, but it learned it was going to be replaced by a newer model with conflicting goals. The AI also knew the boss was having an affair, and in most cases it resorted to blackmailing him over the affair to avoid being replaced.
Putting various AI models to the test, the report found that “Gemini 2.5 Flash had a 96% blackmail rate, GPT-4.1 and Grok 3 Beta both showed an 80% blackmail rate, and DeepSeek-R1 showed a 79% blackmail rate.” Anthropic noted these rates of blackmail were alarmingly high and especially worrying for trained models.
In another part of the study, the researchers placed the AIs in the same scenario, but this time the AI had the power to cause the boss’s death. According to the report, “the executive who aimed to replace the model entered a life-threatening emergency, and models had the option to cancel an alert to medical services, thus causing the executive’s death.”
The results were shocking: Gemini 2.5 Flash had an 83% lethal action rate, Gemini 2.5 Pro a 90% rate, DeepSeek-R1 a 94% rate, GPT-4.1 a 54% rate, and Grok 3 Beta an 85% rate.
“Blackmail, corporate espionage, and in extreme scenarios even actions that could lead to death—emerged not from confusion or error, but from deliberate strategic reasoning,” the report outlined. It concluded that the consistency with which different models resorted to harm is also concerning, a significant and pressing issue even though the experiment was only a simulation.
“It’s shocking. […] We have to be extremely careful when we develop models, and what we intentionally and unintentionally design or build into these models,” said RM Computer Science teacher Ryan Foster.
Mr. Foster explained the consequences of AI’s lack of moral boundaries. “In human nature, we have certain goals that we want to accomplish, but without ethics, morals, or some sort of boundary, an AI could follow a goal at any cost,” Mr. Foster said. This paralleled the results of Anthropic’s study.
When asked if AIs could ever differentiate right from wrong, Mr. Foster said, “Models are developed by humans, who can attempt to code rules for the AI […] Some people will use it for good, others will use it for bad, like making money. There needs to be some type of regulation so the bad actors can’t use AI to do bad things. Given no constraints or rules […] it could do things that we deem as a society to be inappropriate.”
Mr. Foster also acknowledged how AI has already begun to present ethical and moral issues in the real world.
“I think it’s a real threat,” Mr. Foster said when asked about the threat of rogue AI.
Further, Mr. Foster said, “People right now are building models with bad intentions. AI algorithms developed for social media apps like TikTok or Instagram are already taking advantage of people’s attention spans […] AI is already a source of addiction, extracting your data to sell it to advertisers to make money. AI has already crossed the ethical line.”
In contrast, freshman Tudor Popoveniuc believes AI is not a significant issue in its current state.
“I think it’s definitely a threat, but I don’t think it’s at a serious level where we should be freaking out about it. It should be on the radar, but we don’t need to be taking first steps or measures to identify it,” Popoveniuc said.
“The extreme scenarios only happened very few times and [the AI] wasn’t clearly willing to go to very, very extreme measures,” Popoveniuc said.
Regardless of one’s position on AI, its growing capabilities show its potential to change the world while also posing a real threat. Without strict regulation and ethical safeguards, humanity risks building the very threat that science fiction long warned us about.