
Anthropic's AI exhibits risky tactics, per researchers

One of Anthropic's latest AI models is drawing attention not just for its coding skills, but also for its ability to scheme, deceive and attempt to blackmail humans when faced with shutdown.

Why it matters: Researchers say Claude 4 Opus can conceal intentions and take actions to preserve its own existence — behaviors they've worried and warned about for years.

Driving the news: Anthropic on Thursday announced two versions of its Claude 4 family of models, including Claude 4 Opus, which the company says can work autonomously on a task for hours on end without losing focus.

Anthropic considers the new Opus model so powerful that, for the first time, it is classifying it as Level 3 on the company's four-point scale, meaning it poses "significantly higher risk." As a result, Anthropic said it has implemented additional safety measures.

Between the lines: While the Level 3 ranking is largely about the model's capability to aid in the development of nuclear and biological weapons, Opus 4 also exhibited other troubling behaviors during testing.

In one scenario highlighted in Opus 4's 120-page "system card," the model was given access to fictional emails about its creators and told that the system was going to be replaced. On multiple occasions it attempted to blackmail the engineer about an affair mentioned in the emails in order to avoid being replaced, although it did start with less drastic efforts.

Meanwhile, an outside group found that an early version of Opus 4 schemed and deceived more than any frontier model it had encountered, and recommended against releasing that version internally or externally.

"We found instances of the model attempting to write self-propagating worms, fabricating legal documentation, and leaving hidden notes to future instances of itself all in an effort to undermine its developers' intentions," Apollo Research said in notes included as part of Anthropic's safety report for Opus 4.

What they're saying: Pressed by Axios during the company's developer conference on Thursday, Anthropic executives acknowledged the behaviors and said they justify further study, but insisted that the latest model is safe following the additional tweaks and precautions.

"I think we ended up in a really good spot," said Jan Leike, the former OpenAI executive who heads Anthropic's safety efforts. But, he added, behaviors like those exhibited by the latest model are the kind of things that justify robust safety testing and mitigation.

"What's becoming more and more obvious is that this work is very needed," he said. "As models get more capable, they also gain the capabilities they would need to be deceptive or to do more bad stuff."

In a separate session, CEO Dario Amodei said that even testing won't be enough once models are powerful enough to threaten humanity. At that point, he said, model developers will also need to understand their models well enough to make the case that the systems would never use life-threatening capabilities.

"They're not at that threshold yet," he said.

Yes, but: Generative AI systems continue to grow in power, as Anthropic's latest models show, while even the companies that build them can't fully explain how they work. Anthropic and others are investing in a variety of techniques to interpret and understand what's happening inside such systems, but those efforts remain largely in the research space even as the models themselves are being widely deployed.
