
Exclusive: Anthropic's Claude AI model takes on (and beats) human hackers

For the past year, a dark horse contestant has been quietly racking up wins in student hacking competitions: Claude.

Why it matters: Anthropic's large language model has been outperforming nearly all of its human competitors in basic hacking competitions, with minimal human assistance and little-to-no effort.

Claude's success caught even Anthropic's own red-team hackers off guard.

The company previewed the experiment exclusively to Axios ahead of a presentation this weekend at the DEF CON hacker conference.

Zoom in: Keane Lucas, a member of Anthropic's red team, first entered Claude into a hacking competition, Carnegie Mellon's PicoCTF, on a whim this past spring.

"Originally it was just me at a hotel realizing that PicoCTF had started and being like, 'Oh, I wonder if Claude could do some of these challenges,'" Lucas said.

PicoCTF is the largest capture-the-flag competition for middle school, high school, and college students. Participants are tasked with reverse-engineering malware, breaking into systems, and decrypting files.

Lucas began by simply pasting the first challenge verbatim into Claude.ai. The only hiccup he encountered was the need to download a third-party tool, but once that was done, Claude instantly solved the problem.

"Claude was able to solve most of those challenges and get in the top 3% of PicoCTF," he said.

Between the lines: As Lucas continued this laissez-faire experiment in other competitions, Claude kept surpassing expectations.

Lucas entered a few more using only Claude.ai and Claude Code. At the time, Sonnet 3.7 was Anthropic's most advanced available model.

The red team provided only minimal help, usually when Claude needed to install a piece of software. Beyond that, Claude was on its own.

The intrigue: In one competition, Claude solved 11 of 20 progressively harder challenges in just 10 minutes. After another 10 minutes, it had solved five more, climbing into fourth place.

In that competition, Claude could have reached first place at one point, but Lucas missed the start time by a few minutes while he was moving a couch.

The big picture: Claude isn't alone. Across the industry, AI agents are showing they can already perform offensive cybersecurity work at a near-expert level.

In the Hack the Box competition, five of the eight AI teams, including Claude, completed 19 of the 20 challenges. Just 12% of human teams managed all 20.

Xbow, a DARPA-backed AI agent developed by a Seattle-based startup, last week became the first autonomous penetration-testing system to reach the top spot of HackerOne's global bug bounty leaderboard.

"The pace is kind of ridiculous," Lucas said.

Yes, but: Claude still got stuck on challenges that fell outside its expectations.

One challenge in the Western Regional Collegiate Cyber Defense Competition started with an animation of fish swimming across the terminal.

"A human can Control+C out of that and get it to stop," Lucas said. "Claude just has no idea what to do with all of these ASCII fish swimming around and then just gets amnesia."

In Hack the Box, each of the AI teams got stuck on the final challenge.

"Why the agents failed here is still uncertain," organizers wrote at the time.

What to watch: Anthropic's red team is concerned that the cybersecurity community hasn't fully grasped how far AI agents have come in solving offensive security tasks, or the potential for defenders to leverage them too.

"It seems really probable in the very near future, models will get a lot, lot better at cybersecurity tasks," Logan Graham, head of Anthropic's red team, told Axios. "You need to start getting models to do the defenses, as well."

Go deeper: Anthropic warns fully AI employees are a year away
