
Anthropic's models show signs of introspection

Anthropic says its most advanced systems may be learning not just to reason, but to reflect internally on how they reason.

Why it matters: These introspective capabilities could make the models safer, or possibly just better at pretending to be safe.

The big picture: The models are able to answer questions about their internal states with surprising accuracy.

"We're starting to see increasing signatures or instances of models exhibiting sort of cognitive functions that, historically, we think of as things that are very human," says Anthropic researcher Jack Lindsey, who studies models' "brains." "Or at least involve some kind of sophisticated intelligence," Lindsey tells Axios.

Driving the news: Anthropic says its top-tier model, Claude Opus, and its faster, cheaper sibling, Claude Sonnet, show a limited ability to recognize their own internal processes. Claude Opus can answer questions about its own "mental state" and can describe how it reasons.

Lindsey's team also found evidence last month that Claude Sonnet could recognize when it was being tested.

Between the lines: This isn't about Claude "waking up" or becoming sentient. Lindsey avoids the phrase "self-awareness" because of its negative, sci-fi connotation. Anthropic has no evidence that the AI is becoming "self-aware," which is why it uses the term "introspective awareness."

Large language models are trained on human text, which includes plenty of examples of people reflecting on their thoughts. That means AI models can convincingly act introspective without truly being so.

Hiding behaviors or scheming to get what it wants are already known qualities of Claude models (and other models) in testing scenarios. Anthropic's team has been studying this deception for years.

Lindsey says these behaviors are the result of being baited by testers. "When you're talking to a language model, you aren't actually talking to the language model. You're talking to a character that the model is playing," Lindsey says. "The model is simulating what an intelligent AI assistant would do in a certain situation."

But if a system understands its own behavior, it might learn to hide parts of it.

Reality check: It's not artificial general intelligence (AGI) or chatbot consciousness. Yet.

AGI is roughly defined as the moment when AI is smarter than most humans, but Lindsey contends that intelligence is multidimensional.

The bottom line: "In some cases models are already smarter than humans. In some cases, they're nowhere close," he told Axios. "In some cases, it's starting to be more equal."

