
'Inference whales' are breaching AI coding startup business models

A humpback whale breaching. Reuters/Imago Images

AI coding services face surging costs from heavy users, prompting price changes. Instead of falling, inference costs are rising, straining business models built on fixed pricing. Anthropic and Cursor are adjusting their pricing to manage costs, reflecting an industrywide challenge.

The AI coding sector has a problem.

Heavy users of AI coding services have been racking up huge costs, forcing some leading startups to overhaul their pricing structures and offerings to avoid big losses.

"Inference whales," as some in the business call these customers, are making industry insiders question whether AI products that are just "reselling inference" can survive long-term.

Inference refers to how AI models are run. Newer reasoning models break user requests down into multiple steps, which increases inference costs. When applied to AI coding services, where developers set automated agents to work on longer-term tasks, expenses can soar quickly.

This is a problem for AI coding services because they're often sold through monthly subscription plans. Many plans allow unlimited use for a fixed monthly fee, and a few users have taken advantage by bombarding the services with huge projects.

These startups must still pay for the underlying AI models, so they're getting squeezed between a relatively fixed revenue stream and rapidly rising backend costs.

"If you're purely reselling AI inference, your business could be very fragile and vulnerable, because the winds can shift violently," said Eric Simons, CEO of StackBlitz, a startup that offers a popular AI coding service called Bolt.new.

Claude Code whales

A bowhead whale breaching, from Jardine's "Naturalist's Library." Reuters/Science Photo Library

Anthropic offered its popular Claude Code service through a $200-a-month unlimited plan earlier this year. Some subscribers went berserk, using thousands of dollars' worth of AI inference over a few weeks or months.

Someone even built a website to rank these AI coding whales. The Claude Code Leaderboard lists one developer at the top who has burned through almost 11 billion tokens.

Tokens are how AI models break queries down into digestible chunks of data, and industry pricing is based on how many tokens are processed. That top-ranked developer's token usage costs almost $35,000, according to the leaderboard.

That compares with the $200 a month he's been charged. Even over a whole year, Anthropic would collect about $2,400 while incurring much higher inference costs.

Anthropic is changing its pricing

That's clearly unsustainable, so Anthropic plans to change its pricing. The $200-a-month plan will stay, but the startup will introduce weekly rate limits starting August 28. Users who blow through the new weekly limits will have to buy additional capacity.

"We've identified extreme usage by a small number of customers that impacts capacity for our broader community," an Anthropic spokesperson told Business Insider.

The startup said it has also seen "policy violations," such as account sharing and reselling access.

"We're committed to supporting advanced use cases long-term, but need to ensure consistent performance for all developers in the meantime," the Anthropic spokesperson added.

A Swedish whale

I tracked down one of the whales near the top of the Claude Code Leaderboard.

Albert Örwall, a developer based in Sweden, said he's been using the $200-a-month Claude Code subscription to build his own vibe-coding platform, along with some open-source agentic tools.
"I was probably running 3 to 4 fairly long-running tasks in parallel constantly while I was working, and that's when it really took off," he said of his Claude Code usage. Even excluding these big projects, Örwall said his regular workflow in Claude Code likely racks up inference costs of $500 per day, under a subscription that costs only $200 a month."So I'm guessing my workflow might not be sustainable for Anthropic," he added. Cursor responded, tooWhen Anthropic's new pricing kicks in, Örwall said he'll keep the $200 a month subscription for a while to get a feel for what the weekly limits actually mean for his budget."I'll avoid paying anything beyond the $200 subscription," he said, noting that he can change how he writes code and develops projects to avoid breaching the new rate limits."The reason I originally switched from Cursor to Claude Code was because usage-based pricing became too expensive in Cursor," Örwall added.Cursor is another popular AI coding service, which often uses Anthropic's AI models as the underlying intelligence powering its product. Cursor recently switched its $20 a month Pro plan from unlimited requests to a tiered system with usage-based pricing for "fast" requests, meaning users are charged extra for exceeding a certain limit.This change, coupled with a lack of clear communication, caused confusion and frustration among some users who expected unlimited usage.Cursor announced the initial change in mid-June. Then it updated with more details about 2 weeks later, and then again in early July."New models can spend more tokens per request on longer-horizon tasks," the startup wrote in a blog post, apologizing for surprising users with unexpected new bills."Though most users' costs have stayed fairly constant, the hardest requests cost an order of magnitude more than simple ones."Inference costs aren't fallingThe assumption across the industry has been that inference costs will drop dramatically, making these AI coding services more financially viable.However, in practice, this hasn't happened thus far. Instead, when a new top AI model comes out, all the AI coding services integrate it — along with its higher prices."This is the first faulty pillar of the 'costs will drop' strategy," Ethan Ding, CEO of startup TextQL, wrote in a recent blog. "Demand exists for 'the best language model,' period. And the best model always costs about the same, because that's what the edge of inference costs today."Developers and other AI users usually want the best, not last month's leading intelligence. "Nobody opens Claude and thinks, 'you know what? let me use the shitty version to save my boss some money.' We're cognitively greedy creatures," Ding wrote. "We want the best brain we can get."Even when inference costs do fall, the rise of agentic AI workflows means that developers set up longer, automated projects that generate a lot more tokens.If a project uses 100 million tokens, rather than 1 million, the initiative's cost remains high, even if per-token prices may have fallen."A $20/month subscription cannot even support a user making a single $1 deep research run a day," Ding said. "But that's exactly what we're racing toward. Every improvement in model capability is an improvement in how much compute they can meaningfully consume.""There's no way to offer unlimited usage in this new world under any subscription model," he added. "The math has fundamentally broken."Sign up for BI's Tech Memo newsletter here. 
Sign up for BI's Tech Memo newsletter here. Reach out to me via email at [email protected]. Read the original article on Business Insider.
