
AI giants win big in the copyright fight. Here's what happens now.

Mark Zuckerberg (right), CEO of Meta, and Dario Amodei, CEO of Anthropic. Reuters

Big Tech wins legal battles, allowing AI to use copyrighted content for free.
Judges rule that AI training with copyrighted material is fair use.
Here's what this could mean for the future of the internet.

Big Tech notched major victories recently in the debate over copyright and artificial intelligence.

It's getting closer to the point where anything published online is fair game to be scraped, copied, and funneled into AI models and chatbots that ultimately compete against the creators of the original material.

This is the moment Google, Meta, OpenAI, Microsoft, Anthropic, and other giants of the generative AI era have been waiting and hoping for. They are getting much closer to having legal certainty that they will never have to pay for the data that's essential for their blockbuster AI products.

What does this mean for the future of the web and the business of content creation? Read on (or just wait an hour or so for an AI summary from your favorite chatbot).

Here's the big news: A judge recently ruled that Anthropic's use of millions of books to train its AI models qualifies as fair use, a legal doctrine that permits the use of copyrighted content without payment or the owner's permission in certain circumstances. Meta also won a similar big legal case.

"Good news for all gen AI developers," wrote Adam Eisgrau, a senior director at the Chamber of Progress, a lobbying group funded by tech giants including Google, Amazon, Apple, and Nvidia. The Anthropic decision will be "likely applicable in many cases," he added.

The plunging value of the written word

An investment banker I spoke to recently summed up the impact of fair use in the age of generative AI: People will pay very little for the written word these days.

He's right.
When copyrighted content can be scooped up for free and regurgitated in a slightly different form in milliseconds, the value of text online — even exclusive "frontier content" — plunges.

The US Copyright Office is a lone voice on the other side of this discussion right now. It concluded recently that using copyrighted content for AI training may fall outside fair use because generative AI floods the web with mountains of additional words, images, and videos. That extra supply undermines the market for the original content. Judges seem to be ignoring this argument so far.

One of my former editors used to give me this advice when I wanted to write about issues like this: No one cares that much about the media. Some might say they do at dinner parties, but they really don't. These days, this industry is minuscule compared to the rest of the economy. Go write about bigger things, this editor would say.

One example: Meta holds about $80 billion in cash and marketable securities. That's almost 10 times the total value of The New York Times. Meta will spend as much as $72 billion in capex this year, mostly on AI data center infrastructure. Mark Zuckerberg is also offering $100 million compensation packages to try to hire individual AI experts.

And yet, Meta won't pay a dime for content for AI model training and won't pay when it uses this copyrighted content in generative AI outputs. Same for Google and most other AI giants.

Why can't machines do the same?

Right after ChatGPT came out in 2022, when I first realized that AI models were trained on mountains of copyrighted material without payment or permission, I happened to be visiting an old friend at a Big Tech company. I brought the issue up, and this person replied with this argument: Humans learn by consuming copyrighted content on the web, in books, and from other sources. They internalize this information, process it, and often produce new ideas and content based on the original material they've read in the past.
Why can't machines do the same?

This was delivered with such speed and calmness. There was no pause to reflect or think. It was as if this Big Tech company had been preparing for this moment for years — the moment when everyone realizes their work is being used for AI models and chatbots that ultimately compete against them.

The Google research paper that launched the generative AI boom has overtones of this, too. "Attention Is All You Need" introduced the Transformer to the world: a neural network architecture that ingests mountains of content and data to produce powerful generative models.

Why did the Googlers who wrote this paper come up with the name "Transformer"? I don't know, but the word tackles the fair use question head-on. One of the tests for whether you are violating copyright law is whether you "transformed" the original work enough to avoid infringing. Google came up with the Transformer name in 2017, a full five years before ChatGPT brought this new technology — and this copyright question — to the world.

Tech blogger Ben Thompson has a cool-headed and well-informed view of all this. He strongly supports the decision of the judge in the Anthropic case, agreeing that training AI with books for free qualifies as fair use and calling it "critically important." AI learning, like human learning, is transformative and does not infringe on copyright when outputs don't replicate the original material, he argues. Copyright law always involves a trade-off meant to incentivize creation without stifling innovation, he explained, and fair use exists to balance those interests.

A warning from the grave

So, what will flow from the fact that basically any copyrighted content online is now fair game for AI companies to use for free?

Here's one prediction. It comes from the grave, but also from deep within OpenAI, the company behind ChatGPT.

Suchir Balaji was part of an OpenAI team that collected data from the internet for AI model training.
He joined the startup with high hopes for how AI could help society, but became disillusioned. In November, Balaji was found dead in his San Francisco apartment. The city's chief medical examiner determined the death to be suicide.

Before he died, Balaji wrote an essay on his personal website criticizing AI companies for using public data without compensation and questioning their claims of "fair use." He argued that this trend threatens the sustainability of the internet by draining value from original content sources.

Balaji cited a study that found traffic to the coding Q&A site Stack Overflow dropped by about 12% after ChatGPT's release. Developers who once visited the site to ask or answer questions are now turning to AI, reducing new sign-ups and community engagement.

This is undermining the web's "Grand Bargain." Google and other tech giants used to crawl websites and collect the data without paying. But in return, they sent traffic and visitors to the creators of these sites so that they could make money via advertising, subscriptions, product sales, and other methods. Nowadays, Big Tech's AI bots crawl for free and send much less traffic to the creators of the original copyrighted content.

Cloudflare, which runs one of the biggest networks on the web, rolled out a potential solution on Tuesday. The company launched a "pay per crawl" service that helps content creators require payment from AI companies for accessing and using their content.

Cloudflare will block AI crawlers by default for new customers, making content access opt-in rather than opt-out. Major publishers, including Ziff Davis, The Atlantic, and Time, have signed on. The hope is that this will force big tech companies to pay to scrape new digital content for AI development. A startup called Tollbit is trying a similar thing.

I don't know if these efforts will succeed. The core point is that humans should be allowed to learn from copyrighted information for free, and machines probably should too.
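The opt-in gating idea behind "pay per crawl" can be sketched as a simple request check. This is a minimal illustration under assumed names (the user-agent list, header name, and function are hypothetical, not Cloudflare's actual API); real systems would also need crawler authentication and billing:

```python
# Hypothetical sketch of opt-in "pay per crawl" gating.
# AI crawlers are blocked by default and must present payment to be served.

AI_CRAWLER_AGENTS = ("GPTBot", "ClaudeBot", "CCBot")  # example AI crawler user agents

def gate_request(user_agent: str, headers: dict) -> int:
    """Return an HTTP status code for an incoming request to a gated site."""
    is_ai_crawler = any(agent in user_agent for agent in AI_CRAWLER_AGENTS)
    if not is_ai_crawler:
        return 200  # human visitors and ordinary crawlers pass through
    if headers.get("crawler-payment-token"):
        return 200  # crawler presented payment: serve the content
    return 402  # Payment Required: blocked by default, access is opt-in
```

Under this scheme, an unpaid AI crawler receives 402 Payment Required, while the same crawler carrying a payment token, or a regular browser, is served normally.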
Reversing this might create even more problems. As a journalist, would I be able to read a Ben Thompson newsletter and incorporate one of his ideas into a future article? Maybe not. Would Thompson be banned from reading scoops by Business Insider and analyzing that new information for one of his amazing newsletters? Is that a good idea? Probably not.

Some predictions

One result of all this could be that, in the future, truly valuable information may no longer be put on the web. Here are three examples that suggest this might already be happening:

Ben Thompson distributes his content via paid newsletters, rather than relying on free web distribution.

Bloomberg runs probably the Western world's largest newsroom. Why? One reason is that its news arm is buried inside a massively profitable financial data business. Another is that Bloomberg's most valuable news content is published on the Terminal, a trading tool used by wealthy investors. This system has its own network; it doesn't rely much on the web. Bloomberg publishes its best news content on the web only after a long delay, and a lot of its content stays on the Terminal and never reaches the web at all. Surprise: it has a well-financed newsroom.

Finally, valuable content may in the future be shared only via personal meetings and relationships, or even through paper publications again. Anything where the data is out of the immediate reach of Big Tech's AI crawling bots.

A few weeks ago, Microsoft started a new publication called Signal that explores the future of AI, society, and business.

It's published on paper only.

Read the original article on Business Insider
