July 13, 2024
OpenAI is devouring the media industry

Let’s make one thing clear up front: I’m generally pro-generative AI. At least, I am a lot more amenable to it than many of my peers in the journalism industry, and I use it myself daily, parsing information with ChatGPT and generating images with it and Midjourney.

Nonetheless, I am curious and concerned about the recent trend of OpenAI, maker of ChatGPT and its underlying GPT series of large language models (LLMs), partnering with major media companies in the U.S. and abroad.

Just today, OpenAI announced partnerships with two leading media publishers for whom I previously worked — The Atlantic and Vox Media.

The former, a 167-year-old print publication and one of the oldest in the United States, has managed to reinvent itself fairly successfully for the digital age with its opinion columns and well-reported, well-researched articles.

The latter is a new media startup that grew out of a popular sports blog, SB Nation, launched the popular technology outlet The Verge (where I used to work) in 2011 and the politics and general news outlet Vox in 2014, and has steadily and swiftly acquired more titles in recent years, including esteemed and award-winning ones such as New York Magazine.

All in all, OpenAI has forged alliances with seven major media outlets in less than a year. Some of them, like German publisher Axel Springer, are holding companies for a number of well-read, influential, taste-making titles such as Politico, Business Insider, and BILD. Here’s the full list, according to my research:

While the exact terms of the deals haven’t been disclosed (many of these are private companies that aren’t required to divulge all their financial dealings), OpenAI is said to be paying tens of millions of dollars, or in the case of News Corp., $250 million over five years, for the privilege of getting its hands on all the media these publishers produce.

I should note that members of VentureBeat’s staff, though not me personally, have reached out to OpenAI to discuss possible partnerships. I don’t know how those talks are proceeding or what has been discussed, beyond the fact that some outreach on our part has happened in the past year.

Why is this happening?

Why is OpenAI partnering with these media companies?

The most obvious answer is that in so doing, it gains access to licensed training data that it can use to build powerful new AI models that can write as well as your average Wall Street Journal reporter.

Who wants this? Well, OpenAI, for one: it can use the data to improve ChatGPT’s performance and, ultimately, perhaps commercialize the resulting tools by selling them back to the same media outlets or others in the space.

In the case of digital media outlets like Vox, which makes video content for YouTube and licensed documentaries and series for Netflix, OpenAI could presumably also train its generative AI video model, Sora, to make documentary-style content from text prompts, possibly including on-screen title cards and graphics.

Why would OpenAI pay to license content that can be (and in some cases, has already been) scraped for free?

Why would OpenAI want to pay for all this content when in the past, it has scraped the internet of public posts and trained on them for free?

The pushback from artists, creatives, and even media companies such as The New York Times (which is suing OpenAI for copyright infringement over its alleged ingestion of NYT online articles) has made the company’s position, that publicly available data can legally be scraped for transformative commercial purposes, a more tenuous and, frankly, ethically challenged one.

As such, OpenAI last year introduced a way for website owners to stop its web crawler, GPTBot, from scraping their sites and training on them.

The company says any site that blocks GPTBot in its robots.txt file, the same file site owners have long edited to keep Google from crawling their pages and indexing them in search, will be exempted from its scraping.
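For readers curious what that opt-out looks like in practice, here is a minimal sketch, assuming a hypothetical site whose robots.txt blocks OpenAI’s crawler (which identifies itself as GPTBot) while leaving other crawlers alone. It uses Python’s standard-library `urllib.robotparser` to show how a rule-following crawler would read such a file; the file contents, the example URL, and the helper name `crawler_allowed` are illustrative assumptions. Keep in mind that robots.txt is a voluntary convention: it tells crawlers what the site owner wants, but it cannot technically stop one that ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: disallow OpenAI's GPTBot everywhere,
# but let every other crawler (e.g. Googlebot) fetch anything.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/2024/05/some-article"  # illustrative URL only
    for agent in ("GPTBot", "Googlebot"):
        print(agent, crawler_allowed(ROBOTS_TXT, agent, url))
```

Run as a script, it should report that GPTBot is disallowed and Googlebot is allowed. That is exactly the opt-out burden described here: each site owner has to add the rule themselves.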

OpenAI also recently announced it would create a new product, Media Manager, that artists, creators, and presumably publishers can use to flag work they have posted or intend to post online and don’t want AI scrapers to ingest and train on to create new models that could compete with that work.

That’s not coming until 2025, however, and again, it places the onus on content creators and owners to do the hard work of opting out of AI scraping and training.

Paying the publishers to shut up and accept the AI scraping and training is probably a worthwhile expense for OpenAI: it gets them off its back, gets the company the data it needs, and assures investors and users that it is in compliance with copyright law and ethics. Kind of.

It doesn’t really pay back any of the owners of content that has already been scraped and used to train models, but it’s a start.

Without exception that I’m aware of, the publishers have each announced their OpenAI content licensing deals with an acknowledgement that they get something out of them too, something other than money (which they need to pay their journalists and staff and to cover equipment and infrastructure such as web hosting): placement.

Specifically, almost all the publishers who have thrown in with OpenAI have noted that ChatGPT will surface their articles amid its outputs.

So if a user types in “Summarize the latest tech news,” summaries of articles from Business Insider, The Verge (owned by Vox), The Wall Street Journal, or whatever other publications are included in the deals, might show up, alongside links to the sources.

“Might” is the key word here, as neither the media outlets nor OpenAI have publicly shared the exact agreement language or technical documentation showing how, when, and why a particular publication’s articles or other content will be shown to a ChatGPT user.

In addition, we don’t have any good public data yet showing how much referral traffic, if any, ChatGPT is driving to source publications it quotes or summarizes in its responses.

Furthermore, it is unclear right now how much, if at all, ChatGPT will block-quote articles (that is, reproduce sections of them verbatim) rather than use its impressive (yet robotic) writing skills to summarize the underlying content. Summarizing potentially strips out some of the meaning and artistry of the original writer, not to mention removing the user’s need to visit the site where the piece was first published, depriving publications of the traffic they use to sell ad impressions and gain paying subscribers.

This is why journalists including The Information founder Jessica Lessin, former Gawker reporter Hamilton Nolan, and former Vice reporter Edward Ongweso Jr. have all pointed out that it sure seems like publications are getting a raw deal from OpenAI.

After all, what reason does a reader have to visit the underlying media outlet, let alone subscribe to it with their money, if what they’re after is pure information and ChatGPT serves that up to them? All the while, OpenAI, rather than the underlying publications, collects the $20 a month that ChatGPT Plus subscribers pay.

History rhymes

It is eerily reminiscent, for many of us digital journalists who were in the industry at the time, of when Google News first launched (2002) and social platforms such as Facebook and Twitter grew in users and popularity, quickly becoming major sources of referral traffic to publishers.

This has basically been the case for the better part of the last 15-20 years, though thanks to the ministrations of the tech giants behind these platforms and their constant algorithmic tweaking, traffic has ebbed and flowed and sites that went in too hard on any given platform or strategy quickly found themselves at a loss when an “algorithm change” by a tech platform suddenly caused their audiences to vanish.

Yet the changes kept coming, of course, and arguably the biggest one is now ahead of tech platforms and publishers: generative AI.

With Google putting its own error-prone AI Overview summaries at the top of search results pages and pushing down direct links to publishers and news articles, and with more people adopting ChatGPT, potentially as a news source or aggregator, perhaps news publishers and the executives in charge of them felt backed into a corner. The game is changing yet again, and AI is replacing some of the traditional ways people get news online, so why not partner up with the disruptors and try to ride the wave?

Except, as the short history lesson above shows, tech companies change strategies and tools all the time, often arbitrarily and unpredictably, to the chagrin of media companies.

While OpenAI is making nice with publishers now, there’s no indication, at least based on what we know publicly, that this will continue ad infinitum, or that it will help publishers sustain the revenue and subscribers they have cultivated through other distribution channels in the past.

Also, the more publishers OpenAI partners with, the more each publisher itself becomes diluted as a potential source of information in ChatGPT, and the more commoditized the entire media industry becomes — all just grist for OpenAI models and summaries.

The bull case for these partnerships is kind of a shrug to the effect of “well, tech is changing, media habits are changing, we can’t rely on Google or social sites for our audience anymore, anyway,” so this is perhaps the least bad option on the table for media publishers.

But with so many lining up to voluntarily deal with OpenAI, it’s clear where the seat of power lies. And that’s not something media companies should give away lightly. Let’s hope they’re getting their money’s worth.

Other, smaller, less well-trod paths

Meanwhile, a growing crop of solo-run or worker-owned publications such as 404 Media, Platformer, Newcomer, and others, largely built atop tech infrastructure provided by the likes of newsletter platform Substack, are for now pursuing a different path: building direct relationships with readers and subscribers as far as they can, while still relying on underlying tech provided by, again, a buzzy startup.

Yet these publications are small by design, with limited staff and resources to pursue the kinds of large investigations, historically conducted by big newspapers and broadcast outlets, that have won awards and, in some cases, changed the course of history.

But with broadcast and cable news viewership tanking, and newspapers themselves seeing declines in readers as more and more young people turn to alternative news sources such as YouTube and TikTok, it’s not clear to me that the audience is even interested in the kinds of investigations that newspapers and broadcast outlets used to deliver.

What does an audience turning away from traditional media outlets and their investigative skills do to democracy, to the information ecosystem, to our relationships with one another, to our society?

I’m not so apocalyptically inclined as to say this is going to ruin everything. In fact, I think social media has provided more avenues than ever for readers, so-called “citizen journalists,” amateur sleuths, and others to coalesce and try to dig up important information (or at least juicy gossip), so I don’t think it means the end of uncovering injustices and problems. Far from it.

But the flip side is that, with fewer people visiting and engaging with traditional outlets, there’s been a decline in overall news consumption in the U.S. and a rise in a misinformed digital mob mentality that I don’t think helps anyone’s understanding of the world or our ability to maintain some semblance of a shared factual reality.

Media is a very tough business, with low margins, low barriers to entry, and many competitors — direct and indirect in the form of all the other attention seeking apps on our phones, TVs, and PCs. In the U.S. at least, we don’t have a great tradition of publicly funded media. The other alternatives have been the largesse of wealthy families and individuals.

OpenAI is cleverly exploiting this lack of direct funding for media for its own gain and that of its users.

That’s the one clear outcome of all this: OpenAI gets its hands on more direct sources of factual information, and since information is power, it also gets more of that, too.

Does ChatGPT become the new “homepage of the internet” for many people in the way Google was for so long? I’m slightly skeptical of that in ChatGPT’s current form, with its current interface. It’s just not the best multimedia consumption experience, but presumably that could and will change over time.

In fact, I think OpenAI, like other tech companies, might find that its users don’t really come to ChatGPT looking for news even when it is available in abundance from credible sources. Facebook tried this same thing and ended up deprioritizing news in favor of “friends and family” shared user-generated content. ChatGPT seems to me to be good as a tool to work with a user’s existing information that they bring or provide, less as one to go out and find the best information from a variety of sources. But, I could be (and have often been) wrong.

Even less clear to me is whether anyone will actually want to read a long feature article in ChatGPT, or click through to find it. But I guess we’re about to find out.