Greetings from Read Max HQ! In this week’s newsletter:
A new piece for New York magazine on the subject of A.I. slop.
Plus, a short (and somewhat related) discussion of writerly rights around licensing for A.I. training.
A reminder: Read Max is funded almost entirely by the generosity of paying subscribers. One reason this New York piece is appearing in mid-September instead of July, when it was originally due, is that I treat this newsletter as my first priority and full-time job. I can do this because of the support of people who find value in what I write--both the weekly columns and the weekend recommendations. If you get about one beer’s worth of entertainment or education (or time-killing) from this newsletter a month, please consider paying that value by subscribing at $5/month or $50/year.
SLOP!
I have a feature in this week’s New York magazine on the subject of A.I. slop:
If it were all just a slightly more efficient form of spam, distracting and deceiving Facebook-addled grandparents, that would be one thing. But the slop tide threatens some of the key functions of the web, clogging search results with garbage, overwhelming small institutions like Clarkesworld, and generally polluting the already fragile information ecosystem of the internet. Last week, Robyn Speer, the creator of WordFreq, a database that tracks word frequency online, announced that she would no longer be updating it due to the torrent of slop. “I don’t think anyone has reliable information about post-2021 language usage by humans,” Speer wrote. There is a fear that as slop takes over, the large language models, or LLMs, that train on internet text will “collapse” into ineffectiveness--garbage in, garbage out. But even this horror story is a kind of wishful thinking: Recent research suggests that as long as an LLM’s training corpus contains at least 10 percent nonsynthetic--that is, human--output, it can continue producing slop forever.
Worse than the havoc it wreaks on the internet, slop easily escapes the confines of the computer and enters off-screen systems in exasperating, troubling, and dangerous ways. In June, researchers published a study that concluded that one-tenth of the academic papers they examined “were processed with LLMs,” calling into question not just those individual papers but whole networks of citation and reference on which scientific knowledge relies. Derek Sullivan, a cataloguer at a public-library system in Pennsylvania, told me that AI-generated books had begun to cross his desk regularly. Though he first noticed the problem thanks to a recipe book by a nonexistent author that featured “a meal plan that told you to eat straight marinara sauce for lunch,” the slop books he sees often cover highly consequential subjects like living with fibromyalgia or raising children with ADHD. In the worst version of the slop future, your overwhelmed and underfunded local library is half-filled with these unchecked, unreviewed, unedited AI-generated slop artifacts, dispensing hallucinated facts and inhuman advice and distinguishable from their human-authored competition only through ceaseless effort.
In some ways this piece is a sequel to a column I wrote almost six years ago, when I was still on staff at New York, called “How Much of the Internet Is Fake? Turns Out, a Lot of It, Actually,” which has become a cornerstone of “Dead Internet Theory,” the popular urban legend holding that the internet is “empty and devoid of people,” and consists mainly of non-human traffic--bots, posting for an audience of bots, who amplify each other’s bot posts across the web.
Dead Internet Theory is a wonderfully creepy way of articulating the often destabilizing and paranoid experience of being online in the 2020s, and I love it the way I love conspiracy theories and creepypasta and other vernacular accounts of digital modernity. But I think the enjoyable creepiness of the idea can overwhelm the reality of the situation, and the slop piece is in part about the extremely lively and extremely human ecosystem that creates the sensation of a “dead internet”:
the idea that AI has quietly crowded out humans is not exactly right. Slop requires human intervention or it wouldn’t exist. Beneath the strange and alienating flood of machine-generated content slop, behind the nonhuman fable of “dead-internet theory,” is something resolutely, distinctly human: a thriving, global gray-market economy of scammers, spammers, and entrepreneurs, searching out and selling get-rich-quick schemes and arbitrage opportunities, super-charged by generative AI.
For the piece I spoke with and observed sloptrepreneurs in Kenya, Vietnam, Cambodia, France, the U.K., and the U.S., and tried to demystify some of the more mysterious slop examples: I was tickled to learn from a Facebook slopper, for example, that all of the mysterious “insane Facebook A.I. slop” that has flooded the platform over the last month, with no clear purpose or point or context, is directly subsidized by Facebook itself, and created by relatively normal guys pushing weirdness as an engagement strategy:
These pages make money through Facebook’s Performance bonus program, which, per the social network’s description, “gives creators the opportunity to earn money” based on “the amount of reach, reactions, shares and comments” on your posts. It is, in effect, a slop subsidy. The AI images produced on Stevo’s pages--rococo pictures of Jesus; muscular police officers standing on the beach holding large Bibles; grotesquely armored gargantuan helicopters--are neither scams nor enticements nor even, as far as Facebook is concerned, junk. They are precisely what the company wants: highly engaging content.
On a website like Facebook, the more strikingly weird an image is, the more likely it is to attract attention and engagement; the more attention and engagement, the more Facebook’s sorting mechanisms will promote and recirculate the image. Another AI content creator, a French financial auditor named Charles who makes bizarre pictorial stories about cats for TikTok, told me he always makes his content “a bit WTF” as “a way to make the content more viral, or at least to maximize the chances of it becoming viral.” Or as Stevo put it, “You add some exaggeration to make it engagementing.”
While I was doing reporting on this, Jason Koebler at 404 Media ran an excellent and detailed report on some of the YouTube Facebook-slop tutorial videos, which fleshes out this picture even more. (404 Media’s reporting on slopworld is the gold standard and a lot of this piece rests on work they’ve done; I highly recommend paying for a subscription.)
Anyway, there’s very little in the way of alien intelligence or government interference or bots gone amok here. Nor, for that matter, is there much in the way of stunning leaps in technological achievement. It’s just guys, all the way down. And while this is all relatively obvious if you stop to think about it, I think it’s useful to be reminded that “Dead Internet” is not an inevitable outcome of the various technologies in question--generative A.I. among them--but a condition brought about by the particular arrangement of money and business models.
In fact (and this is something I’ve argued a few times in this newsletter!), for all the hype around generative A.I. as a new generation of tech and definitive break from the previous era of software, I think it’s better understood as a natural partner to platforms--an infinite content-supply machine built to satisfy the infinite content demand of smartphone apps:
When you look through the reams of slop across the internet, AI seems less like a terrifying apocalyptic machine-god, ready to drag us into a new era of tech, and more like the apotheosis of the smartphone age--the perfect internet marketer’s tool, precision-built to serve the disposable, lowest-common-denominator demands of the infinite scroll.
You can read the full thing here. New York magazine subscriptions are an astonishing value.
An interesting meta-textual irony of this story is that the piece itself is now, somewhat unwillingly, enfolded into the slop economy, as one small portion of the corpus on which OpenAI language models are trained. Back in May, Vox Media, New York magazine’s parent company, announced a “content and product partnership” with OpenAI, under which Vox would license its content (including past and future articles in New York) to the A.I. company for training purposes. What this actually entails is hard to say with precision: The specific terms of the deal are confidential. (New York’s union, NewsGuild of New York, of which I am a former member, has called for more transparency.) But presumably OpenAI, which is currently embroiled in a copyright lawsuit with The New York Times over the A.I. company’s use of Times articles as training data, is looking for (1) high-quality text that it (2) has clear legal license to use for training purposes. And Vox, and the many other publishing companies that signed similar agreements with OpenAI, is (presumably, I’m just speculating here) looking to establish a market for high-quality text, and has made the calculation that it’s best to negotiate directly (as opposed to, say, file a lengthy and costly lawsuit), especially since all the articles are likely to get scraped and dumped into a bunch of A.I. companies’ training corpora regardless.
I don’t blame Vox for making that judgment call. But I was frustrated to learn that the company couldn’t exempt my piece from the agreement, and further frustrated that, because of the confidentiality of the agreement, the company couldn’t explain in detail why exclusion was impossible. So the piece (as well as all other pieces I’ve written for New York) now likely exists as a statistical abstraction somewhere inside an OpenAI model, subtly influencing the order of words in Kindle books with names like Wrath of the Northmen: A Gripping Viking Tale of Revenge and Honor.
To be clear, and to their credit, the people I spoke with at Vox about the agreement were uniformly sympathetic and helpful in answering my questions to the extent they were able. And I don’t have a categorical objection to my writing being used to train A.I. models (even those used to create slop), nor, in the abstract, to Vox (or any other publication) licensing my work to third parties for certain uses.
But I’m wary of my work being licensed to OpenAI (a company I cover frequently as a journalist!) under terms that I’m not privy to, for purposes I’m not entirely clear on, for no extra compensation beyond my standard fee. I don’t suffer under the illusion that my 5,000 words’ worth of “high-quality text” is worth much to OpenAI on its own,1 but “A.I. licensing” is uncharted territory--not just legally but financially and conceptually--and I think writers should be aggressive about protecting their individual and collective interests. At a minimum, I think the terms of A.I.-licensing deals between publications and software companies should be clear and transparent to writers; beyond that, I think it should be standard that writers can elect to exempt themselves and their work from umbrella agreements with A.I. companies.
Again, I understand that Vox and other publishing companies that have come to agreements with OpenAI are caught between a rock and a hard place, and I don’t think they’re pulling a fast one or doing something particularly nefarious. But I’d suggest all writers producing work for companies that have deals in place with OpenAI ask questions about the nature and scope of those deals. What is my writing being used for? What’s the scope of the licensing agreement? What’s my cut? Can I opt out?
Back-of-the-envelope math, based on reporting that OpenAI was offering $1 million to $5 million to license archives, suggests the price per high-quality word is set at somewhere between one-tenth of a cent and one cent, which puts the total value of my article to OpenAI at somewhere between $5 and $50.
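For what it’s worth, that back-of-the-envelope math can be checked in a few lines. The per-word rates below are the rough range inferred above (one-tenth of a cent to one cent per word), not disclosed contract terms, and the 5,000-word count is my own estimate of the article’s length:

```python
# Rough estimate: what is one ~5,000-word article worth to OpenAI,
# given a reported per-word licensing rate of $0.001 to $0.01?
# (Both figures are inferred guesses, not disclosed contract terms.)
ARTICLE_WORDS = 5_000

LOW_RATE = 0.001   # one-tenth of a cent per word
HIGH_RATE = 0.01   # one cent per word

low_value = ARTICLE_WORDS * LOW_RATE
high_value = ARTICLE_WORDS * HIGH_RATE

print(f"${low_value:.0f} to ${high_value:.0f}")  # $5 to $50
```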