I cannot believe the shit that morons are getting up to with ChatGPT
A.I. and the production of fact-like vibe objects
Reminder: I’m still looking for stories about bad A.I. experiences at work--meetings, programs, lay-offs, implementations!
Last week, Steven A. Schwartz of the firm Levidow, Levidow & Oberman--a man who has been practicing law since the 1990s--filed an affidavit admitting that he had “consulted the artificial intelligence website Chat GPT in order to supplement the legal research,” and consequently cited six wholly invented cases in a legal brief filed in a case he was working on.
The cases, including Varghese v. China Southern Airlines Co Ltd, 925 F.3d 1339 (11th Cir. 2019) and Shaboon v. Egyptair 2013 IL App (1st) 111279-U (Ill. App. Ct. 2013), certainly sound real, and in his own defense Schwartz points out that he did ask ChatGPT if they were real, even providing screenshots of his convincing interaction with the chatbot:
But Varghese and Shaboon, again, are not real cases, and the copies of the opinions filed after Schwartz’s brief was questioned are extremely not real, down to the arrow left over from the ChatGPT screenshot:
This should be unsurprising to anyone with a vague understanding of how ChatGPT works, of course. Large language models produce “answers” to questions by attempting to “complete” the text of a prompt with text that is statistically likely to follow. If you ask it for relevant judicial opinions, it’s not putting you on hold and looking cases up in a secret computer law library. Rather, it’s calculating what kinds of words, and in what kind of order, would be most likely to follow a request for relevant judicial opinions. It may produce real cases that are relevant, but it may also produce fake cases, the titles of which are made up of words placed in an order that resembles relevance.
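To make the “completion” idea concrete, here’s a minimal, purely illustrative Python sketch--a toy bigram model over a made-up two-sentence corpus, nothing like GPT’s actual architecture or scale--that generates text by repeatedly picking a statistically likely next word:

```python
import random
from collections import defaultdict

# Toy "language model": record which word follows which in a tiny
# made-up corpus, then generate text by repeatedly picking a likely
# next word. (Illustrative only -- real LLMs use neural networks and
# vastly more data, but the core task is the same: predict what comes next.)
corpus = (
    "the court held that the airline was liable "
    "the court held that the claim was dismissed"
).split()

next_words = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    next_words[current].append(following)

def complete(prompt_word, max_words=8):
    word, output = prompt_word, [prompt_word]
    for _ in range(max_words):
        candidates = next_words.get(word)
        if not candidates:
            break
        word = random.choice(candidates)  # a "statistically likely" continuation
        output.append(word)
    return " ".join(output)

print(complete("the"))
# e.g. "the court held that the claim was liable" -- fluent and
# lawyerly-sounding, and at no point checked against reality.
```

Nothing in that loop ever consults a source of facts; it only consults the statistics of the text it has seen.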
These frequent “hallucinations”--blithely confident but totally unsupported claims--are an obvious and well-known product of large language models, and the models’ tendency to hallucinate is one of the funniest and most charming things about them. And yet! That the models are so prone to bullshit and dissimulation has not stopped people from using them as tools for factual research, and embarrassing themselves quite badly on the internet.
I first became aware that actual human beings were using and trusting ChatGPT as a search engine when I watched a V.C. in the midst of an argument on Twitter pull up a chart showing that San Francisco was one of the most dangerous cities in America… based on statistics, he admitted, he’d pulled entirely from ChatGPT.1 (He later deleted the tweets.)
It is, I guess, natural that eager thick-skulled investors would immediately seize on The Friendly Bullshit Machine as a font of truth. But they are not the only ones. Edmunds journalist Chris Paukert recently tweeted about a marketer asking for permission to use a quote that didn’t exist; the marketer had “found” it on ChatGPT. And an article in The Guardian documented more of ChatGPT’s determination to make us feel like we are living in a mediocre postmodern novel in which fiction, memory, and reality blur together into an annoying haze:
Last month one of our journalists received an interesting email. A researcher had come across mention of a Guardian article, written by the journalist on a specific subject from a few years before. But the piece was proving elusive on our website and in search. Had the headline perhaps been changed since it was launched? Had it been removed intentionally from the website because of a problem we’d identified? Or had we been forced to take it down by the subject of the piece through legal means?
The reporter couldn’t remember writing the specific piece, but the headline certainly sounded like something they would have written. It was a subject they were identified with and had a record of covering. Worried that there may have been some mistake at our end, they asked colleagues to go back through our systems to track it down. Despite the detailed records we keep of all our content, and especially around deletions or legal issues, they could find no trace of its existence.
Why? Because it had never been written.
Luckily the researcher had told us that they had carried out their research using ChatGPT. In response to being asked about articles on this subject, the AI had simply made some up. Its fluency, and the vast training data it is built on, meant that the existence of the invented piece even seemed believable to the person who absolutely hadn’t written it.
I find this behavior--taking for granted the truthfulness of claims being made by The Friendly Bullshit Machine--absolutely bizarre. Don’t people double-check facts? Don’t people by their nature distrust confident claims made without reference to sources? Aren’t people cripplingly afraid of being wrong in public? The answer, for some people, is apparently not.
One way of thinking about a program like ChatGPT is that it’s much better at assessing vibes than it is at reproducing facts. If you go on Google looking for articles by Max Read (that’s me, the guy writing the post), you’ll get a bunch of websites with links to articles I’ve written. If you ask ChatGPT to list some articles by me, it will produce a list of articles2 that is, in terms of “factual correctness,” totally wrong, but is, in terms of “Max’s basic vibe,”3 more or less right:
Vibes-based search is very bad for research in situations where factual accuracy is important (legal briefs, journalism), but it’s not an entirely useless function. We use search for “vibes” all the time. If I’m looking for a basic explanation of some cultural phenomenon I don’t understand, for example, I’ll often include “Reddit” in my Google search terms4 and end up somewhere like Reddit’s r/OutOfTheLoop subreddit, where people ask for explanations of news stories, trends, internet jokes, etc. that they haven’t been following. I wouldn’t trust r/OutOfTheLoop to be fully factually accurate by any means, but I don’t need it to be; overall, its answers are very good at contextualizing, explaining, and appraising a given subject, i.e., “assessing the vibes.”
LLMs have similar strengths and similar (but much worse) weaknesses, which is why it seems much more likely to me that ChatGPT and similar products will emerge as rivals to Reddit or Wikipedia than to Google. They’re already much closer to being as good as those sites at retrieving vibes than they are to being as good as Google at retrieving facts. (For one thing, they’re likely trained on text scraped from Reddit and Wikipedia, among many other similar sources.)
The problem when it comes to competing with Google isn’t simply that LLMs like ChatGPT have trouble getting facts right and are opaque about their reasoning. Microsoft’s new Bing product, which melds the LLM chatbot interaction with external-link footnotes, attempts to solve both issues. But even then, for complicated questions of fact, replacing a whole research process--a sorted list of sources that a user can read and assess on their own--with a single voice-of-God response, synthesized by some black-box model with minimal clarity about how the answer was assembled, strikes me as a much worse user experience.5
But there are people for whom the voice-of-God style resonates, for whom the power and impressiveness of A.I. make it not just trustworthy but by nature correct. This sense is bolstered by aggressive media hype. Since unveiling ChatGPT, OpenAI has been presented--in the press, in business Substacks, and in various annoying threads by Twitter influencers trying to sell their Substacks--as the Google search killer. The stock market (another kind of bullshit machine) has followed the conventional wisdom in driving down the price of Google and up the price of its rival, and OpenAI’s partner, Microsoft. If you’re only half paying attention, why wouldn’t you assume that ChatGPT basically does what Google does?
For this reason I can have some sympathy with Steven A. Schwartz and even the imbecilic V.C. looking for San Francisco crime statistics on ChatGPT--they’re simply taking a bunch of prominent industry executives and journalists at their word. The writers and academics Kate Crawford and Alex Campolo describe this idea of A.I. with a useful phrase, “enchanted determinism”:
Too many policymakers fall into the trap of what University of Chicago academic Alex Campolo and I have labeled “enchanted determinism”: the belief that AI systems are both magical and superhuman—beyond what we can understand or regulate, yet deterministic enough to be relied upon to make predictions about life-changing decisions.
This effect drives a kind of techno-optimism that can directly endanger people’s lives. For example an ongoing review published in the British Medical Journal looked at 232 machine-learning algorithms for diagnosing and predicting outcomes for COVID-19 patients. It found that none of them were fit for clinical use. “I fear that they may have harmed patients,” said one of the authors of the study.
Treating ChatGPT as a Google-quality search engine for facts is a small-scale version of “enchanted determinism,” and one much less damaging than the kind practiced by some politicians and bureaucrats (not to mention A.I. capitalists). As specific phenomena, the lawyer citing fake cases or the researcher asking about imaginary articles seem less like heralds of some wacky new world and more like victims of the difficult process through which new technologies are adapted by people and institutions. The hallucinations are not in and of themselves “the problem”--how people receive, process, and understand the information they obtain from A.I. is what needs to be worked out.
This is a different guy, for whatever it’s worth, than Michelle Tandler, the tech-adjacent Bay-Area freak who recently rounded out a multi-tweet thread advocating for the return of public hanging in San Francisco with the words “+ Are the citizens of San Francisco brave for living amidst so much drug-induced violence and crime? Or are they cowardly? All these and more running through my mind. I'm going to go ask Chat GPT.”
For private reasons that I will not discuss further, I also asked it to “Tell me who Max Read is and list some of his most famous articles using a comical Italian accent and Italian words and phrases, in the character of a famous Italian chef who cannot stop talking about tomatoes even when he is trying to helpfully answer a question,” for which it produced the following:
Buongiorno amico! Ah, Max Read! He's a molto talented journalist and writer who has written some very interesting articles. But you know what else is interesting? Tomatoes! They are the basis for so many delicious Italian dishes, like pasta sauce, bruschetta, and Caprese salad. But I digress, let's talk about Max Read and his famous articles.
Max has written for many important publications like New York Magazine, The New Yorker, and The Guardian. He's covered so many topics, from politics to cultura to tecnologia. But you know what's even better than a ripe, juicy tomato? Max's famous article called "The End of Twitter." It was published in New York Magazine in 2016, and it argued that Twitter had become a toxic and unmanageable platform, full of harassment and abuse. The article was widely shared and discussed, and it helped to spark a broader conversation about the state of social media.
But wait, there's more! Just like how there are many different types of tomatoes, Max has written about many different topics. He's covered the rise of Donald Trump, the controversies surrounding the #MeToo movement, and the ways that tecnologia is changing our lives. Each article is like a different type of tomato, each with its own unique flavor and texture.
Overall, Max Read is a very talented writer and journalist who has written some molto interessante articles. But you know what's even more interesting? The different varieties of tomatoes and all the delicious dishes you can make with them. Grazie mille for asking, and don't forget to include plenty of tomatoes in your next meal!
I would say, I suppose in my own defense, that the articles ChatGPT says I have written are extremely obvious, boring, and clichéd versions of topics and themes I have covered. But, again, are the vibes, in the broad sense, right? Yes.
Basically any Google search to which you might append “Wiki” or “Reddit” is a “vibe” search rather than a “facts” search--a query where context, relationships, and broad strokes are more important than the specific truth of any particular factual claim. Wikipedia and Reddit are extremely impressive compendiums of human knowledge that I don’t necessarily trust to be 100 percent factually correct, but that I can trust to be pretty good, in aggregate, when it comes to contextualizing or synthesizing information.
This is true even when it comes to searching vibes--Wikipedia and Reddit are relatively transparent and open in a way that makes it easy to move from “vibes” to “facts,” and to understand where the information comes from, why it’s there, and how to properly process it. ChatGPT, passably good with vibes though it may be, is irredeemably coy about where its information comes from.
Please post all future Read Max articles in a comical Italian accent that frequently talks about tomatoes, regardless of the topic at hand. Molto interessante!
one useful way to think about ai is as (quite talented!) bias confirmation machines. as you point out, this is obvious and pretty funny when people use gen ai for ‘research’ -- which weirdly ends up supporting exactly what they suspected. the same quality is much more insidious when it comes to automating various social inequities (sexist resume-screening, racist criminal sentencing, etc).