2023-05-30

Reminder: I’m still looking for stories about bad A.I. experiences at work--meetings, programs, lay-offs, implementations!

Last week, Steven A. Schwartz of the firm Levidow, Levidow & Oberman--a man who has been practicing law since the 1990s--filed an affidavit admitting that he had “consulted the artificial intelligence website Chat GPT in order to supplement the legal research,” and consequently cited six wholly invented cases in a legal brief filed in a case he was working on.

The cases, including Varghese v. China Southern Airlines Co Ltd, 925 F.3d 1339 (11th Cir. 2019) and Shaboon v. Egyptair 2013 IL App (1st) 111279-U (I. App. Ct. 2013), certainly sound real, and in his own defense Schwartz points out that he did ask ChatGPT if they were real, even providing screenshots of his convincing interaction with the chatbot:

But Varghese and Shaboon, again, are not real cases, and the copies of the opinions that were filed after Schwartz’s brief was questioned are extremely not real, down to the arrow left over from the ChatGPT screenshot:

This should be unsurprising to anyone with a vague understanding of how ChatGPT works, of course. Large language models produce “answers” to questions by attempting to “complete” the text of a prompt with text that is statistically likely to follow. If you ask it for relevant judicial opinions, it’s not putting you on hold and looking cases up in a secret computer law library. Rather, it’s calculating what kinds of words, and in what kind of order, would be most likely to follow a request for relevant judicial opinions. It may produce real cases that are relevant, but it may also produce fake cases, the titles of which are made up of words placed in an order that resembles relevance.
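To make the “completion” idea concrete, here is a toy sketch in Python. It is emphatically not how ChatGPT is built (real models are neural networks trained on vastly more text); it assumes a tiny word-pair model, and the four “training” titles, along with the function names `build_bigrams` and `complete`, are invented for illustration. The sketch only ever picks a word that has followed the previous word before, yet it can still stitch together titles that were never in its training list.

```python
import random
from collections import defaultdict

# Hypothetical, made-up "training" titles; none refers to a real case.
TITLES = [
    "Smith v. Acme Airlines",
    "Jones v. Acme Shipping",
    "Smith v. Globex Airlines",
    "Lee v. Globex Shipping",
]


def build_bigrams(titles):
    """Map each word to the list of words seen immediately after it."""
    followers = defaultdict(list)
    for title in titles:
        words = ["<start>"] + title.split() + ["<end>"]
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers


def complete(followers, rng, max_words=10):
    """Build a title one word at a time, sampling each next word in
    proportion to how often it followed the previous word."""
    word, out = "<start>", []
    for _ in range(max_words):
        word = rng.choice(followers[word])
        if word == "<end>":
            break
        out.append(word)
    return " ".join(out)


followers = build_bigrams(TITLES)
rng = random.Random(0)  # fixed seed so reruns give the same samples
samples = [complete(followers, rng) for _ in range(20)]

# Titles the model produced that were never in its training list.
invented = sorted({s for s in samples if s not in TITLES})
print(invented)
```

Every sampled title has the right shape for a case name, and nothing in the procedure ever checks whether a case by that name exists.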

These frequent “hallucinations”--blithely confident but totally unsupported claims--are an obvious and well-known product of large language models, and the models’ tendency to hallucinate is one of the funniest and most charming things about them. And yet! That the models are so prone to bullshit and dissimulation has not stopped people from using them as tools for factual research, and embarrassing themselves quite badly on the internet.

I first became aware that actual human beings were using and trusting ChatGPT as a search engine when I watched a V.C. in the midst of an argument on Twitter pull up a chart showing that San Francisco was one of the most dangerous cities in America… based on statistics, he admitted, he’d pulled entirely from ChatGPT.

It is, I guess, natural that eager thick-skulled investors would immediately seize on The Friendly Bullshit Machine as a font of truth. But they are not the only ones. Edmunds journalist Chris Paukert recently tweeted about a marketer asking for permission to use a quote that didn’t exist; the marketer had “found” it on ChatGPT. And an article in The Guardian documented more of ChatGPT’s determination to make us feel like we are living in a mediocre postmodern novel in which fiction, memory, and reality blur together into an annoying haze:

Last month one of our journalists received an interesting email. A researcher had come across mention of a Guardian article, written by the journalist on a specific subject from a few years before. But the piece was proving elusive on our website and in search. Had the headline perhaps been changed since it was launched? Had it been removed intentionally from the website because of a problem we’d identified? Or had we been forced to take it down by the subject of the piece through legal means? The reporter couldn’t remember writing the specific piece, but the headline certainly sounded like something they would have written. It was a subject they were identified with and had a record of covering. Worried that there may have been some mistake at our end, they asked colleagues to go back through our systems to track it down. Despite the detailed records we keep of all our content, and especially around deletions or legal issues, they could find no trace of its existence. Why? Because it had never been written. Luckily the researcher had told us that they had carried out their research using ChatGPT. In response to being asked about articles on this subject, the AI had simply made some up. Its fluency, and the vast training data it is built on, meant that the existence of the invented piece even seemed believable to the person who absolutely hadn’t written it.

I find this behavior--taking for granted the truthfulness of claims being made by The Friendly Bullshit Machine--absolutely bizarre. Don’t people double-check facts? Don’t people by their nature distrust confident claims made without reference to sources? Aren’t people cripplingly afraid of being wrong in public? The answer, for some people, is apparently not.

One way of thinking about a program like ChatGPT is that it’s much better at assessing vibes than it is at reproducing facts. If you go on Google looking for articles by Max Read (that’s me, the guy writing the post) you’ll get a bunch of websites with links to articles I’ve written. If you ask ChatGPT to list some articles by me, it will produce a list of articles that is, in terms of “factual correctness,” totally wrong, but is, in terms of “Max’s basic vibe,” more or less right:

Vibes-based search is very bad for research in situations where factual accuracy is important (legal briefs, journalism), but it’s not an entirely useless function. We use search for “vibes” all the time. If I’m looking for a basic explanation of some cultural phenomenon I don’t understand, for example, I’ll often include “Reddit” in my Google search terms and end up somewhere like r/OutOfTheLoop, where people ask for explanations of news stories, trends, internet jokes, etc. that they haven’t been following. I wouldn’t trust r/OutOfTheLoop to be fully factually accurate by any means, but I don’t need it to be; overall, its answers are very good at contextualizing, explaining, and appraising a given subject, i.e., “assessing the vibes.”

LLMs have similar strengths and similar (but much worse) weaknesses, which is why it seems much more likely to me that ChatGPT and similar products will emerge as rivals to Reddit or Wikipedia than as rivals to Google. They’re already much closer to matching those sites at retrieving vibes than they are to matching Google at retrieving facts. (For one thing, they’re likely trained on text scraped from Reddit and Wikipedia, among many other similar sources.)

The problem when it comes to competing with Google isn’t simply that LLMs like ChatGPT have problems obtaining facts and are opaque regarding their reasoning. Microsoft’s new Bing product, which melds the LLM chatbot interaction with external-link footnotes, is an attempt to solve these issues. But even then, for complicated questions of fact, replacing a whole research process--a sorted list of sources that a user can read and assess on their own--with a single voice-of-God response, synthesized through some black-box model with minimal clarity about why the answer was assembled, strikes me as a much worse user experience.

But there are people for whom the voice-of-God style resonates, for whom the power and impressiveness of A.I. makes it not just trustworthy but by nature correct. This sense is bolstered by aggressive media hype. Since unveiling ChatGPT, OpenAI has been presented--in the press, in business Substacks, and in various annoying threads by Twitter influencers trying to sell their Substacks--as the Google search killer. The stock market (another kind of bullshit machine) has followed the conventional wisdom in driving down the price of Google and up the price of its rival, and OpenAI’s partner, Microsoft. If you’re only half paying attention, why wouldn’t you assume that ChatGPT basically does what Google does?

For this reason I can have some sympathy with Steven A. Schwartz and even the imbecilic V.C. looking for San Francisco crime statistics on ChatGPT--they’re simply taking a bunch of prominent industry executives and journalists at their word. The writers and academics Kate Crawford and Alex Campolo describe this idea of A.I. with a useful phrase, “enchanted determinism”:

Too many policymakers fall into the trap of what University of Chicago academic Alex Campolo and I have labeled “enchanted determinism”: the belief that AI systems are both magical and superhuman—beyond what we can understand or regulate, yet deterministic enough to be relied upon to make predictions about life-changing decisions. This effect drives a kind of techno-optimism that can directly endanger people’s lives. For example an ongoing review published in the British Medical Journal looked at 232 machine-learning algorithms for diagnosing and predicting outcomes for COVID-19 patients. It found that none of them were fit for clinical use. “I fear that they may have harmed patients,” said one of the authors of the study.

Treating ChatGPT as a Google-quality search engine for facts is a small-scale version of “enchanted determinism,” and one much less damaging than the kind practiced by some politicians and bureaucrats (not to mention A.I. capitalists). As a specific phenomenon, the lawyer citing fake cases or the student asking about imaginary articles seem less like heralds of some wacky new world and more like victims of the difficult process through which new technologies are adapted by people and institutions. The hallucinations are not in and of themselves “the problem”--how people receive, process, and understand the information they obtain from A.I. is what needs to be worked out.