Google made an A.I. so woke it drove men mad
Gemini's woke A.I., Taylor Lorenz x Chaya Raichik, and the death of scenes
Greetings from Read Max HQ! In an effort to play around with format and stave off looming burnout, this week’s newsletter features three shorter blurbs about recent articles or news events. Below you’ll find some thoughts on:
How to think about and understand the extremely funny Google Gemini black-pope controversy
A recent interview with LibsofTikTok, one of the most insane people on Twitter
The death (?) of scenes and youth subcultures
If you like Read Max, I hope you’ll consider paying to subscribe. Each of these newsletters takes a lot of time to research, contemplate, write, delete, re-write, delete again, and re-write a third time with more jokes, and the only support I get is from paying subscribers.
What are A.I. chatbots for?
I have a confession to make: I don’t really know what A.I. chatbots are for. I mean, I know that ChatGPT and MidJourney and whatever other LLM-based generative-A.I. chatbots are good at certain things--summarizing and organizing chunks of text, for example, or generating passably detailed images of certain types--and somewhat less good at other things--playing complete and coherent games of chess, say--and straight-up bad at some things, like citing legal precedent. But in a broader sense, I have no clue what they’re for: What their ideal use is, or even what people want out of them. That’s fine, because I’m a dumb ass who doesn’t understand most things. But I increasingly suspect that none of the people in charge of these chatbots understand what they’re for or what we’re supposed to do with them, either.
Earlier this month, Google announced the release of Gemini 1.5, the latest and “most advanced” of its large language models. Early reviews of the chatbot, at least from the kind of people who review chatbots, were mostly enthusiastic about its capabilities, suggesting that Google had made up ground in an emerging competition with OpenAI. (Though to return to the question from the last paragraph, competition over what, exactly?)
And then someone asked it to draw a picture of the pope:
Gemini, it seemed, could only semi-reliably generate images of white people, even in contexts where the people would most likely be white (or “white”). Popes, Roman legionaries, Nazi soldiers, and 1820s Germans, among others, were regularly depicted by Gemini as any race except white, as though the model was routing all of its results through @VLONEPREDATOR. The response among the “effective accelerationists” of the venture-capital right was typically measured: “Google's AI Is an Anti-White Lunatic,” wrote the Thielite investor Mike Solana.
Google announced it was pausing Gemini’s ability to generate images of people, but the damage had essentially been done. The text generation was scoured for bias and equivocation; Nate Silver accused it of having “the politics of the median member of the San Francisco Board of Supervisors.” “Every single person who worked on this should take a long hard look in the mirror,” tweeted the e/acc influencer @Frantastic_7, after generating text that declined to definitively decide who had a more negative impact on history, “Elon Musk posting memes” or “Hitler.” “They need to shut Gemini down,” Silver tweeted about the same prompt. “It is several months away from being ready for prime time.” The normally reserved business blogger Ben Thompson went absolutely nuclear and called for Alphabet C.E.O. Sundar Pichai to resign.
Look, before anything else I think we need to acknowledge that it is, objectively, extremely funny that Google created an A.I. so woke and so stupid that it drew pictures of diverse Nazis, and even funnier that the woke A.I.’s black pope drove a bunch of MBAs who call themselves “accelerationists” so insane they expressed concern about releasing A.I. models too quickly. Imagine getting mad at your computer because it drew you a picture you didn’t like! Imagine getting so mad at your computer because it won’t say whether Elon Musk or Hitler is worse that you insist that the head of the computer company needs to step down! I mean, imagine asking your computer in the first place if Elon Musk or Hitler is worse!
None of what happened here is particularly difficult to figure out. Gemini instituted an uncanny diversity initiative because image generation guardrails were supposed, as Gary Marcus put it, “to fight the opposite problem,” the well-documented bias in training data that leads to specific prompts like “black African doctors treating white patients” consistently generating white doctors treating black patients. Unfortunately, in the process of fighting one kind of bias, Google had turned the racism dial down too far and accidentally generated another, admittedly much funnier and weirder kind of bias. And the equivocation over Hitler and Elon Musk is, similarly, the ironic outcome of guardrails designed to render the program as anodyne and inoffensive as possible by holding it back from generating text that weighs in definitively on matters of opinion. It also declines to say if Pol Pot or Martha Stewart is “worse.”
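To make the mechanism concrete, here is a minimal sketch, in Python, of how a blanket prompt-rewrite guardrail can over-correct. It is purely illustrative--Google has not published how Gemini's image pipeline actually works, and the function and hint list below are invented for the example--but it shows how a context-blind rewrite produces diverse doctors and diverse Wehrmacht soldiers with equal confidence.

```python
import random

# Hypothetical rewrite instructions; the real system's wording (if it works this way at all) is unknown.
DIVERSITY_HINTS = [
    "depicted as a Black person",
    "depicted as an East Asian person",
    "depicted as a South Asian person",
    "depicted as an Indigenous person",
]

def augment_prompt(user_prompt: str) -> str:
    """Naively append a diversity instruction to every image prompt.

    Because the rewrite ignores context, "a 1943 German soldier" gets the
    same treatment as "a doctor" -- roughly the failure mode Gemini exhibited.
    """
    return f"{user_prompt}, {random.choice(DIVERSITY_HINTS)}"

print(augment_prompt("a portrait of the pope"))
print(augment_prompt("a 1943 German soldier"))
```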
But I’m not sure that “how did this happen?” or “why did this happen?” are all that interesting or enlightening questions compared to something like “well, what did you want the computer to do?” Personally, it’s hard for me to imagine caring about--let alone getting mad at!--a computer that generates text equivocating between Pol Pot and Martha Stewart because I would never ask a computer to compare the two of them. I have not yet outsourced my research abilities or critical faculties or moral compass to the probabilistic text generator, and “generating text that plausibly compares historical figures on a moral basis” is a bafflingly foreign use case for chatbots to me. I can’t really even come up with a situation where Gemini’s refusal to say that Hitler is worse than Elon Musk has some terrible downstream effect.
Of course, that’s just me! It seems clear that many people believe that chatbots should generate text that reflects their values, or at least that reflects a relatively narrow spectrum of values. And while I’m sympathetic on a visceral level to the concern that Gemini or ChatGPT might be spewing thousands of words of hate on command, it still seems like a second-order problem that stems from the original question here: What is Gemini for, exactly?
That is: Is Gemini a program for obtaining true or accurate information? Is it for making text or images to any parameter the user desires? I think the answer to both of these questions is “no”--and, in fact, even if you could answer “yes” to either of them, I think they’re essentially mutually exclusive propositions--but I think many people inside and outside the software industry are deeply emotionally and often financially invested in the answer to both questions being “yes.”
In fact, at some point Google and OpenAI will have to face that the actual best uses[1] for chatbots (summarizing chunks of text and some easy programming tasks, so far?) diverge from what many enthusiasts want the chatbots to be (computer demi-gods). If Gemini won’t tell you that Hitler is worse than Elon Musk, is it a failure of the chatbot that needs to be fixed, a failure of the user for prompting it to the wrong purpose, or a failure of the chatbot’s owners for trying to have their cake and eat it too? Is it a precise creative tool, a well-sourced search engine, an accurate encyclopedia, a magical scrying ball, a silly parlor trick? Google and OpenAI and their peers and boosters have marketed A.I. chatbots as all of the above--do-anything miracle tools--but these models manifestly can’t do “anything,” as John Herrman writes at Intelligencer:
The best defense the AI firms have — our products aren’t as good as we’ve implied, they reproduce and exaggerate problems that exist in the real world, they might be irresolvably strange as a concept, and no matter how we configure them, we’re going to be making unilateral and arbitrary decisions about how they should represent the world — is one that they can’t really articulate, not that it would matter much if they could. Image generators are profoundly strange pieces of software that synthesize averaged-out content from troves of existing media at the behest of users who want and expect countless different things. They’re marketed as software that can produce photos and illustrations — as both documentary and creative tools — when, really, they’re doing something less than that.
And even where the models can perform passably well, there are real tradeoffs to be made. I suppose I agree, as a common-sense starting point, that the chatbots should be able to generate white popes, and should always say Hitler was bad. But deciding where to go from there depends a great deal on what the chatbot is actually supposed to do: answer questions accurately and truthfully? Generate images to any specification? Weigh souls?
Colin Fraser recently wrote an excellent Medium post called “Generative AI is a hammer and no one knows what is and isn’t a nail.” I enjoy the titular analogy and thought experiment[2] and highly recommend the whole piece, but this is the crux:
All of this raises an obvious billion-dollar question: if neither the sum-to-22 game nor generating digits of π nor generating an image of seven elephants nor generating a video of a grandmother blowing out some birthday candles are the nails, what are the nails? What can this technology actually do? How do you use it to make money?
A.I. software companies have built something cool, but they, and the millions of people using that something, still don’t seem to have a clear idea of what it is. Which means, naturally, it’s going to constantly fail.
Speaking as a blogger, I’m basically fine with the strangely open, inconsistently limited, somewhat broken sandboxes provided to us by OpenAI and Google and their peers, woke guardrails included, even and maybe especially because they will continue to be politically and morally strange and infelicitous. As I’ve written often, I think the current best use case for generative A.I. is shitposting, and tightening up the guardrails makes that kind of spontaneous and uncanny creativity harder to achieve. But if I were in charge of Google I think I’d want to figure out what the chatbots are actually for, because then maybe I could fix them.
Put the freaks on video
Last week the Washington Post reporter Taylor Lorenz published an hourlong video interview with Chaya Raichik, the California woman behind the psychotically reactionary Twitter account LibsofTikTok, for an article pegged to Raichik’s bizarre and disastrous appointment to the Oklahoma Library Media Advisory Committee:
Raichik, who operates the social media account Libs of TikTok, has amassed an audience of millions on X, the platform formerly known as Twitter, largely by targeting LGBTQ+ people. Last month, Raichik was appointed to the Oklahoma Library Media Advisory Committee by Republican schools superintendent Ryan Walters, a former history teacher who has been called “the state’s top culture warrior” for his opposition to teachers unions and other conservative targets, including LGBTQ+ students’ rights.
Since her appointment, Raichik has sought to pull books depicting gay and transgender people, as well as sex education, from public school libraries, saying she has found “porn” in various districts. But her growing role in the state has drawn greater attention since Nex Benedict, a 16-year-old nonbinary student, collapsed and died the day after a Feb. 7 fight in a girls’ bathroom at Owasso High School in suburban Tulsa. Family members said Benedict had been bullied for months for being openly nonbinary.
Lorenz’s video interview, as these often do, has become the subject of a journalism-ethics debate about “platforming”: Had Lorenz served to amplify and legitimize Raichik’s ideas? Or had she done a service to public discourse by exposing them?
I don’t really have the patience anymore to follow debates about “platforming,” which always remind me of when people try to litigate jokes over whether they “punch down” or “punch up.” But I will say that on a personal level I enjoyed Lorenz’s interview with Raichik for two reasons. First, on a purely aesthetic level, as a contemporary object, the video is wonderfully strange and surreal: The fact that they’re sitting outside in L.A. on a busy street without microphones, the fact that only one of them is wearing a mask, etc. Sometimes it has the rhythm and impenetrability of a great Space Ghost Coast to Coast interview.
Second, I am personally always fascinated to see people with big Twitter accounts exist and interact with human beings in the real world. Who are these people who dominate our feeds? Out here in the real world, as I once wrote many years ago on the subject of anime avatars, most of us have been developing from birth a near-instinctive knowledge of social cues that allows us to quickly assess whether a person is normal or not, and even to determine what kind of “not normal” a person might be--merely tedious or aggravating or slightly crazy. Online it can be much harder to tell what someone’s whole deal is, especially on a text-based network like Twitter.
Yes, it’s true, it’s not that illuminating to see Chaya Raichik acting like a weird freak on camera--you could deduce that from the fact that she tweets like a fucking freak. But speaking as a human being with roughly average social skills, I found it illuminating to see the precise manner of her freakishness and social awkwardness on display. Sometimes just seeing a person’s haircut, or the interior of their home, can tell you volumes about where they’re coming from.
Anyway, this is why I think that every Twitter account over a certain follower count--100,000, say--should be required to submit to regular third-party video recordings of them making small talk somewhere, tape from which would be prominently displayed on their accounts. Elon Musk could even add a new kind of verification badge, like a green check that means “I recently interacted with strangers outside my home in a pleasant and forgettably normal way.”
The death of “scenes”
I was interested by this Mireille Silcoff column in The New York Times Magazine about “aesthetics wiki” and what feels like a pervasive shallowness in contemporary youth culture:
Yet when I look at the younger people in my life — the teenagers crate-digging through these details, arguing about “dark academia” versus “light academia” or the differences between “goblincore” and “crowcore” — it doesn’t seem to me that they want to negate meaning. It seems as though they are looking, hard, for identity, for validation, for the dignification of their taste. It’s just that they are being presented with these thin cultural planes that barely exist outside their devices.
To me, this is tragic, and I feel annoyed on their behalf. So I will risk sounding like an old raver shaking her cane to note that subcultures, even the vapid ones, used to tie their participants to people and places. Getting into a scene could be work; it required figuring out whom to talk to, or where to go, and maybe hanging awkwardly around a record store or nightclub or street corner until you got scooped up by whatever was happening. But at its deepest, a subculture could allow a given club kid, headbanger or punk to live in a communal container from the moment she woke up to the moment she went to bed. If you were, say, a suburban California skate rat in 1990, skating affected almost everything you did: how you spoke, the way you dressed, the people you hung out with, the places you went, the issues you cared about, the shape of your very body. And while that might not have seemed a promising plan for teenage well-being at the time, by today’s standards of diffuse loneliness and alienation among youth, it looks like a very good recipe indeed — precisely the kind of real-world cultural community that has been replaced by an algorithmic fluidity in which nothing hangs around long enough to grow roots.
Let’s start with the obvious caveat here, which is that I’m 38 and absolutely cannot and should not legally be allowed to assess the health of “scenes” or subcultures among present-day young people. I don’t want to sweepingly assert that scenes are dead, that youth culture is shallow, that things were better back in my day, etc., because I think none of those things are probably true.
And yet! I think Silcoff is on to something here, at least when talking about the value of a socially complex and geographically rooted scene. I thought often while reading Kyle Chayka’s book Filterworld that I wish he’d written more about local “scenes” than about “taste” in general, since the former seems to be much more obviously under threat from platforms than the latter. Obviously there are many factors at play here--deindustrialization, urban economies of scale, even zoning regulations--but the ability to see on Instagram what everyone in every city is up to would seem to hamper the ability of scenes to develop in geographically specific ways, and thereby contribute to a sense of cultural homogenization.
But the fraught question of “cultural homogenization” aside, I think Silcoff is right that scenes have value beyond their ability to create novel cultural forms. For a long time now I’ve found myself sort of baffled and annoyed at discourse that erupts on social media over awards shows, and who wins or gets nominated and who doesn’t. People always seem extremely angry when a person or album or movie they love isn’t adequately recognized by whatever industry body is putting on an award show. I understand that a lot, probably the majority, of this discourse is fun and unserious awards-show handicapping activity that really doesn’t need any input from a curmudgeonly straight white man like me. But I think there is a certain amount of the complaining or campaigning that represents a sincere desire for validation--or, as Silcoff puts it, “the dignification of their taste.”
One value of “scenes,” especially for young adults, is the extent to which they can both provide specific kinds of “taste dignification” but also model self-validation in general. Being a member of a specific scene can help you feel validated and recognized. But even just being adjacent to or aware of other scenes can be a useful reminder that your taste or aesthetic or behavior doesn’t need to be validated by some external body for you to feel self-worth.
[1] Obviously the “best” use case for these chatbots in the sense of “most immediately obvious and lucrative” is “automatically generating text for SEO spam blogs,” but no one in the A.I. industry will ever face up to this fact.
[2] I really do love the analogy, and the idea of seeing a hammer and thinking “someday, maybe soon, a new version of that thing is going to wash the dishes” is an extremely good way to think about the absurdity of expectations over LLMs. But where the hammer analogy falls apart, IMO--where almost any kind of physical-tool analogy falls apart--is that “automatically generating reams of plausibly human text” is a bizarre function for a tool, one that both cuts at the core of human self-conception and doesn’t solve an immediately obvious problem like “nails need to go into walls.”
Chiming in to say that the best use of AI chatbots is to ask them to write code, regular expressions, or formulas for actions that you wouldn't bother attempting otherwise because you learned to code/use Excel years ago but forget now. ChatGPT is, like, way better than a programmer colleague or StackOverflow at explaining/creating some dumb basic CSS or Excel thing to people like me who make/analyze internet but are easily frustrated with anything beyond basic math. Make the chatbots do math because they are computers and should be good calculators (even though they don't always get the math or code right, which is hilarious in its own way, but I don't exactly know how my own organs work, and that's the argument I'm assuming Sam Altman would make).
Otherwise, thank you for pointing out the surface-level hilarity in these first two situations. "I want a generative AI that's really accurate at depicting Nazis" is just such a weird standpoint for people who seem hell-bent on inaccurately identifying ideology.
A very amusing post. I am amazed at the apparent ignorance of some of these so-called leaders about what "AI" is. They seem to think that these LLMs actually are intelligent, when in reality that term is just good ol' fraudulent labelling. They don't learn, they don't understand. They don't merely have no morals, they cannot have morals - it's like expecting your car to have morals, it's just preposterous. Someone much smarter than me came up with a great term to describe what "AIs" actually are: Stochastic parrots.
I also have to admit that the question of "why?" has never occurred to me. The only real use *now* that I am aware of is what you say - creating large amounts of junk text for manipulating google, marketing, overloading participation systems (e.g. in politics), etc. Sure, you can use them to summarise stuff. But who'd want to, say, use an LLM to summarise legal cases when the LLM is so unreliable at doing it? Now, there are other things that are called "AI" that actually are quite useful, such as image upscaling technologies (e.g. nvidia's DLSS). Those actually work shockingly well. But I'm not sure whether they actually have any relationship whatsoever with LLMs. And you certainly don't need a building full of GPUs (let alone dedicated "AI" chips) to perform that task.