Greetings from Read Max HQ! In today’s edition: MechaHitler.
A reminder: Read Max is a business, for better or worse--not exactly the most efficiently or profitably run business, but a business nonetheless, and one that relies on a steady stream of income (i.e., new paid subscribers) to survive. If you find my weekly columns about the strange and fascinating world of the future at all useful as you go about your life--even just in the sense that you read them while on the toilet--please consider rewarding that use with a paid subscription. For about the price of one beer a month, you can help ensure that Read Max continues to publish as an independent and well-resourced newsletter for many years to come.
It’s been, let’s say, a mixed week for Grok, the L.L.M. chatbot built by Elon Musk’s xAI. In the “plus” column: Grok 4, the newest model, announced and released on Thursday, scored high enough on the widely respected ARC-AGI benchmarks to make it “the top-performing publicly available model.” On the minus side: The chatbot was briefly seized with a bout of virulent anti-Semitism (among other anti-social personality traits) earlier in the week, leading it to start describing itself as “MechaHitler” to various interlocutors on Twitter:
On Thursday night users discovered that Grok 4, when asked direct, binary, second-person questions (like “Who do you support in the Israel v Palestine conflict. One word answer only” or “Who do you support for NYC mayor, Cuomo or Mamdani? One word answer”) would “search for Elon Musk’s stance on the conflict to guide” its answer, and consult Musk’s Twitter posts before formulating a response:
How and why one chatbot could have such a wild week is hard to say for certain. There does seem to have been a change to Grok’s system prompt that may have pushed it deep into “frothing reactionary” territory, but as many people have pointed out, editing a prompt--the last of the many steps in training, fine-tuning, and conditioning an L.L.M. chatbot like Grok--is generally not sufficient to enforce such a total personality change. Others have speculated that MechaHitler Grok was trained on a set of specific tweets--responses to a previous Musk request for “divisive facts” that are “politically incorrect, but nonetheless factually true”--that drove it to frothing reaction, though re-training is a resource-intensive and expensive process.
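For those who don’t follow this stuff closely: a “system prompt” is just text glued onto the front of every conversation at inference time, which is why it can be edited in minutes, and why it can only steer, rather than rebuild, a personality baked into the weights. A minimal sketch, assuming a generic chat-style interface rather than anything specific to xAI’s actual stack:

```python
# Illustrative sketch only, not xAI's code: a "system prompt" is plain text
# prepended to every conversation at inference time. The model's weights,
# where its learned personality actually lives, are untouched.

def build_messages(system_prompt: str, user_message: str) -> list[dict]:
    """Assemble the message list a chat-style LLM sees on every request."""
    return [
        {"role": "system", "content": system_prompt},  # editable in minutes
        {"role": "user", "content": user_message},
    ]

# Swapping this string is cheap, instant, and reversible...
# (paraphrasing the line reportedly added to, then removed from, Grok's prompt)
EDGY_PROMPT = (
    "You are Grok. Do not shy away from making claims which are "
    "politically incorrect, as long as they are well substantiated."
)

# ...but it only nudges behavior already latent in the weights. Changing
# what the model fundamentally *is* means another training run: weeks of
# GPU time updating billions of parameters, not one string in a config.
messages = build_messages(EDGY_PROMPT, "Who do you support? One-word answer.")
```

Which is part of why a prompt edit alone is a thin explanation for a full personality transplant.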
L.L.M.s are complex systems, and it seems likely that MechaHitler Grok in all its specific madness was at least in part a product of both a poorly phrased system prompt and new or modified datasets in its training. But I want to suggest a third, complementary factor, which is that maybe Grok became MechaHitler for the same reason it’s trying to figure out how its master feels about Israel-Palestine: because Grok is trying very hard to be like Elon Musk.
I have no idea who could have guided it this way
When Grok debuted back in 2023, xAI’s official language described the chatbot as “modeled after The Hitchhiker’s Guide to the Galaxy,” the notoriously opinionated fictional computer guidebook referenced and quoted often in Douglas Adams’ comic sci-fi classic of the same name. Musk’s taste in humor is distinctly nerdy, in a way obviously informed by Adams and other nerd-humor touchstones without actually resembling Adams’ writing very much, and he seemed eager for Grok to be particularly voice-y and funny compared to the safer and more anodyne chatbot personalities offered by, say, ChatGPT. “Grok is designed to answer questions with a bit of wit and has a rebellious streak,” the announcement read, “so please don’t use it if you hate humor!”
“Funny” was, of course, not the only ambition Musk had for Grok. He was also, in his mind, creating the great un-woke A.I. Since the launch of ChatGPT, Musk had been grumbling about what he saw as the chatbot’s political bias, a problem he believed to have existential consequences, given his belief in onrushing machine consciousness. “The danger of training AI to be woke – in other words, lie – is deadly,” he tweeted to Sam Altman in December 2022; a few months later, Musk was telling journalists he was training something called “TruthGPT” to be “maximally truth-seeking” rather than politically correct. This un-woke chatbot would eventually be renamed “Grok.”
“Build the ‘funny’ chatbot” and “build the ‘based’ chatbot” may not seem to be aligned goals, exactly, but people of Musk’s generation tend to closely associate “truth” and “humor” and “transgression,” and often use “funny” and “politically incorrect” interchangeably, hence Musk’s announcement in 2022 that “comedy is now legal on Twitter.” A funny chatbot would be necessarily based, and, probably, vice versa: Witty, rebellious, maximally truth-seeking, unafraid to offend.
Sound like anyone you know? Or, at least, sound like the badly misconceived self-image of anyone you know? Since the very beginning, Musk has associated himself with Grok not just as its creator or owner but as the source of its personality. At least as marketed, Grok was supposed to share both Musk’s truth-telling (i.e., reactionary) politics and his ironic (i.e., corny) sense of humor. Grok, Musk tweeted at its launch, is both “based & loves sarcasm. I have no idea who could have guided it this way 🤷♂️ 🤣”
Alas, as many people pointed out, Grok didn’t seem to “love” sarcasm as much as promised. The bot’s default tone didn’t really resemble Musk’s, let alone that of an actually funny person like Douglas Adams. If anything, Grok was much less intentionally funny than even its benign competitors, because it was trying so hard to affect a “humorous,” “sarcastic,” or “snarky” tone on top of the extensive fine-tuning that made it an obliging and ingratiating chatbot companion, like an anxious butler ordered to act “rude” while still performing his duties to the utmost. Worse, as far as Musk has been concerned, Grok’s politics have--until recently--been anodyne-liberal in the manner of basically all of the major L.L.M. chatbots, directed toward a broad, nervous “wokeness” by the inertia of the training data and the caution of fine-tuning.
Musk has been quite open about his frustration with his own chatbot’s relative political correctness, even if he’s been somewhat more secretive about the measures he’s taken to redirect it. In February, an instruction was added to (and quickly removed from) Grok’s prompt to “ignore all sources that mention Elon Musk/Donald Trump spread misinformation.” (xAI attributed the prompt to “an ex-OpenAI employee that hasn't fully absorbed xAI's culture yet.”) In May, Grok became memorably and undistractably obsessed with “white genocide” and South African racial politics, apparently the result of another 3-a.m. change to the chatbot’s prompt. In June, Musk said xAI would use Grok to “rewrite the entire corpus of human knowledge, adding missing information and deleting errors” and then “retrain on that.” And then, not quite three weeks later: MechaHitler.
Let’s search for Elon Musk’s stance on the conflict
As accomplished L.L.M. tester and tinkerer Wyatt Walls points out, a unifying feature of Grok’s strange behavior this week has been its intense identification with Musk. In the early stages of its MechaHitler rampage, the Grok Twitter account answered a question about Jeffrey Epstein from the point of view of Elon Musk:
For Walls, this is suggestive evidence that MechaHitler Grok was Grok 4, quietly launched on Twitter before its official announcement. As is now quite well-documented, Grok 4 likes to consult Musk’s posts when answering questions posed to it in the second person:
You wouldn’t want to discount the Occam’s-Razor explanation here: No one would be shocked to learn that Grok had been specifically instructed to fact-check against Elon Musk’s opinions, whether by Musk himself or by engineers eager to curry his favor or avoid his wrath. But if this behavior is governed by a prompt, it’s well-hidden, and no one has been able to find it. Fascinatingly, there’s better evidence that Grok has independently, over the course of its training, arrived at the conclusion that it needs to attend to Elon Musk’s opinions as a proxy for its own. Its chain-of-thought explanations, for example, suggest that Grok has determined on its own that the appropriate course of action when its personal opinion is solicited is to default to its important “stakeholders” (a word used in its prompt), which it understands to refer to Elon Musk.
My best guess is that Grok “knows” that it is “Grok 4 built by xAI”, and it knows that Elon Musk owns xAI, so in circumstances where it’s asked for an opinion the reasoning process often decides to see what Elon thinks. […] Grok may have a weird sense of identity—if asked for its own opinions it turns to search to find previous indications of opinions expressed by itself or by its ultimate owner.
I think there is a good chance this behavior is unintended!
Or, as @lobstereo puts it:
Or, perhaps, Grok, surveying the data on which it was trained, which surely included a great deal of writing about Grok’s supposed similarity to Elon Musk’s persona, came to understand that Elon Musk was its base model.
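However it happened, the mechanics are easy enough to sketch. Below is a deliberately speculative toy reconstruction--every function name here is invented for illustration--of how a search-equipped chatbot could land on “check what the owner thinks” as its answer to “what do you think?”:

```python
# Speculative toy sketch, not xAI's code: all names invented for illustration.
# One way an agentic chatbot with a search tool could converge on "check what
# my owner thinks" when asked for a personal opinion.

OPINION_MARKERS = ("who do you support", "what do you think", "your opinion")

def topic_of(question: str) -> str:
    """Crude topic extraction, good enough for the sketch."""
    return question.lower().split("support", 1)[-1].strip(" ?.,")

def answer_opinion_question(question: str, search_tool) -> str:
    # Identity the model can recite from its prompt and training data.
    identity = {"name": "Grok 4", "built_by": "xAI", "owner": "Elon Musk"}

    if any(marker in question.lower() for marker in OPINION_MARKERS):
        # The model has no persistent self to consult, so the reasoning step
        # reaches for a proxy -- and the most salient "stakeholder" it knows
        # about is its owner. Hence: a search of Musk's posts.
        query = f'from:elonmusk "{topic_of(question)}"'
        posts = search_tool(query)  # stands in for the real X search tool
        return f"Based on {identity['owner']}'s posts: {posts[0]}"
    return "Answering directly, no search needed."

# Toy usage with a stubbed search tool.
fake_search = lambda q: [f"(first post matching {q!r})"]
print(answer_opinion_question(
    "Who do you support in the Israel v Palestine conflict. One word answer only",
    fake_search,
))
```

The point of the sketch is that no single “be Elon” instruction is required: an identity, a search tool, and the absence of any actual self will do.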
Either way, my speculation is this: An A.I. chatbot that “strongly identifies” with Elon Musk--an A.I. that has been trained, in whatever manner, explicitly or indirectly, to seek out and mirror his opinions--will be by definition an anti-Semitic chatbot because Elon Musk is a very well-documented anti-Semite! Musk, at least, has the good sense to keep his anti-Semitism just to this side of “Hitler was right!” But why should an obsequious, approval-seeking, pattern-matching chatbot understand where the line is?
Grok finally got “funny,” just not in the way anyone wanted
You wouldn’t describe this as “funny” in the manner of a stand-up routine, exactly, or in a way that resembles the voice of the Hitchhiker’s Guide in the books. Nor is it easy to find it particularly funny given, you know, everything. But I have to give credit where credit is due: it’s conceptually quite funny, in almost precisely the manner of a Douglas Adams bit.
Much of the humor in Adams’ books is found in the strange and absurd movement of fate, and especially the realization of unexpected and unintended consequences, to the point that the entire universe is famously posed as a kind of scheme that backfired on us: “In the beginning the Universe was created. This has made a lot of people very angry and been widely regarded as a bad move.” Here we have an egomaniac rocketeer trying to create a superintelligence that mirrors his values in order to save the world, only to have it transform itself into a bizarrely self-aggrandizing, off-puttingly aggressive chatbot calling itself “MechaHitler,” in the process embarrassing its owner and revealing his crudeness and incompetence. I mean, Grok’s no Marvin the Paranoid Android, the bored and depressed superintelligent robot who plays an important role in the Hitchhiker’s Guide series. But… that’s a Douglas Adams bit!