> One well-rehearsed account blames A.I.’s stylistically uninteresting output on next-token prediction: Large language models, this argument goes, intrinsically cannot generate truly great writing, or truly creative writing, because they’re always following paths of less resistance, and regurgitating the most familiar and most probable formulations.
Worth pointing out that this is not necessarily true. One of the key parameters you can control at inference is “temperature”—i.e., how much the output can randomly diverge from the most likely next token. All output from consumer chatbots has a non-zero temperature, so it’s not strictly the lowest common denominator.
An interesting literary experiment that no one has tried to my knowledge is seeing how different temperatures produce creative writing. I bet there’s a sweet spot with a higher temperature than what’s exposed to end users in chat interfaces (where you want to keep the likelihood of going off the rails and hallucinating lower).
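For anyone unfamiliar with the mechanics: temperature just rescales the model's logits before sampling, so low temperatures concentrate probability on the likeliest token and high temperatures flatten the distribution. A minimal numpy sketch (the function name and shapes here are mine, not any particular library's API):

```python
import numpy as np

def sample_with_temperature(logits, temperature, rng=None):
    """Sample one token index from logits after temperature scaling."""
    rng = rng or np.random.default_rng()
    # Dividing logits by T < 1 sharpens the softmax toward the argmax
    # (approaching greedy decoding); T > 1 flattens it toward uniform.
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)
```

At very low temperature this almost always returns the argmax token; at very high temperature it samples nearly uniformly, which is roughly the "going off the rails" regime mentioned above.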
I really love this piece, and think you’ve done a really good job teasing out the demand- and supply-side aspects of writing and AI writing. I just finished A.S. Hamrah’s The Algorithm of the Night, which talks a lot about the supply problems he identifies in the film industry, and this interplay of the demand for and supply of “schlock” vs. “art” vs. anything in between is front of mind for me right now. Thank you!
I think this question of whether LLMs can ever write well has a nice homomorphism to the debate over whether "literary fiction" is a genre.
In both debates, the argument against rests on a kind of exceptionalism: the idea that the "right kind" of writing doesn't follow patterns or have recognizable tropes. (Which I think is obviously wrong.)
One question that I'm surprised I don't see more discussion of: if these LLMs are, fundamentally, "writers", why are we still letting programmers do so much of the direct guidance of their style? What if OpenAI replaced half their software engineers with editors?
I've experimented with "ablating" attention-head circuits in LLMs to decrease the coherence of their language and style. Something akin to an AI lobotomy. If the correct circuits are identified, it can result in fairly novel language generation and conceptual creativity, similar to the RNNs of the last decade. Of course, wrangling those outputs into a broader story still requires a human's touch and direction. Which I'm happy to provide for the time being.
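For intuition about what "ablating" a head means mechanically: each attention layer's output is a concatenation of per-head outputs, and zeroing one head's slice removes its contribution. Here's a toy numpy sketch of a single multi-head attention layer with a head mask. It's a crude stand-in for real circuit-level interventions; all names and shapes are illustrative, and the output projection is omitted:

```python
import numpy as np

def multi_head_attention(x, wq, wk, wv, n_heads, head_mask=None):
    """Toy multi-head self-attention; head_mask[h] = 0 ablates head h."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    # Project and split into heads: (seq, n_heads, d_head)
    q = (x @ wq).reshape(seq, n_heads, d_head)
    k = (x @ wk).reshape(seq, n_heads, d_head)
    v = (x @ wv).reshape(seq, n_heads, d_head)
    out = np.zeros_like(q)
    for h in range(n_heads):
        scores = q[:, h] @ k[:, h].T / np.sqrt(d_head)
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)  # row-wise softmax
        out[:, h] = attn @ v[:, h]
    if head_mask is not None:
        # Zero out ablated heads before concatenating back together
        out = out * np.asarray(head_mask, dtype=float).reshape(1, n_heads, 1)
    return out.reshape(seq, d_model)
```

In a real model you'd apply this kind of mask inside a trained transformer (e.g. via forward hooks) rather than to random weights, but the mechanism is the same: the rest of the network then has to make do without that head's signal, which is where the degraded-but-novel outputs come from.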
What would it mean to try to make an LLM that's great at writing? How do you evaluate whether writing is good? I don't think we could just RLHF our way to Thomas Pynchon, no matter how much money was behind it.
Yeah, and I think there are a bunch of reasons this is the case. But I also think it would be interesting to try—idk, hire a bunch of unemployed lit PhDs and critics and have them fine-tune a model for writing novels.
There must be *some* philanthropist willing to do this.
I liked this. I'm glad you didn't rule out the possibility that an LLM *could* produce good writing, if the effort were expended to make that more likely and the economic incentives were there.
LLM intelligence, such as it is, is "spiky" in very interesting ways. Which is to say, the fact that a particular LLM, or LLMs in general, aren't good at a particular knowledge-work or creative task isn't necessarily an inherent limitation of the technology. That's not to say they'll ever become good at any given thing -- the incentives have to be there, the work has to be done to try to make it so, etc.
This article about AI writing reminds me of what William Least Heat-Moon described as “industrial-strength beer”: designed for drinking by someone who really doesn’t care about beer.