And Yet, AI Can Write Novels!
- Christoph Heilig
- Mar 20
- 9 min read

It is almost as if Sam Altman, CEO of OpenAI, had deliberately set out to fuel the discussion of AI and literature ahead of the Leipzig Book Fair when he recently announced on X a mysterious new model his company is training that is supposedly particularly good at creative writing. Unlike with other hints at major advances, this time evidence was provided immediately: the first AI-written short story that, by his own account, had ever truly moved him.
Even colleagues like Eliezer Yudkowsky were not particularly convinced and offered their own human-written alternative stories as clearly superior points of comparison. There were, however, serious literary voices, such as Jeanette Winterson, who saw value in the text. And for the faction that couldn't emphasize enough how poorly AI writes, it became rather embarrassing when they also bashed another supposed AI text that turned out to be a quotation from Ayn Rand.
In their zeal to mock AI, these critics missed something that would have been worth highlighting: Sam Altman's post actually revealed an astonishing naivety about what literature is and what is technically necessary to produce it automatically. That the AI industry has a rather limited understanding of culture is hardly breaking news. But that the CEO of OpenAI would so unwittingly reveal how aimlessly his company is fishing for creative AI is surprising. Because, to put it briefly, the problem with the automated production of literary texts does not really lie in the narrative weaknesses of short pieces. The holy grail is instead the production of longer narrative forms such as novels: so far it has not been possible to get a coherent plot and plausible character development across an entire manuscript out of a single prompt. A model that now produces marginally better (that is, more emotionally moving to CEOs) short stories changes very little about this status quo.
Of course – as little as Altman understands about literature, AI critics from the literary world often understand just as little about AI. I find it at least equally problematic, and just as indicative of ignorance, to brush aside any impending competition between humans and machines by (rightly) expressing doubts about the evidence Altman presented while simultaneously (wrongly) assuming that AI is still qualitatively miles away from human authors.
Just recently, Sebastian Fitzek stated that in literature, AI can "only do rough preliminary work." I find this an interesting statement from someone who, according to a report on a lecture he gave last year, has "tried many AI programs." What programs could that possibly mean? Perhaps the chatbots available on the internet, like ChatGPT? Apparently so, for the report states: "Simply 'Write me a Sebastian Fitzek novel' doesn't work." This is as unsurprising as it is correct. What should truly cause astonishment, however, is that we are in year three after the introduction of ChatGPT and the book industry still seems to measure the capabilities of large language models by how well a conversational chatbot can produce narrative works on demand. Where does the idea come from that this gives a realistic insight into the technology's actual potential? Perhaps it stems from the peculiar evolution that the species of AI "experts" has undergone in this field – without selection pressure, no evolutionary key innovations were needed to occupy this ecological niche. In other economic sectors, this would be unthinkable. No one would dream of testing the usefulness of large language models for customer communication by typing into such a platform themselves. For such applications, everyone understands that companies develop or purchase their own software that accesses GPT and other models in the background (via so-called "API calls"). Why the idea persists that one knows whether "AI" can write books simply because one has chatted with ChatGPT (even, as emphasized, "extensively") is truly a mystery to me.
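For readers outside the software world: an "API call" simply means that one's own program sends the model a precisely scoped task together with the necessary context and receives text back – no chat window involved. A minimal sketch in Python using the OpenAI SDK; the model name and the prompts are purely illustrative:

```python
# Minimal sketch of an API call: your own program, not a chat window,
# sends the model a narrowly defined task. Model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You answer customer emails for a publishing house, politely and concisely."},
        {"role": "user", "content": "Customer email: 'When will the paperback edition be available?'"},
    ],
)

print(response.choices[0].message.content)
```

This is the level at which the question of what "AI" can do should be posed – not at the level of a general-purpose chat window.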
As someone who researches, among other things, this exact question of whether, how, and with what societal consequences AI can compete with humans in text production, I must challenge the consensus channeled by Fitzek: Yes, "AI" can indeed write books now, and it is absolutely foreseeable that these texts will soon be indistinguishable from the products of top human authors, even by experts.
This assumption is based on the following observations:
First, study results on divergent and convergent processes – the two foundational pillars of creative thinking – show that at least since GPT-4, 90% of people (or more) can no longer keep up with large language models. This is really old news. So it is at least strange that AI's "lack of creativity" is so often cited as evidence that human authors have nothing to fear from the competition. My impression is that people's own small experiments with ChatGPT – and their, indeed, often sobering results – are misinterpreted as symptoms of a fundamental lack of creativity. Just because you yourself are creative (most authors are, though not all would belong to the aforementioned 10%) does not mean you are particularly good at recognizing creativity. Often it is entirely different weaknesses – correctable weaknesses, frequently stemming from alignment – that are misread here. Anyone who doesn't believe this can simply have their own creativity tested – and then compare themselves with the leading language models. These tests are freely available on the internet.
Second, there is now a rapidly growing body of research literature that specifically pits human and machine narrators against each other and examines how real readers respond. The basic tenor so far has been (and to some extent still is today): When the recipients are professionals working in the literary industry, and when the writing representatives that the human race sends into the field are professional authors, then "we" humans do indeed have the edge.
Ultimately, we see this pattern in many other areas as well: AI is, for instance, also funnier than most humans, but still significantly less funny than the funniest humans – and those humans, in turn, don't benefit from additionally using AI. It is really an astonishing example of moving the goalposts that in 2025 we celebrate such results as underpinning human primacy. I see in this anything but a reason for arrogance. It's worth reading the fine print – or better, the studies themselves. Last year, for example, an experiment made the rounds in the literary world, greeted with a sigh of relief because it supposedly showed that good human authors still clearly outperform GPT-4. But what is always overlooked: the panel of professional literary critics and scholars also judged nearly 46% of the texts written for the experiment by "world-class" author Patricio Pron as not good enough for an anthology – while almost 14% of GPT-4's Spanish texts and a full 18% of its English texts would have made it in! One has to let that sink in: yes, on average, humans were ahead in the battle of Pron vs. Prompt, but some of the AI texts were actually judged better than the human "world-class" products. The "Deep Blue" moment of literature has therefore already arrived!
Third, it shows time and again how strongly people embedded in the literary business overestimate the ability of ordinary readers to recognize AI, and how poorly they gauge the taste of the average consumer. There is hardly anything large language models are as bad at as poetry, and yet a study showed that even AI imitations from GPT-3.5 (yes, 3.5 – that antediluvian thing that would also tell you how to build bombs!) were preferred over originals by giants like Shakespeare, Byron, and Eliot – readers judged them not only better but also more human than the real thing. And this human inability to recognize cheap forgery was independent of education level.
Fourth, for a long time, the context window of large language models was simply too small to meaningfully produce novels. In the first half of 2023, there were all sorts of charlatans claiming one could easily prompt together an entire book – without mentioning with a single word that by the second chapter, ChatGPT would have already forgotten which characters were introduced in the first. Today, however, the length of the text is no longer a major problem for most genres. (Humans, on the other hand, produce plenty of inconsistencies in plots that make it through editing – our own "context window" is also very limited, albeit in different ways.)
Fifth, when it comes to models suitable for creative writing, the future is already here! Sure, there is still room for improvement. But GPT-4.5 is already very good at creative writing (and, in my opinion, really only at that – it is rather disappointing for everything else), as is Claude 3.7. I am certain studies will soon show that these models (with the right parameters, which you cannot set within the chat platforms; see the sketch below) perform better on limited writing tasks – such as writing certain sections of a plot – than average writers. Yes, these models still sometimes miss stylistically. But, as with the debate about translation, there is little point in emphasizing AI's flaws while maintaining a blind spot for human inadequacies.
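To make the parenthetical remark concrete: by "parameters" I mean generation settings such as temperature or top_p, which API users can set per request but which the chat platforms fix for you. A purely illustrative sketch – the specific values are assumptions, not recommendations:

```python
# Illustrative only: the same model behaves very differently depending on
# sampling parameters, which are fixed in the chat interface but adjustable
# via the API. The values below are assumptions, not recommended settings.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4.5-preview",   # illustrative model name
    temperature=1.0,           # higher values: more varied, riskier word choices
    top_p=0.95,                # nucleus sampling: share of probability mass considered
    presence_penalty=0.5,      # discourages repeating themes and phrases
    messages=[
        {"role": "user", "content": "Continue this scene in the style of the preceding chapters: ..."},
    ],
)
print(response.choices[0].message.content)
```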
And where AI – with already existing models – might now have the edge is in plotting and worldbuilding, due to the "reasoning" models (discussed here). While an author might ponder the next plot twist for days, such a model writes what amounts to an invisible monograph on how best to proceed within minutes and without any soul-searching. With this thoroughness, hallucinations are now the domain of human authors; a Robinson Crusoe who swims naked to his ship only to put cookies in his pockets doesn't happen anymore. And for worldbuilding, every author now has a team of experts in all fields available, who can – an example from OpenAI's press release itself – quickly sketch a linguistically plausible language 500 years in the future. The quality standard for human science fiction must dramatically increase with these tools, that much is certain.
Sixth, it must be noted that it would certainly be possible to train large language models to be even better storytellers. This simply hasn't been a focus so far; the focus has been on demonstrating abilities in mathematics and programming to score points in corporate competition. If these models were specifically trained to reproduce plot patterns, one could build AI agents that would then write books quite independently. But my point is: in the meantime, we can manage quite well without such specialized models. Writing books is difficult, as anyone who has tried knows. But it is not complicated. There is (a) plenty of research on the writing practices of authors and (b) even more writing-guide literature that human authors actually consult and follow in the writing process. Both can be implemented relatively easily in computer programs in which the algorithm then calls large language models at the right points to complete individual writing tasks (a minimal sketch follows below). These individual decisions require a great deal of mental and emotional energy when made by humans. But linearizing them in a program and then leaving them to a language model is simply not rocket science. If one proceeds meticulously enough, with the right and sufficiently fine-grained intermediate steps, and integrates feedback loops and an extensive revision phase, the result is novels of quite impressive quality. They are at least good enough that large language models consider them human! While AI detectors for essays etc. are notoriously unreliable, crafting long narrative forms has so far been such a challenge that one could hardly fool an AI with them. That is different now; GPT-4.5 and Claude 3.7 seem to represent a turning point here. How many human readers this convinces (or deceives) I cannot yet say – I am currently working on this scientific question. But my hypothesis would be: if current language models are convinced by what they read, the vast majority of human readers will be too.
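Here is a deliberately simplified sketch of what "linearizing" these decisions in a program can look like. Everything in it – the model name, the prompts, the division into planning, drafting, and revision – is an illustrative assumption, not a description of my own system or of any existing product:

```python
# Sketch of the kind of pipeline described above: the program, not a chat user,
# decides the sequence of writing tasks and calls a language model at each step.
# All prompts and parameters are hypothetical illustrations.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # illustrative model name


def llm(prompt: str, temperature: float = 0.8) -> str:
    """One API call for a single, narrowly defined writing task."""
    response = client.chat.completions.create(
        model=MODEL,
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def write_novel(premise: str, n_chapters: int = 12) -> str:
    # 1. Planning: worldbuilding, characters, chapter-by-chapter outline.
    world = llm(f"Develop the setting and main characters for this premise:\n{premise}")
    outline = llm(f"Outline {n_chapters} chapters.\nPremise:\n{premise}\nWorld notes:\n{world}")

    chapters: list[str] = []
    for i in range(1, n_chapters + 1):
        # 2. Drafting: each chapter is its own limited task; the relevant context
        #    is passed in explicitly instead of hoping the model "remembers" it.
        so_far = llm("Summarize the story so far in 300 words:\n\n" + "\n\n".join(chapters)) if chapters else "Nothing yet."
        draft = llm(
            f"Write chapter {i} of the novel.\nOutline:\n{outline}\n"
            f"World notes:\n{world}\nStory so far:\n{so_far}"
        )

        # 3. Feedback loop: critique with low temperature, then revise before moving on.
        critique = llm(f"List inconsistencies and weak passages in this chapter:\n{draft}", temperature=0.2)
        draft = llm(f"Revise the chapter to address this critique.\nCritique:\n{critique}\nChapter:\n{draft}")
        chapters.append(draft)

    return "\n\n".join(chapters)
```

The point of this toy sketch is not the specific prompts but the principle: the decisions that cost a human author so much mental and emotional energy are broken down into discrete, checkable steps that a program can work through one after another.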
Such an AI novel still costs quite a bit (a few hundred euros if you want to do it well). But these prices will fall radically this year. The very recent headlines about Ernie 4.5, which supposedly costs only 1% of GPT-4.5, clearly show where things are headed.
This doesn't mean, of course, that AI writes the best literature.
It also doesn't mean we should read AI literature instead of human works – that is a completely different question. It is striking how constantly this ethical horizon is conflated in the discourse with mere feasibility – ultimately the fault of those who justify the right of human art to exist precisely by its supposedly superior "quality." (Simply crying "plagiarism!" helps little. For one thing, it contains an implicit quality judgment that can easily be refuted; for another, it is not a constructive argument for human art and culture, even if the chosen level of discourse is at least the right one.)
And I don't think, by the way, that we'll soon be reading only AI literature. The danger that we'll soon be sucked into incredibly well-told AI text worlds and never resurface, as Eliezer Yudkowsky speculates, seems rather small to me.
But the points mentioned give us reason, at the very least, to quickly get used to the idea that "writing good stories" is no longer a uniquely human ability. And it is a very legitimate and important research question to explore what this will do to us as a society.
However, I must also say – book fair or not, and despite all my love of literature – that the consequences of this affront to humanity are, even if things go really badly, still a comparatively small problem next to the challenges that AI (or AI development) is currently presenting us with. And I will not end any AI post without pointing to those challenges until we finally begin, as a society, to have a discourse about them.