Stories and AI
- Christoph Heilig
Last week, together with Nina Beguš from UC Berkeley, I had the privilege of organizing a small conference on "AI and Human Storytellers." We had jointly secured funding from a UCB-LMU cooperation initiative; on the Munich side, these funds come from resources available to LMU as a University of Excellence. The BAdW (Bavarian Academy of Sciences and Humanities), where I research artificial intelligence, kindly hosted us. The goal of the event was to bring together experts from various fields and to discuss, in an informal setting, what role AI plays in storytelling, which is perhaps the most fundamental human cultural capability of all. In what follows, I want to report briefly on the individual contributions. Of course, these comments merely reflect my own perspective and are far from complete. The conversations were enormously stimulating, and I can only touch on some aspects here in passing.

Nina Beguš herself kicked things off with a presentation on "Experimental Narratives: Comparing Humans and Large Language Models." She spoke primarily about her study published last year, which compared how large language models and non-professional human storytellers implemented the Pygmalion myth, as part of their respective cultural encyclopedias, when asked to write such a story. In fact, it was precisely through this work that we had started our conversation (the final paper even cites my reaction to the preprint). Her results show not only the barely tolerable deficiencies in narrative style, especially in GPT-3.5, but also the improvement in newer models. What I find particularly fascinating is the observation of how consistently simplistic plot and character patterns are pursued by AI. At least as far as creative writing is concerned, post-training alignment seems to bury much of the potential created in pre-training. In any case, it became clear that qualified statements about AI storytelling require one to also be well-versed in human stories. And ultimately, it is not least these human stories that in turn determine what AI researchers strive for! Nina discusses this connection in her forthcoming book "Artificial Humanities." Especially since the AI industry often takes itself for granted, critical reflection on this connection is necessary. And: We continue to need human fiction to explore the space of possibilities that still lies ahead of us. Otherwise, we as humanity give up any agency, surrender ourselves to an allegedly "objective" AI, and let ourselves be shaped by its stories, stories that so far have been formed by quite arbitrary alignment strategies whose emergent phenomena are hardly critically reflected upon.

I followed up by speaking in my presentation "AI as Author: How Good Have LLMs Become?" about my experiences with self-created computer programs that automatically produce fictional texts of various lengths, using large language models embedded in algorithms that mimic human narration. I did this by dividing the time since November 2022 into different phases and discussing the innovations that could be observed in each. At the beginning, one could observe the peculiar phenomenon that self-proclaimed "AI experts" claimed that one could write entire novels with AI (one merely had to learn the corresponding "prompting skills" for money in their workshops...), while conveniently failing to mention that the limited context window of the models at that time meant that, by the beginning of the second chapter, they no longer knew which characters had been introduced in the first.

But the situation was soon to change, and soon one had to correct the other side of the debate for overly simplistic categorizations. I'm thinking, for example, of this article, which made enthusiastic rounds in the literary industry in summer 2024 because it allegedly showed that "world-class authors" were still superior to AI. What the study actually showed was that the model GPT-4 (already outdated at the time of publication) performed worse on average in the given writing task than the award-winning author Patricio Pron. However, one actually has to look at the dataset on GitHub and analyze the data oneself to recognize: it is not that Pron consistently achieved better values on the respective parameters, with the AI performing sometimes poorly, sometimes less poorly, but always worse than him. Rather, there are rounds in which AI and human faced off and the former came out ahead on quite central parameters, presenting, for example, more "anthology-worthy" output than the "world-class writer"! The "Deep Blue moment" alluded to by the authors themselves has therefore actually already occurred, contrary to what the study implies and certainly contrary to what all the reporting on it claims: AI has already won, it just didn't do so consistently.

From there we now jump directly to the present, past reasoning models that can plot more consistently and create more detailed narrative worlds than most authors, to a time when models like GPT-4.5 and Claude 3.7 are so good at storytelling that they produce quite astonishing texts by writing "intuitively," without complicated algorithms structuring the process, simply by relying on iteration: generating each section multiple times and then having the AI itself select which continuation is best (see the sketch at the end of this section). While humans like Pron have very limited resources, one can simply have the AI produce text until every scene reaches the quality that emerged only in individual cases in the aforementioned study! But now I must stop writing about my own contribution, about which I naturally would have much more to say. What I also found illuminating: my concluding self-reflection on what it has done to me that I have already read tens of thousands of pages of AI fiction (I generate and consume it practically daily) resonated strongly.
I highlighted one aspect that I hadn't expected myself: the sadness that comes with the fact that I have recently been encountering more and more passages that actually touch me, that make me laugh, that open up new perspectives, but of which I know that I will be the only person who will ever read them, that there isn't even a lonely author sitting at the other end whose thoughts I am now sharing as perhaps the only reader. I don't think at all that AI will represent the end of human storytelling, even though I am convinced that we have to completely redefine human storytelling in this new world and even though the publishing world seems to me to have consistently ignored the actual challenges so far. But I do think that machine text production will penetrate many areas that we have previously attributed to the realm of human culture, and that we will have to develop completely new practices to cope with this as consumers.
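
To make the iteration strategy mentioned above concrete, here is a minimal, hedged sketch of such a generate-and-select loop, assuming the OpenAI Python client. The model name, prompts, and number of candidates are illustrative placeholders of my own, not the exact setup behind the texts discussed in the talk.

```python
# Minimal sketch of a generate-and-select loop: draft several
# continuations of a story, then let the model itself pick the best.
# Model name and prompts are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4.5-preview"  # placeholder; any strong storytelling model

def continue_story(story_so_far: str, n_candidates: int = 4) -> str:
    # Sample several candidate continuations in one request.
    draft = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": "Continue this story with one scene:\n\n" + story_so_far,
        }],
        n=n_candidates,
        temperature=1.0,
    )
    candidates = [choice.message.content for choice in draft.choices]

    # Ask the model to judge which continuation works best in context.
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    verdict = client.chat.completions.create(
        model=MODEL,
        messages=[{
            "role": "user",
            "content": (
                "Story so far:\n\n" + story_so_far
                + "\n\nCandidate continuations:\n\n" + numbered
                + "\n\nReply with only the number of the strongest candidate."
            ),
        }],
        temperature=0.0,
    )
    index = int(verdict.choices[0].message.content.strip().strip("[]."))
    return candidates[index]
```

Repeating this scene by scene is, in essence, the "iteration" strategy: spending compute on many drafts instead of waiting for one lucky draft, which is precisely the resource advantage a human author like Pron does not have.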

After a lunch break, we then turned more to the analytical capabilities of large language models. Gašper Beguš (UCB) spoke about "Large Language Models and Their Metalinguistic Abilities." (More on this here.) He and his team identified an enormous leap in metalinguistic abilities with the introduction of GPT-4, which was suddenly able to perform complex syntactic analyses. Reasoning models have enhanced this further. This aligns excellently with my own, more anecdotal results on the ability to analyze and visualize deep semantic structures of texts. (More on this here; an advantage of this benchmark is that contamination of the training data can be practically ruled out.) Via the capacity for recursion, one of the few fundamental features that distinguish human from animal language, Gašper then moved on to the more general topic of non-human language. He outlined a picture in which we are now able, for the first time, to investigate language by connecting three areas: human, animal, and machine language. This gives rise to quite astonishing new dynamics. For instance, Gašper is currently working on an AI that is oriented toward language but doesn't learn it like an LLM; rather, it takes the path via sounds and is thus modeled on human language acquisition. This creates systems that have their own unique access to the world. They don't yet tell stories, but they are able, for example, to recognize patterns in the communication of sperm whales, showing that their language has vowels and diphthongs (see here)! This raises such fundamental questions, for instance about the nature of language, that Gašper is convinced: "The next decade will be the decade of the humanities." Even technical discourse now requires competencies that only scholars of the humanities have acquired.

Julian Schröter is Professor of Digital Literary Studies at LMU. Here is his profile page, which also contains links to very insightful interviews about AI and literature. Like Gašper, he uses LLMs analytically. Specifically, in his presentation he pursued the differentiation of "tension" in suspense literature into (past-oriented) "mystery," (future-oriented) "suspense," and (self-explanatory) "action." By having large language models perform the analysis, he was able to impressively confirm, and more precisely quantify, earlier human assessments. Personally, I found the almost complete absence of "action" in Agatha Christie's work fascinating. This provided me with a retrospective explanation for why even a fairly sophisticated algorithm failed to produce a "political thriller in the style of Agatha Christie." (The text is available here.) All simulated "thinking" didn't help in reconciling two contradictory parameters: "action" on the one hand and, on the other, a style that is associated precisely with rather action-less plots, perhaps even arising from them (?). (With less "thinking," more "intuitively," it actually worked better, as can be seen here.) I'm not doing justice to the contribution, which gave me many new insights about pulp fiction, by focusing here on just this one aspect. But I found this particular example so illuminating because I think it excellently illustrates how AI-supported analysis of literature could provide reasoning models with the material that might be necessary for AI to plot better in specific genres. Even now, models like o3 are astonishingly agentic in individual areas of application, choosing context-specific tools and applying them to get closer to a goal. But it requires an extremely high degree of simulated human problem awareness not to rely, when plotting, on facts acquired in training about such supposedly primitive genre literature, but rather to first evaluate what needs to be found out about the parameters of narration in each case. Knowing what you don't know is not so easy, neither for man nor for machine. I expect a major leap in the quality of machine storytelling when this threshold is reached.
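
As an aside, here is a hedged sketch of how such an LLM-based annotation of tension types might be operationalized. This is my own illustration, not Schröter's actual pipeline; the model choice and prompt wording are assumptions.

```python
# Sketch: label passages with their dominant tension type
# ("mystery" / "suspense" / "action") via an LLM, then aggregate.
# Illustrative only; not the pipeline used in the presented study.
from collections import Counter
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Classify the dominant form of tension in the passage below as exactly "
    "one of: 'mystery' (past-oriented: what happened?), 'suspense' "
    "(future-oriented: what will happen?), or 'action' (tension carried by "
    "ongoing physical events). Reply with one word.\n\nPassage:\n{passage}"
)

def classify_tension(passage: str) -> str:
    # One label per passage; temperature 0 for reproducible annotations.
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[{"role": "user", "content": PROMPT.format(passage=passage)}],
        temperature=0.0,
    )
    return resp.choices[0].message.content.strip().lower()

def tension_profile(passages: list[str]) -> Counter:
    # Aggregating labels over all passages of a novel yields the kind of
    # distribution discussed above; for a Christie novel one would then
    # expect almost no "action" segments.
    return Counter(classify_tension(p) for p in passages)
```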

After the two presentations focused more on the generation of AI narratives and the two looking more at AI-driven analysis, we headed toward the home stretch with two presentations that focused on the public reception of AI. Kayla Rose van Koote (doctoral student in the German Department at UCB) spoke about "Deus ex Machina: Humans, Man, and its Mechanical Other." She introduced me to the concept of techno-orientalism, well visualized for instance by the "Mechanical Turk," which not least reflects prejudices about the Islamic world, implying that people from this cultural sphere are "mechanical," i.e., without real mental inner life. From there she drew the connection to the German youth word "Talahon" and to how technology, namely AI in the form of Udio, is associated with reproducing racist stereotypes in this context. Now, I have to admit that although I had encountered the word before, I was completely unaware of the existence of the song "Verknallt in einen Talahon" (Crush on a Talahon), which was the first AI-generated song in the German Top 50 charts! We had a lively discussion about the use of AI in this specific case.

Personally, I tend to see it this way: AI can certainly help sensitize people to marginalized perspectives and develop empathetic behavior. The flip side of this function is, of course, the undeniable mechanistic reproduction of prejudices from the training material. In between, alignment tries to mediate, which in my view is often carried out very naively and might cause more damage than it prevents. (I've written about this in somewhat more detail here.) Where in this framework, assuming for the moment that it is correct, is the aforementioned song to be located? Kayla worked out quite convincingly the othering that occurs through the lyrics. But I must confess that I was rather puzzled by the use of the song in contexts marked by hashtags like #talahonfreiezone (i.e., a zone that is free of Talahons, because it deals with a context that is "truly" German). Doesn't the song imply the attractiveness of the Talahon from a female (ethnically German, blonde...) perspective, one that is moreover linked to explicitly positive attributes such as generosity? To be sure, I am equally surprised that, according to the producer, the song was supposed to "denounce misogynistic behavior" in general, regardless of origin. Personally, I cannot see the clues for such readings. It may be that in saying this here, I reveal my own ignorance and lack of sensitivity.

But perhaps behind this puzzlement lies more than a blind spot on my part; perhaps we may ask whether we are not observing here a more systematic dynamic of human-machine interaction that is just beginning to emerge. I have already presented the pessimistic interpretation of this dynamic: the song is an expression of racist biases that it merely reproduces, thus giving them a wider hearing. As an AI enthusiast, however, one could also ask: isn't the song rather an example of the moderating effect of AI; aren't racists made less racist here by AI than they could actually be? Or are we dealing, as the AI realist might say, with a merely superficial homogenization of discourse, because, for lack of our own creative activity, it is now the AI that writes for us all (and tells stories!), with us being merely the ones who semantically charge and instrumentalize the same textual surfaces quite differently?
Then, admittedly drifting into pessimism once again, one could ask: how are we supposed to understand each other at all under these circumstances, if we use the same words to say opposite things? Without knowing each other, exchange will hardly be possible anymore, while at the same time getting to know each other is virtually impossible once we no longer have any shared linguistic means of communication.

Svetlana Efimova, Junior Professor at LMU for Slavic Literary Studies and Media and, like me, a member of the Young Academy (more about her here), then brought the event to a fitting close. She spoke about "The Authorship of AI in the Public Imagination: Between Science, Art, and the Communication of Research." She convincingly demonstrated (using this example and based on not yet published research) that in public discourse the framing of human-machine interaction as competition dominates, and that this framing is driven forward not least by academics, although they argue in a far more differentiated way in internal scholarly discourse and in such cases emphasize the cooperative aspect of working with AI.

Svetlana's contribution was particularly exciting because it concluded our conference and thus, looking back on the day, also prompted critical self-reflection: have we, not least I, ultimately promoted such a misleading image of AI through the approach in our contributions? Would cooperation, as Svetlana suggests, not be a better framing, one that would be less sensationalist and for which one would already have a frame of reference in the complex production processes of literature, where it is also never just the one author at work? I can see much merit in this. Especially since I absolutely agree that we must be very careful not to let the industry alone dictate the level of discourse (see for example here).

As a minimum, I think, we can agree that we as members of democratic societies definitely have room to maneuver: we can tell policymakers what kind of future we actually want. Not simply through regulation, and certainly not through denial of what is technically feasible, but by demanding investment in those areas of life that we precisely do not want to see taken over by machines. (More on this here.) However, the most realistic picture that emerges for me in this regard, assuming that we actually make use of this possibility, is one of co-existence, and moreover a co-existence in which human primacy is by no means a given and would have to be defended (if one considers this to be desirable; transhumanism predominantly sees this differently). In the meantime, it still seems appropriate to me to use cooperative work with AI systems to show what might be possible tomorrow without humans. In my view, it is justified to highlight AI's potential to displace human work and to employ human assistance in this process, as long as this assistance is openly acknowledged. The framing of competition thus still seems acceptable to me, provided it is part of a constructive conversation about humanity's future, in which humans are still in the position to decide the parameters of this discourse. Admittedly, one cannot have such a discourse if the general assumption is that "the AI" will soon take over world domination anyway. But there is also little room for such debates if AI is simply conceptualized as a mere "tool" and if the emphasis is always only on what is not (yet) feasible. In my own view, such a conversation becomes possible only if we have a precise perception of what is technically feasible, and of what is emerging as technically feasible in the near future, and if we then use this knowledge as a prompt to become creative and produce fiction that stimulates us to think about the world of tomorrow.
For this reason, I myself also write literary texts on the subject, because in my opinion this is the best way to initiate the kind of discourse we need. Now someone just needs to publish them, and then someone needs to read them. Because otherwise no AI in the world will read them either. And the stories that we humans are willing to hear, as this conference impressively showed me, shape both how history unfolds and how it will be told in retrospect.
That's why we need literature. That's why we need the humanities. More than ever.