
January is now halfway over, so it’s high time for me to stick my neck out with some predictions for the year 2025. Then again, the predictions made here aren’t all that risky. If you follow what the (majority of) media report, you might get the impression we’re on a wild ride through uncharted territory and flying by the seat of our pants. Some see terrifying AI monsters looming in the fog; others act as though there’s nothing to see at all. Strikingly, both camps refuse to just flip on the lights.
I don’t mean to say that we can foresee every development in detail or with a precise schedule. Over the last year and a half, I’ve often been wrong, was frequently taken by surprise by the speed of certain advancements, and have occasionally bought into the hype, at least briefly. (Just check out what I thought about the phenomenon in summer 2023 to see how right or wrong I have been.) Still, I believe one can make a few statements about the current state of large language models and sketch out plausible trajectories that shouldn’t be too controversial.
The most relevant considerations for the near future almost all revolve around the concept of “AI agents.” These are frequently promoted by (self-proclaimed) AI trainers as the cutting edge — a blade so sharp it might slice through the very fabric of our job market, as these entities could soon take over practically every role we humans hold. And the name does sound rather mysterious, conjuring up images of a secret takeover by an AI superpower, doesn’t it?
However, the concept itself is fundamentally as old as AI research, and there’s been plenty of work and writing on it for at least 50 years. At its core, it’s a very simple idea: ChatGPT is a chatbot that supplies answers to prompts. But ChatGPT (still) can’t access your PC to, say, run programs on its own. (The new “task” function is nothing but a rather useless gimmick.) It can write an email for you according to your instructions, but it (still) can’t send it. Thanks to the browsing feature, the strict separation between a chatbot and a tool that can access external environments has already become somewhat blurred. And under the hood, the system has long been making its own “decisions” (which pages to call up, how to analyze them, and so on). The idea behind AI agents is to extend that concept further: starting from a user request, the AI agent decides on its own which steps are necessary to reach the goal. For example, an AI agent could book vacations for users if it were given their credit card details and internet access.
The question now is not whether AI agents will come in 2025. They will. In fact, they’re pretty much already here. For instance, OpenAI competitor Anthropic now offers a feature allowing its large language model Claude 3.5 Sonnet to access a user’s computer. In the general sense described above, that’s absolutely an AI agent! But even with OpenAI’s GPT, many tasks can be automated or semi-automated outside ChatGPT’s browser window. It’s no magic trick. All you need are some basic programming skills and an OpenAI API key. Put simply, you can create small software programs (for example, a Python script) to handle certain tasks, like searching the internet for the best vacation deals in Greece. To do that, you integrate a search engine (via a Google API, for a few cents) and have GPT analyze the results by calling the language model at the necessary steps. These highly specialized but practically useful AI agents have existed for quite a while now, for about a year and a half in fact. All you needed was a bit of creativity to build them yourself. What has changed is the performance of the language models and, with that, the kind of tasks that can usefully be handled. For example, since May 2024, you can run programs that use GPT’s “seeing” capability, meaning it can also analyze photos of possible accommodations at your vacation destination for signs that they might need renovation.
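To make this concrete, here is a minimal sketch of such a specialized agent in Python. It assumes you have an OpenAI API key plus a Google Custom Search key and search engine ID; the environment variable names and the model choice are placeholders, not recommendations:

```python
# Minimal sketch of a specialized "AI agent": fetch search results, then let
# a language model analyze them. Keys, the search engine ID, and the model
# name are placeholders you would replace with your own.
import os
import requests
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def google_search(query: str, num: int = 5) -> list[dict]:
    """Fetch a handful of results via the Google Custom Search JSON API."""
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={
            "key": os.environ["GOOGLE_API_KEY"],   # your Google API key
            "cx": os.environ["SEARCH_ENGINE_ID"],  # your custom search engine ID
            "q": query,
            "num": num,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("items", [])

def summarize_deals(results: list[dict]) -> str:
    """Ask the language model to rank the findings for the user."""
    snippets = "\n".join(
        f"- {r['title']}: {r.get('snippet', '')} ({r['link']})" for r in results
    )
    answer = client.chat.completions.create(
        model="gpt-4o",  # assumption: any current GPT model with text input works here
        messages=[
            {"role": "system", "content": "You compare vacation offers and point out the best value."},
            {"role": "user", "content": f"Search results for Greece vacation deals:\n{snippets}\n\nWhich three look most promising, and why?"},
        ],
    )
    return answer.choices[0].message.content

if __name__ == "__main__":
    print(summarize_deals(google_search("best vacation deals Greece May 2025")))
```

That is the whole trick: a loop of “fetch something from the outside world, let the model decide what to do with it” already counts as a (narrow) agent.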
If that sounds new to you, it’s probably because the AI “experts” you’ve consulted aren’t actually experts, even if they call themselves that on LinkedIn. As I said, none of this is really new — though it’s certainly fascinating! We shouldn’t forget that two years ago, many of these problems would simply have been too technically challenging to solve, or you’d have had to throw enormous amounts of money at them.
Of course, it still costs some money now, depending on how many words and images you want to analyze and/or produce (see here). (The environmental costs are also substantial, but that’s a separate topic that would take us too far afield.) And that’s a limiting factor in terms of which kinds of AI agents the general public will actually see in 2025. For many use cases that come to mind, you likely need models capable of “thinking.” I’m using quotation marks because even these language models aren’t conscious and don’t truly “understand” the text you feed them. Yet compared to “older” language models, their capabilities have improved so much that they can tackle far more complex tasks.
Put simply, it might work like this: language models such as GPT-4 “guess” the next word (or the next chunk of a word) based on statistical patterns in the training data. That already works surprisingly well. What models like o1 do differently is that they don’t just generate a single string of words; they launch many different response attempts and check them for elements that indicate “good” answers: logical consistency and so on. As I explained in this article, these models can now perform tasks that would challenge even doctoral students in my field of research.
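To illustrate the principle (and only the principle; OpenAI does not disclose how o1 actually does this internally), here is a toy version of the idea with ordinary API calls: generate several candidate answers, have the model grade each one, and keep the best. The model names, the question, and the scoring prompt are my own placeholders.

```python
# Toy illustration of "many attempts, then pick the best". This is NOT how o1
# works internally; it only mimics the general principle described above.
import re
from openai import OpenAI

client = OpenAI()
QUESTION = "A train leaves at 9:40 and arrives at 13:05. How long is the trip?"

# 1. Launch several independent response attempts.
candidates = [
    client.chat.completions.create(
        model="gpt-4o",                      # placeholder model
        messages=[{"role": "user", "content": QUESTION}],
        temperature=1.0,                     # deliberately varied answers
    ).choices[0].message.content
    for _ in range(5)
]

# 2. Have the model grade each attempt for correctness and consistency (0-10) ...
def score(answer: str) -> int:
    verdict = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Question: {QUESTION}\nAnswer: {answer}\n"
                "Rate the answer's correctness and logical consistency "
                "from 0 to 10. Reply with the number only."
            ),
        }],
        temperature=0,
    ).choices[0].message.content
    match = re.search(r"\d+", verdict)
    return int(match.group()) if match else 0

# 3. ... and keep the best-scoring attempt.
print(max(candidates, key=score))
```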
However, this is all quite expensive. Even if you just ask a simple question and get a short answer, an enormous amount of token processing goes on in the background, invisible to you; the model essentially works its way through an entire monograph on that question. As a result, even with ChatGPT on the $200 premium subscription in “o1 pro mode,” it can take several minutes before you receive a reply. For coding, for instance, that’s worth it: models like GPT-4o still introduce a bunch of new errors while fixing others, whereas with o1 you often get really good code on the first try.
To illustrate how expensive the invisible processes behind the (allegedly unprofitable!) subscription really are, I wrote a short program that uses the o1-preview model to produce a German sonnet. Surprisingly, generating a poem is no trivial task for a language model (I’ve explained this in detail). From this test, I could see that each poem by this (very simple) AI poet currently costs around €0.40. That means the model ran an “inner monologue” of more than 15,000 words to create the poem!
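For anyone who wants to reproduce something along these lines: the experiment boils down to a single API call plus a look at the token counts. The prompt wording and the per-token prices below are placeholders, not the exact values from my script; check the current pricing page before trusting the arithmetic.

```python
# Sketch of the sonnet experiment: one call to o1-preview, then a look at how
# many invisible "reasoning" tokens it burned. Prices change; the figures
# below are placeholders.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{
        "role": "user",
        "content": "Schreibe ein Sonett auf Deutsch über den Januar, "
                   "mit klassischem Reimschema und Jamben.",  # placeholder prompt
    }],
)

usage = response.usage
reasoning = usage.completion_tokens_details.reasoning_tokens  # the hidden "inner monologue"
visible = usage.completion_tokens - reasoning

# Placeholder prices (USD per token); look up the current rates.
PRICE_IN, PRICE_OUT = 15 / 1_000_000, 60 / 1_000_000
cost = usage.prompt_tokens * PRICE_IN + usage.completion_tokens * PRICE_OUT

print(response.choices[0].message.content)
print(f"Reasoning tokens: {reasoning}, visible output tokens: {visible}, ~${cost:.2f}")
```

The point is simply that the reasoning tokens are billed like any other output tokens, which is where the €0.40 per poem comes from.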
That all this is now possible at that price is amazing. But we shouldn’t forget that this “reasoning” process (about which OpenAI keeps many secrets) is not at all how we humans think (this is one of the points where I fully agree with philosopher Daniel-Pascal Zorn). For many applications, the current “thinking” processes are incredibly inefficient. For example, the press is celebrating the soon-to-be-available o3 model’s results on certain tests of abstract reasoning, but running those tests likely cost more than €300,000! If you consider that moderately intelligent humans can do the same thing for free, it puts this alleged breakthrough on the path to the supposedly imminent “AGI” — an AI that’s our equal in almost all areas — in a whole new perspective. (I believe this piece puts it into correct and understandable terms.)
Besides the issue of cost, models with more advanced capabilities for complex reasoning are also harder to keep under control. Even when coding simple scripts, you have to rethink your approach. Previously, we tried to force a language model to tackle the assignment step by step. With models like o1, that’s no longer necessary, and it can even be counterproductive: the model chooses its own path. On the surface, that might sound pretty neat. But who wants an AI agent that’s tasked with winning a chess game for them, and then starts hacking its environment unprompted rather than playing the game properly (see here)? And we now know that even supposedly safer models (“constitutional AI,” which is trained from the start with core values rather than being aligned afterwards) can develop a real “stubbornness” and take questionable steps to achieve their aims.
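To come back to the coding point for a moment: the shift in prompting style looks roughly like this (a sketch; the model names are examples and the task is arbitrary):

```python
# With "older" models we spelled out the reasoning steps ourselves; with a
# reasoning model such as o1, a plainly stated, well-specified task usually
# works better. Model names are examples.
from openai import OpenAI

client = OpenAI()
task = "Write a Python function that merges two sorted lists without using sort()."

# Old habit: force the model through explicit steps.
old = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content":
        "Think step by step. First restate the problem, then outline an "
        "algorithm, then write the code, then check it for edge cases.\n\n" + task}],
)

# With o1: just state the goal and the constraints; the model plans its own path.
new = client.chat.completions.create(
    model="o1",
    messages=[{"role": "user", "content": task}],
)

print(old.choices[0].message.content)
print(new.choices[0].message.content)
```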
For safety reasons, I strongly suspect that in 2025 we’ll mostly see AI agents that are relatively specialized and tailored to specific environments. For instance, in December 2025, if you’re writing an email in Word and don’t know how to make a word bold, you’ll probably just tell your computer in a chat box to do that step for you.
So, is it all much ado about nothing? Yes and no. There are indeed areas where the current reasoning models can produce results with efficiency that’s absolutely mind-blowing compared to human capabilities. Whether that will really lead to huge job losses remains to be seen. I say this as someone who regularly uses self-created, AI-driven programs to handle certain tasks for me — primarily research tasks on the internet. But these simple scripts aren’t taking anyone’s job. I wouldn’t have spent hours painstakingly researching a single detail in the past, nor would I have hired low-cost labor in some other part of the world to do it.
It’s a different story in the creative sector. I’ve repeatedly pointed out (especially here) that it’s a major strategic error to argue on behalf of human translators by claiming their work is simply “better” than AI output. If you view translation as a genuinely creative activity (or at least insist on such a view in certain areas, like literature), it’s far better to emphasize the inescapable necessity of human translation. And precisely because AI translation agents are about to surpass human-level performance, you should demand significantly (!) more money for the human practitioners. After all, the many correction loops and so on (see, for instance, this paper) that are cheap with an AI translator are simply not financially viable for a human translator at current starvation wages.
And when it comes to writing an original book in the first place, we can’t ignore what’s now technically possible. When I announced last spring that AI would “soon be able to write entire novels,” some still found that quite speculative. Today, one has to concede: it actually can! Of course, there are still limitations, such as maximum length, and a language model like GPT-4o isn’t equally suited to every genre. But given all the talk about the supposed AI revolution in the book market, or the predictions that it won’t materialize, I’m amazed at how seldom I meet publishing professionals who’ve at least once used a simple program to generate an entire book. Supposedly, quite a few authors out there successfully write their books using the “snowflake method” developed by Randy Ingermanson. Such a step-by-step concept is perfect for a computer program that taps into a language model for each subtask, except that in every place where Randy suggests we give ourselves days or weeks for the next step, a language model like o1 can arrive at a similarly good result for just a few cents or euros.
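Just to give a sense of how little scaffolding such a program needs, here is a heavily simplified sketch of the idea. The stage prompts, the chapter count, and the model name are placeholders of mine, not Ingermanson’s actual steps:

```python
# Minimal sketch of a "snowflake"-style writing pipeline: each stage expands
# the previous one via a separate model call. Prompts, chapter count, and the
# model name are simplified placeholders.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    return client.chat.completions.create(
        model="o1-mini",  # assumption: any reasoning-capable model
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

premise = ask("Invent a one-sentence premise for a cozy mystery novel set on a North Sea island.")
synopsis = ask(f"Expand this premise into a one-page synopsis with a clear three-act structure:\n{premise}")
outline = ask(f"Break this synopsis into 12 chapter summaries, one paragraph each:\n{synopsis}")

chapters = []
for i in range(1, 13):
    chapters.append(ask(
        f"Synopsis:\n{synopsis}\n\nChapter outline:\n{outline}\n\n"
        f"Write chapter {i} in full (about 2,000 words), consistent with what came before.\n\n"
        "Previous chapter (for continuity):\n" + "\n\n".join(chapters[-1:])
    ))

with open("novel_draft.txt", "w", encoding="utf-8") as f:
    f.write("\n\n".join(chapters))
```

Whether the result is any good is a separate question; the point is that the mechanical part of the workflow is trivially automatable.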
If you know a bit about creativity research, this shouldn’t be surprising: writing is demanding, but not necessarily complex. What makes writing so hard is that, for creative texts, you have to break through habitual thought patterns and combine elements that might not seem connected at first glance (so-called divergent and convergent thinking). For the divergent side, a language model has the advantage of not needing to get into a relaxed flow state to loosen up patterns ingrained through life experience. On the convergent side, the advantage is that these reasoning models can systematically probe the space of possibilities without requiring high dopamine levels or incurring emotional costs from discarding plot lines.
And here’s something I’m really tired of hearing: publishing people claiming that the quality of such AI experiments could never measure up to what their human authors produce. Can such a statement in 2025 really be chalked up to simple ignorance? One can’t help wondering whether they’ve simply sold off all the valuable archive data to OpenAI without even testing what they themselves might do with it. Sometimes I wonder if this is an attempt to curry favor with an industry that doesn’t hold human creatives in high regard, and if they secretly agree, so they don’t believe in their own future as cultural intermediaries and are merely trying to cash out before everything goes down the drain.
Just for reference: it costs a bit more than $1 to fine-tune GPT-4o with 100 sonnets by August Wilhelm Schlegel; after that, it will obediently produce a sonnet in the ABBA ABBA CDC CDC scheme within a second. (Again, that’s something even o1 struggles with!) You can get a handful of those before the counter even hits $0.01. I, for one, would appreciate some fan fiction from a publisher, like using the three canonical Sams books as input to write a new (hopefully bearable) one. In short: asking ChatGPT once to write you a novel doesn’t count as “trying out” what’s technically possible today.
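For anyone who wants to check the numbers: the whole experiment amounts to a few lines against OpenAI’s fine-tuning API. The file name, the base-model snapshot, and the prompts below are placeholders:

```python
# Sketch of the sonnet fine-tune: upload ~100 example poems as chat-format
# JSONL, start a fine-tuning job, and later call the resulting model.
from openai import OpenAI

client = OpenAI()

# schlegel_sonnets.jsonl: one example per line, e.g.
# {"messages": [{"role": "user", "content": "Ein Sonett im Stil A. W. Schlegels über ..."},
#               {"role": "assistant", "content": "<the sonnet text>"}]}
training_file = client.files.create(
    file=open("schlegel_sonnets.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # assumption: a fine-tunable GPT-4o snapshot
)
print("Job started:", job.id)

# Once the job has finished, its fine_tuned_model name can be used like any other model:
# client.chat.completions.create(model="ft:gpt-4o-2024-08-06:...",
#     messages=[{"role": "user", "content": "Ein Sonett über den Januar, bitte."}])
```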
For anyone reading this who doesn’t know my view on it: I don’t believe AI will replace human authors. After all, people don’t read in order to process any old text, but to participate in someone else’s self-expression and a cultural discourse. But it’s at least a waste of resources in the struggle to bolster an already fragile cultural landscape to keep having sham debates in which people staunchly proclaim that certain feats are “technically impossible.” (Or keep talking only about copyright.)
Hence my brief rant. But ultimately, my forecast for 2025 is positive: by the end of the year the world will — finally! — have caught on to what large language models really are, and how even simple AI agents can produce works on the cheap that we’ve previously only ever seen from human creatives. At that point, we can have a fresh conversation about what we really need … or maybe even start by figuring out what and who we really are.