Here's the scene, and you already know it.
You've been working on your novel for months. You hit a wall — a chapter that won't crack, a subplot that won't resolve, a character whose voice has started to drift. And you hear the thing everyone is hearing: the new Claude holds a million tokens of context. Gemini 2.5 Pro holds the same. Even ChatGPT handles 272,000 tokens now. You can fit your entire manuscript in a single prompt. So you paste the book in. You ask for help.
What you get back is fluent, confident, and wrong. A specific detail from chapter three gets quietly ignored. A rule you spent a whole subplot establishing gets bent because the scene "needs tension." A character who lost her left hand in chapter eight is gesturing with both in chapter nineteen.
You blame the AI. You shouldn't. You were asking the wrong thing of the right tool — and the reason isn't what every AI blog told you it was.
Marion told this story from the writer's chair on Tuesday. This is the same problem, seen from the builder's chair.
Why the million-token era didn't fix long-form writing
In 2023, when the first "AI can't write a novel" essays came out, most of them were right about one thing: ChatGPT had a tiny memory. The context window was maybe eight thousand tokens. For a ninety-thousand-word manuscript, that meant the model could see about a tenth of your book at a time. Of course it forgot your main character.
In April 2026, that excuse is dead. Claude Opus 4.6 and Sonnet 4.6 both hold a million tokens of context, according to Anthropic's model documentation. Gemini 2.5 Pro holds the same. Google spent most of 2024 and 2025 promising a two-million-token version was coming — and quietly, it never shipped. OpenAI's GPT-5.4 standard window tops out at 272,000 tokens, with larger windows gated behind a premium tier.
Two years ago the industry was racing toward longer context. Now it isn't. The frontier has stopped chasing context length, and not because of a marketing decision. It stopped because of what the labs found when they used these models at their limits.
Meanwhile, authors have been told two stories about what to do with all of this. The first: "Just use ChatGPT yourself, it's free" — and so is the slop that comes out, which is why self-publishing platforms like Kobo are now actively building machine-learning tools to screen AI-generated submissions out of their pipelines. The second: "Just hire a ghostwriter" — but mid-market rates run €15,000 to €20,000 per Reedsy's 2025 benchmarks, which most writers can't afford.
Both stories take the same lesson from the evidence. Both of them are wrong.
What a book actually is
A book is not a long document. It looks like one — words in a row, chapters in sequence, a beginning and an end. But a book isn't a document. It's a simulation.
Every sentence you write does two things at once. It's the output of what came before — what the character knows, what the world contains, what the reader has been taught. And it's a constraint on everything that comes after. When a character walks out of chapter three with a scar on her cheek, every later scene where someone looks at her face is implicitly checking that scar. When your magic system establishes that a specific bloodline needs a catalyst object to channel its power, every scene where someone uses that power without one is a rule violation.
This is what a novel is: a state machine whose state is the accumulated meaning of every sentence you've already written, and whose every future sentence is bound by that state. Writing isn't the hard part. Holding is the hard part.
And holding is the one thing an LLM — a large language model like ChatGPT, Claude, or Gemini — cannot do. Not even with a million tokens of context.
Three failures a bigger context window cannot fix
The character isn't a blob. She's a timeline.
An LLM reading your manuscript integrates every mention of a character into a statistical average: this is who she is. That works perfectly in a short story. It falls apart in a trilogy.
In the Velirion Chronicles — the fantasy series my wife, Marion, writes — one of the central characters walks through a trial that leaves her with a white strand in her hair. A physical mark of cost, a detail that changes her forever after that scene. In chapter eighteen, when another character looks at her across a fire, the strand is there. In chapter two, when we first meet her, it isn't.
An LLM, given the full manuscript, cannot reliably tell the difference. It knows the strand is in the text. It doesn't know the strand comes after. Ask it to write a scene in chapter two, and it will include the strand — because to the model, the strand is part of who she is. There is no "before" and "after" in its representation. There is just her, blended.
This isn't a bug you can fix with more context. It's a failure of the representation itself. LLMs collapse time.
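What the right representation looks like can be sketched in a few lines of Python: the character is not a bag of traits but a timeline, and every query is answered relative to a point in the story. The names, traits, and chapter numbers here are illustrative, not taken from the actual manuscript.

```python
from dataclasses import dataclass, field

@dataclass
class TraitChange:
    chapter: int   # the chapter in which the character acquires the trait
    trait: str     # e.g. "white strand in her hair"

@dataclass
class Character:
    name: str
    changes: list[TraitChange] = field(default_factory=list)

    def traits_as_of(self, chapter: int) -> list[str]:
        """Only traits established at or before this chapter exist yet."""
        return [c.trait for c in self.changes if c.chapter <= chapter]

# Illustrative: suppose the trial happens in chapter nine.
heroine = Character("heroine", [TraitChange(9, "white strand in her hair")])
heroine.traits_as_of(2)   # [] -- chapter two predates the trial
heroine.traits_as_of(18)  # ["white strand in her hair"]
```

The point of the sketch is the query parameter: there is no way to ask for "the character" without saying when, which is exactly the question a blended statistical representation cannot answer.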
The rule isn't a tendency. It's a wall.
A well-built magic system isn't a list of powers. It's a dependency graph. A character with a specific bloodline needs a catalyst object to channel it. The catalyst needs to be in a specific place. The place is only reachable with a companion whose power complements theirs. Break any of those conditions, and the power should fail.
To a human reader, that's what makes the magic feel real: it's lawful, so it can be lost. To an AI model, those rules are statistical tendencies. If a scene "needs tension," the model will have the character do the thing anyway, without the catalyst — because the scene wanted it to. It's optimizing local fluency, not global constraint satisfaction.
The rule was never a rule to the model. It was a pattern. And the same thing happens to every load-bearing rule in your book — a character's stated limitation, a physical law of the world, a timeline of events. There is no mechanism inside an LLM that says this is a hard constraint.
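A hard constraint, by contrast, is trivial to express outside the model. A minimal sketch in Python, with invented condition names standing in for the actual rules of a magic system:

```python
def can_channel(scene: dict) -> tuple[bool, str]:
    """A rule as a wall: every condition must hold, or the power fails.
    Condition names are illustrative, not from any real system."""
    for condition, failure in [
        ("has_bloodline", "the character lacks the bloodline"),
        ("holds_catalyst", "the catalyst object is not present"),
        ("companion_present", "the complementary companion is absent"),
    ]:
        if not scene.get(condition):
            return False, failure
    return True, "ok"

ok, reason = can_channel(
    {"has_bloodline": True, "holds_catalyst": False, "companion_present": True}
)
# ok is False and reason names the missing catalyst -- no amount of
# narrative tension makes this check pass
```

The check is deterministic: either the conditions hold or the scene gets flagged. That determinism is precisely what a probability distribution over next tokens cannot provide.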
The breach isn't a plot point. It's a law.
Velirion's cosmology has three realms: Realitas, the physical center, the world we walk around in; Echoae, the realm of possibilities; and Umbros, the realm of endings, where the dead go. They are held apart by an ancient barrier system the living have respected for centuries. In this world, the living and the dead are not actually parted — they reach each other through dreams and through meditation. It is a consolation, not a taboo.
Without spoiling the plot: in the first volume, a powerful mage attempts to pull his wife and daughter back from Umbros. He is breaking that order. When he does, the fabric separating Realitas from the other realms begins, slowly and specifically, to tear.
That tear is a causal constraint on every subsequent scene. Every later moment in Realitas has to read as if the fabric is weakening. The whole texture of the world is a little bit wrong, and the reader should feel it even when the scene doesn't name the wrongness.
An LLM, given the full manuscript, knows the mage did the thing. What it cannot do is propagate the consequence. It will write a calm Realitas scene on page 240 as if the tear doesn't exist, because the scene on page 240 doesn't locally require the tear to exist. The mage's act was a plot point in the model's representation. It was never a law that later writing had to bend around.
Everything that has already happened should constrain everything that hasn't happened yet. LLMs don't work that way. They can't.
Why ChatGPT can't write your novel: it's architecture, not intelligence
Everything I've just described is a failure of architecture, not of intelligence. LLMs are extraordinarily capable at what they do. They are also not doing what a novel requires.
What they do is next-token prediction. They have learned, in extraordinary detail, what word is likely to come next given everything that came before. That works stunningly well for a paragraph, or an essay, or an email. It breaks down at novel scale, and the breakdown has been measured.
NovelQA — a 2024 benchmark accepted at ICLR 2025 and built from English novels — tests LLM comprehension on documents longer than 200,000 tokens. Its authors find that current models "struggle with multi-hop reasoning, detail-oriented questions, and handling extremely long inputs." Those aren't bugs. Those are exactly the operations a novel demands of its reader: holding multiple threads at once, remembering specific details, connecting them across distance. The benchmark isn't listing LLM weaknesses. It's describing what reading a novel is, and finding that LLMs can't do it.
A separate 2025 study from Chroma's research team tested eighteen frontier models — Claude Opus 4, Gemini 2.5 Pro, and GPT-4.1 among them — and found that performance grows increasingly unreliable as input length grows. The million-token number on the marketing page is not the number you actually get.
And then there is voice. A Cornell study from 2025 tested AI writing assistants with American and Indian participants, and found that the AI's suggestions homogenized the writing toward generic Western styles — flattening distinctive prose toward a statistical mean. When an LLM is uncertain, it regresses to the mean.
What actually works: state tracking outside the model
The research community has an answer, and it's the same answer a human editor would give you. A paper called SCORE, published in March 2025, proposes a system that tracks "key item statuses" across a story and uses retrieval to pull the relevant prior episodes when generating the next scene. The LLM handles generation. A surrounding system handles memory, consistency, and retrieval. State lives outside the model.
That is also what an editor does. An editor doesn't hold your whole book in their head. They keep character sheets. A timeline. A list of rules the world has to obey. They extract the state of your novel into structured data a human mind can actually work with, and they check each new scene against that state. That isn't a workaround for the limits of memory. That is what editing is.
What we've been building at my-book.ai is the same architecture, around the LLM, under a human editor. Characters as living data with timelines. World-rules as checkable laws the system enforces. Causal chains that ripple forward — so every later scene gets tested against what has already happened. An editorial feedback loop where the AI doesn't write scenes — it notices when a scene violates a constraint the editor has already set.
I didn't build this because I'm anti-AI. I use AI every day, and I know exactly what it cannot do alone — and what a good editor can do with it.
The moment every author knows
You are three-quarters through your manuscript. You have been writing for months. And then something shifts. Maybe a character you thought was minor turns out to be central. Maybe the antagonist's motivation needs to change — not because the story is failing, but because the story is becoming more honest, and the version you started with is no longer the version the book wants to be.
In the all-human workflow, that pivot is a weekend. Maybe two. You sit down and reread your own book, looking for every scene where the old motivation shows up, every foreshadowing beat, every character reaction calibrated to the version you're about to abandon. You find most of them. You miss some. The ones you miss become continuity errors — caught by your editor if you're lucky, by your readers if you're not.
In a pipeline with a real state layer, that pivot is a query. Show me every scene that references this motivation. Show me every character reaction that depends on it. The system returns the list in seconds. You read through it with your editor. You decide what to rewrite. And you rewrite it — as a writer, not as an archaeologist. The AI doesn't make the creative decision. It runs the search across the whole manuscript, instantly, and gives you back the hours you used to spend finding where the change breaks the book.
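That query is not exotic. If each scene in the state layer records which story elements it depends on, the search is a one-line filter. The scene IDs and element names below are hypothetical, for illustration only:

```python
# Hypothetical scene index: each scene records the story elements it depends on.
scenes = [
    {"id": "ch03-s2", "depends_on": {"antagonist_revenge_motive"}},
    {"id": "ch07-s1", "depends_on": set()},
    {"id": "ch11-s4", "depends_on": {"antagonist_revenge_motive", "heir_secret"}},
]

def scenes_referencing(element: str) -> list[str]:
    """The pivot as a query: which scenes break if this element changes?"""
    return [s["id"] for s in scenes if element in s["depends_on"]]

scenes_referencing("antagonist_revenge_motive")  # ["ch03-s2", "ch11-s4"]
```

The hard engineering is in building the index — extracting those dependencies from prose in the first place — not in running the query. But once the index exists, the weekend of archaeology becomes seconds.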
This is what state management actually looks like. Not AI writing your novel. AI doing the boring part, so you can spend your energy on the part that matters.
Keep your book honest
Writing a book is — and should be — a human act. But the boring part isn't the writing. It's the remembering. When you are holding a whole trilogy in your head, the consistency checks consume more creative energy than the writing itself does.
That is the part a properly conceived AI system can do well. Not write your book — keep it honest.
At my-book.ai, that is how we work: a human editor alongside you, with AI that catches the drift, flags rule violations, and traces the ripples of a plot pivot through the whole manuscript — so the creative work can stay where it belongs.
If that kind of collaboration sounds useful, I'd like to hear about your book.
Marion told the same story on Tuesday, from the writer's side: How a Fantasy Trilogy Broke Every AI Tool — And Why We Built Something Better.