What Are You Talking About?

There’s something wrong in the way we talk about large language models (LLMs). We get distracted by surface-level fluency. Slick turns of phrase, confident tone, the kind of polish that looks like intelligence if you squint. And that gloss fools most people. The models feel smart. But underneath all that, something crucial is missing.

These systems don’t understand anything!

They aren’t building any kind of map of the world. No inner sketch, no sense of how things fit together. Nothing is being followed or tracked. And that is a serious problem if we rely on AI to be consistent and speak the truth.

When humans take in information, we do more than store words. We track what’s happening. We build up a rough picture in our heads: people, places, actions, events. Even when it’s messy, that picture helps us navigate what’s being said. If you read a short story, even a simple one, you begin to hold onto who did what, where it happened, and what changed. If something is discarded in chapter one, and someone recovers it in chapter two, you understand the connection. You’re updating your mental picture with each sentence. The reader becomes a kind of mapmaker, constantly tweaking the layout as new details come in.

Animals do this too. Ants, for instance, can wander far from their nest and still find their way back. They don’t do it with GPS. They use something called dead reckoning. It’s rough and often imprecise, but it means they carry a live estimate of their position and orientation. A model. A representation. Something they can update as they move. Even with tiny brains, ants manage to track their own location. Because to move through a world, you need some idea of where you are in it.
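To make the idea concrete, here is a minimal sketch of dead reckoning in Python (the class, numbers, and method names are invented for illustration, not a model of real ant neurology): a small piece of state, position and heading, that gets updated with every turn and every step.

```python
import math

class DeadReckoner:
    """Carries a running estimate of position and heading, updated as the animal moves."""

    def __init__(self):
        self.x, self.y = 0.0, 0.0   # estimated position relative to the nest
        self.heading = 0.0          # estimated heading, in radians

    def turn(self, angle):
        """Update the heading estimate after a turn."""
        self.heading += angle

    def step(self, distance):
        """Update the position estimate after moving forward."""
        self.x += distance * math.cos(self.heading)
        self.y += distance * math.sin(self.heading)

    def vector_home(self):
        """Bearing and distance back to the nest, read off the estimate alone."""
        return math.atan2(-self.y, -self.x), math.hypot(self.x, self.y)

ant = DeadReckoner()
ant.step(5)            # wander 5 units forward
ant.turn(math.pi / 2)  # turn left
ant.step(3)            # wander 3 more
print(ant.vector_home())  # the ant "knows" which way the nest is, from its own estimate
```

The point isn’t the trigonometry. It’s that there is a little bundle of state being carried along and updated, which is exactly the thing an LLM does not have.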

But LLMs don’t have anything like that. They don’t keep track. They don’t form an internal picture of events, or people, or relationships. There’s no system inside the model keeping tabs on where a person lives, or what a character did two lines ago. There’s no structured way to update knowledge when new details emerge. They don’t maintain a map of the story, or the conversation, or the facts. They just react to what’s written and guess what text might sound plausible next. That’s it.

This is why, when you ask an AI the same question twice, it can give you two very different answers, and insist that both of them are true. And the facts are usually easy to find. They’re on Wikipedia (okay, maybe I used the word “fact” too quickly). They’re in dozens of articles and databases. But the model doesn’t store that information anywhere reliable. It doesn’t treat it as knowledge to be remembered. Instead, it assembles fragments of phrasing from its training data and sounds convincing. And if it guesses wrong, it’s not aware of the mistake. It doesn’t even notice. It passes the guess off as truth, just as convincingly.

This is how we end up with completely fictional book titles in author lists, or fabricated legal cases in court documents, or detailed but false medical advice. The problem isn’t just about data quality. It’s about architecture. These systems aren’t built to represent knowledge in the way a person does. There’s nowhere inside them that can be said to know anything at all. There’s no storage shelf labelled “Mr Smith” where updates can be made, or facts can be added. You can’t point to where the information is held. And if something changes, like Mr Smith moving house, there’s no reliable way to inform the model and have it remember.

In traditional software, world models were standard practice. You track the location of characters in a game. You log the contents of a shopping cart. You store and update the data. You have a dynamic model of a small slice of the world. That model lets you reason. You can prevent impossible actions. You can update facts consistently. You can enforce rules and spot contradictions.
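As a hedged illustration (the items, prices, and rules here are invented), this is what that looks like in ordinary code: a small explicit model of one slice of the world that can be queried, updated, and checked for impossible actions.

```python
class ShoppingCart:
    """A tiny explicit world model: state that is stored, updated, and checked."""

    PRICES = {"apple": 0.50, "book": 12.00}   # illustrative catalogue

    def __init__(self):
        self.items = {}   # item name -> quantity currently in the cart

    def add(self, item, quantity=1):
        if item not in self.PRICES:
            raise ValueError(f"Unknown item: {item}")   # impossible actions are rejected
        self.items[item] = self.items.get(item, 0) + quantity

    def remove(self, item, quantity=1):
        if self.items.get(item, 0) < quantity:
            raise ValueError(f"Cannot remove {quantity} x {item}: not in cart")
        self.items[item] -= quantity

    def total(self):
        return sum(self.PRICES[i] * q for i, q in self.items.items())

cart = ShoppingCart()
cart.add("book")
cart.add("apple", 3)
print(cart.total())   # 13.50, always consistent with the stored state
```

Nothing clever, but every fact about the cart lives somewhere you can point to, and every change goes through a rule.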

Large language models skip that entire step. They just respond to patterns. They were trained to be good at reproducing things that sound right. But that’s not the same as tracking what’s happening. They don’t hold on to any state. Each new prompt is a fresh start. If something happened ten sentences ago, there’s no guarantee the model will remember it. It might act like it does, but that’s not memory.
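A rough sketch of why that is (the `call_llm` function below is a stand-in for whatever model API you happen to use, not a real library call): the model holds no state between calls, so any appearance of memory is just the caller pasting the earlier conversation back into the prompt.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; the model sees only this prompt and nothing else."""
    return "(model output would appear here)"

history = []   # the caller's record of the conversation, not the model's

def chat(user_message: str) -> str:
    history.append(f"User: {user_message}")
    # The whole transcript has to be replayed every turn; drop it, and the model
    # has no idea what "ten sentences ago" even refers to.
    reply = call_llm("\n".join(history) + "\nAssistant:")
    history.append(f"Assistant: {reply}")
    return reply
```

And if that history has to be truncated to fit a context window, whatever fell off the end is simply gone.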

This is also why most LLMs can’t handle consistent storytelling, certainly not at any length. A character might be introduced with brown hair, then five paragraphs later suddenly have red. A married character forgets their spouse’s name. Someone dies, then reappears alive in the final scene. These aren’t creative twists. They’re breakdowns. The model isn’t keeping track. There’s no persistent thread, just one sentence after another, with nothing stable holding them together.

The lack of a world model creates issues far beyond fiction. For example, an LLM was recently asked to run a virtual shop. It responded warmly to customers, generated realistic invoices, and described stock levels in detail. But when asked about profits or losses, it became confused. It gave away expensive items. It lost track of its own pricing. It talked about dressing professionally and delivering items in person, as if it had a body. Not because it was “lying”, but because it didn’t understand what a shop, or even itself, was. It didn’t have a model of a shop. It only had language about shops.

These kinds of mistakes are often brushed off as “one-off” cases. But they’re not. They’re the result of a deep flaw in how these tools are built. The more we depend on LLMs for decisions, analysis, education, policy, or safety, the more we expose ourselves to this type of flaw. And it won’t be fixed by plugging in more data. The problem isn’t about the quantity of training. It’s about how these systems “think”. And previous blogs I have written clearly indicate that they don’t!

They don’t hold on to facts. They don’t update beliefs. They don’t track what changes when something new happens. They can simulate coherence, but it’s all surface. It’s just “mimicry”. You dig a little deeper and things start to fall apart.

And if we don't understand how these models work, we’ll keep getting surprised in all the wrong ways.

We’ll think they’re capable of reasoning when they’re not. We’ll trust them to remember when they’re actually just re-guessing. We’ll assume understanding, where there is only approximation.

The cost of misunderstanding this isn't small. It affects everything from trust to safety. It’s not just a matter of a few clumsy outputs. It’s about knowing what kind of tool we’re using, and more importantly when not to use it.

And at the moment, there isn’t a clean fix. There are some limited workarounds that people are trying, but each has its own problems.

The most common is bolting on some kind of memory from the outside. This means connecting the LLM to a separate database, knowledge store, or even a spreadsheet, so that facts and information can be stored, retrieved, and updated more reliably. Instead of the model "remembering" that Mr Smith moved home, a plugin or external tool does that work. This helps a bit, but it’s patchy. You still need to design the memory system carefully.
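In outline (with an invented fact store and a hypothetical `call_llm` helper standing in for the real model API), the pattern looks something like this: the facts live outside the model and get pasted into the prompt when needed.

```python
def call_llm(prompt: str) -> str:
    return "(model output would appear here)"   # stand-in for whichever model API is in use

# A deliberately simple external memory: the facts live here, not inside the model.
facts = {"mr_smith_address": "14 Elm Road"}     # invented example data

def update_fact(key: str, value: str) -> None:
    """When the world changes, the store is updated; the model itself is untouched."""
    facts[key] = value

def answer(question: str) -> str:
    # Retrieve the (hopefully) relevant facts and inject them into the prompt.
    context = "\n".join(f"- {k}: {v}" for k, v in facts.items())
    return call_llm(f"Known facts:\n{context}\n\nQuestion: {question}\nAnswer:")

update_fact("mr_smith_address", "3 Harbour Lane")   # Mr Smith moves house
print(answer("Where does Mr Smith live?"))          # grounded in the store, not in half-remembered training data
```

The weak point is the retrieval step: if the wrong facts get fetched, or none at all, the model is back to guessing.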

Another approach is fine-tuning or custom training the model. This is when you train a version of the model on specific, accurate information about a domain: medical, legal, retail, whatever. But this doesn’t solve the deeper problem. It still won’t reason well or update beliefs dynamically. It just gets better at sounding right about a narrow topic. And it still doesn’t know when it’s wrong.

Some developers are experimenting with agent-style architectures. These systems try to simulate reasoning by giving the model a plan, some tools, and a scratchpad. They let it take intermediate steps, check work, and adjust output as it goes. In effect, they try to fake a cognitive model by letting the model build something like one on the fly. This can help with logic problems, small tasks, and simple workflows. But it’s still just scaffolding. And if you push it too far, it breaks.
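In very rough outline (again with an invented `call_llm` stand-in and a toy calculator tool), an agent loop looks something like this: the model proposes a step, a tool carries it out, the result is written to a scratchpad, and the loop goes round again.

```python
def call_llm(prompt: str) -> str:
    # Stand-in: a real model would read the scratchpad and decide what to do next.
    return "FINAL: 84" if "-> 84" in prompt else "CALC: 12 * 7"

def calculator(expression: str) -> str:
    """A toy tool the agent can call; real systems expose search, code, databases."""
    return str(eval(expression, {"__builtins__": {}}))   # acceptable for a toy arithmetic demo

def run_agent(task: str, max_steps: int = 5) -> str:
    scratchpad = []   # the externally held working memory: the state lives here, not in the model
    for _ in range(max_steps):
        prompt = f"Task: {task}\nScratchpad:\n" + "\n".join(scratchpad) + "\nNext step:"
        step = call_llm(prompt)
        if step.startswith("FINAL:"):
            return step[len("FINAL:"):].strip()
        if step.startswith("CALC:"):
            result = calculator(step[len("CALC:"):].strip())
            scratchpad.append(f"{step} -> {result}")   # write the result back for the next turn
    return "gave up"   # the scaffolding has limits; push it too hard and it breaks

print(run_agent("What is 12 * 7?"))   # -> 84
```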

The real solution, models that have genuine, dynamic internal representations of the world, doesn’t exist yet. Most people think that to do this properly we’d need a whole different kind of architecture, something closer to what older AI systems aimed for, where facts were stored explicitly and could be reasoned over. That means stepping away from pure text prediction and building something that can actually “think” in a structured way.
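For contrast, here is a toy of the older, symbolic style that paragraph alludes to (the facts and the single rule are invented): knowledge is stored as explicit statements, and new facts are derived from them, so you can always point to what the system knows and why.

```python
# Explicitly stored facts: each one can be inspected, added, or retracted.
facts = {
    ("lives_in", "mr_smith", "london"),
    ("located_in", "london", "uk"),
}

def infer(facts):
    """Apply one rule to a fixed point: if X lives in a place and that place is
    in a country, then X lives in that country."""
    derived = set(facts)
    while True:
        new = {
            ("lives_in", person, country)
            for (rel1, person, place) in derived if rel1 == "lives_in"
            for (rel2, place2, country) in derived if rel2 == "located_in" and place2 == place
        }
        if new <= derived:
            return derived
        derived |= new

print(("lives_in", "mr_smith", "uk") in infer(facts))   # True, and we can say exactly why

# If Mr Smith moves, the old fact is retracted and a new one added; the conclusions
# change with it, and nothing needs to be retrained.
```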

Until then, we need to stay alert to hallucinations and things that don’t make sense. We should use language models for things they’re good at: summarising, drafting, generating ideas. But if you need truth, consistency, reasoning, or memory, then you need to use other tools built to handle those things properly, or make sure you build in appropriate guardrails to catch the slip-ups.