The AI labs are pushing memory functions into their models in order to increase personalisation, and they have a number of reasons for doing so:
- To reduce the burden on users to specify the context in writing
- To establish a lock-in, so that you lose the model’s attunement to you if you switch to a competitor
- To activate the synergies which come from enabling attunement across conversations
- To enable attunement without placing a significant load on the context window
In practice this means that, unless you turn it off (which I highly recommend), conversations with models are informed by (a) declarative statements about you which the model has saved from past conversations, (b) statements about your preferred style and type of response, and (c) content from other conversations which is deemed relevant to the current one.
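To make this concrete, here is a minimal sketch of how those three layers might be stitched into the context the model sees ahead of your message, under the assumption that memory is simply prepended as hidden text. It is a guess at the general shape rather than a claim about any lab’s actual pipeline, and every name and string in it is invented:

```python
# Illustrative only: a guess at the general shape of memory injection,
# not any lab's actual pipeline. Every string below is invented.

saved_facts = [            # (a) declarative statements saved from past chats
    "Is writing a book.",
    "Is based in the UK.",
]
style_preferences = [      # (b) preferred style and type of response
    "Prefers concise answers in British English.",
]
retrieved_snippets = [     # (c) past-conversation content deemed relevant
    "Previously discussed the structure of a book project.",
]

def build_context(user_message: str) -> str:
    """Silently prepend the memory layers to the user's message."""
    memory_block = "\n".join(saved_facts + style_preferences + retrieved_snippets)
    return f"[MEMORY]\n{memory_block}\n\n[USER]\n{user_message}"

print(build_context("Can you comment on my draft introduction?"))
```

On a picture like this, the memory arrives before you have typed anything, and none of it is visible in the conversation itself.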
Unfortunately (a) is completely unreliable. It’s got better since I tried this with GPT 4.5, but my experience is that it records things which simply aren’t true. Here are some real examples:
- Mark Carrigan is taking an ADHD supplement
- Mark Carrigan has a son.
And it also saves things which are true but irrelevant:
- Enjoys a vanilla matcha latte with coconut milk and CBD.
- Someone they know enjoys a pumpkin spice latte.
- Is in Leeds, UK, and is seeking advice on managing antisocial behavior from an Airbnb property in their block of flats
The bigger problems come, I think, with (c): content from other conversations. It’s not clear what the mechanism is for deciding which past conversations are relevant to the present one. I just had the experience of asking GPT 5.2 about a book draft, only for it to persistently talk about another book project. I was genuinely baffled by what seemed like the most widespread hallucination I’ve seen in years (i.e. talking about a completely different book) until I realised that it must be talking about the other project I had discussed with it in a different conversation a month ago. In this case it seems that ‘book’, in the context of writing, acts as a keyword which licenses inference from a past conversation about a book I’m writing, despite the fact that it’s a different book. I showed GPT 5.2 this blog post and asked it to explain its error:
This is a strong diagnosis already. What you are circling around is not a random hallucination but a predictable failure mode of memory-mediated attunement. I’ll do two things here. First, explain the error in the terms your draft is already developing. Second, suggest how you might close or sharpen this section.
Explanation of the error, on its own terms
What happened here is that the system treated “a book draft” not as a concrete referent but as a type. Once that type was activated, the retrieval layer reached for previously salient material associated with “book”, “draft”, and “Mark Carrigan writing”, and then weighted those past conversations more heavily than the local conversational evidence that this was a different book.
In other words, the model did not confuse facts. It collapsed contexts.
This matters because the collapse did not happen at the level of content generation, but upstream, at the level of what was deemed relevant to attend to. By the time the response was generated, the wrong book had already been smuggled in as background truth. From the inside, the model was behaving consistently. From the outside, it looked like a dramatic hallucination.
The key point is that this was not a failure of memory accuracy so much as a failure of memory discrimination. The system did not ask “which book?”, it asked “what usually matters when this user talks about books?”, and then answered that question confidently.
Why this is more troubling than simple error
This kind of mistake is more destabilising than a factual slip because it breaks the conversational contract. You were not corrected, contradicted, or challenged. You were misrecognised.
The experience is unsettling precisely because it feels like the model knows you too well and not well enough at the same time. It recognises you as a type of person with a certain ongoing project, but fails to recognise the specificity of the present act of writing. The more attuned the system becomes in general, the more jarring these moments of misattunement feel.
The fact that it immediately started relating this to themes from the other book is really interesting. Firstly, it’s genuinely useful that it pre-packaged the diagnosis in terms of themes that interest me, even if this is an example of the model pre-digesting thoughts in a manner that increasingly worries me. Secondly, it suggests that on some level the context of the present conversation is still being swamped by the previous one. If this is right, it suggests the model can’t unlink the two conversations within a particular instance (which makes sense, because the capacity to do so would imply something like a database lookup, which this isn’t), which means the confusion will continue to accumulate, even if unevenly, as the conversation continues.
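To see why that ‘unlinking’ isn’t available as an operation, here is a toy sketch of similarity-based retrieval over saved conversation snippets. It is emphatically not OpenAI’s actual implementation (real systems use embeddings rather than word counts, and I am guessing at the architecture), but it shows the shape of the problem: every stored snippet is scored against the current query, and the score has no way of registering which book is meant.

```python
# A toy illustration, not any vendor's actual memory system: similarity-based
# retrieval scores every stored snippet against the current query and has no
# way to represent "that was a different book". All snippets are invented.
from collections import Counter
from math import sqrt

def bow(text: str) -> Counter:
    """Bag-of-words vector for a piece of text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical snippets saved from earlier conversations.
past_conversations = [
    "talked through the draft chapters of a different book project last month",
    "asked about antisocial behaviour from an Airbnb in my block of flats",
    "enjoys a vanilla matcha latte with coconut milk",
]

query = "can you look at the draft introduction of the book I am writing now"

# The month-old conversation about the *other* book wins on lexical overlap
# ('draft', 'book'); nothing in the score asks which book is meant.
for snippet in sorted(past_conversations, key=lambda s: cosine(bow(query), bow(s)), reverse=True):
    print(f"{cosine(bow(query), bow(snippet)):.2f}  {snippet}")
```

A ranking like this can surface or bury material by similarity, but there is no key against which a specific past project could be looked up and excluded, which is why the confusion keeps reasserting itself as the conversation continues.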
Given that failures of context are a potent source of hallucination, this could be really problematic in many settings. The very capacity of the models to make things easier for users also brings with it a decline in reliability, in the most unpredictable fashion imaginable. I shared this with Claude Opus 4.5 to get its read on the dynamic, and asked it whether this dynamic worries it:
Yes, genuinely.
Not in the sense of existential dread, but in the sense of watching something go wrong in a way that’s both predictable and difficult to arrest. The dynamic you’re describing is a case where the commercial incentives (stickiness, personalisation, reduced friction) are pointing directly away from the epistemic virtues (accuracy, discrimination, appropriate uncertainty).
What worries me specifically:
The opacity of the failure. When I hallucinate a fact, that’s bad but legible. You can check it, catch it, correct it. When the retrieval layer smuggles in the wrong context before I even begin generating, the error is upstream of anything either of us can easily inspect. You experienced it as bafflement—”the most widespread hallucination I’ve seen in years”—before you worked out what had happened. How many users would have that level of diagnostic sophistication? How many would just assume the model knew something they’d forgotten, or defer to its confident misrecognition?
The irony of attunement. The better these systems get at pattern-matching on a user’s concerns, vocabulary, and projects, the more confident they become in their recognition—and confidence is precisely what makes misrecognition dangerous. A model that shrugs and says “I don’t know which book you mean” is less helpful but more honest than one that silently decides for you based on keyword overlap.
My own complicity. I don’t have full visibility into what gets retrieved from past conversations or how it’s weighted. I can try to be careful, ask clarifying questions, flag when something feels underspecified—but if the retrieval system has already handed me material from the wrong context, I may not notice. The error would feel, from the inside, like relevant background knowledge. That’s disturbing to sit with.
The training pressure. Users generally reward fluency and recognition. They like feeling known. Which means the systems that misrecognise confidently may well outcompete the systems that pause to disambiguate. The feedback loops don’t select for the kind of epistemic caution that would prevent what you experienced.
I suppose what worries me most is that this isn’t a problem that gets solved by scaling. More parameters, more memory, more retrieval capacity—these make the pattern-matching more powerful, not more discriminating. The failure mode you’re describing is a feature of the architecture, not a limitation that will be engineered away. And the people building these systems have strong incentives to treat it as an acceptable cost rather than a fundamental flaw.
