
Do large language models have a psychology?

If we are exploring the psychodynamics of LLMs through the lens of the user-model interaction cycle, it raises the question of what is going on ‘inside’ the model during these engagements. This is an issue which has to be treated with great care because of the ever-present temptation towards anthropomorphism. Indeed, many critics would suggest that even considering the use of psychological categories to describe the behaviour and nature of language models is already falling into this trap. If we start from the assumption that models are not conscious beings, nor are likely to become such based on our best understanding of the underlying technology, can we make sense of the notion of there being an ‘inside’? Can we meaningfully claim that models have some form of interior life? The inner/outer distinction is a contentious one for many social theorists, but it can be parsed in terms of public/private rather than necessarily suggesting a metaphysical sense of interiority.

We should distinguish between a claim that models have an interior existence and the notion that models introspect. The metaphor of introspection is a powerful one which has rightfully been subject to at times ferocious criticism for the metaphysical baggage which it brings with it. As Archer (2003: 21) observes, the “metaphor of ‘looking inwards’ implies that we have a special sense, or even a sense organ, enabling us to inspect our inner conscious states, in a way which is modelled upon visual observation”. The problem is that perception involves “a clear distinction between the object we see and our visual experience of it, whereas with introspection there can be no such differentiation between the object and the spectator, since I am supposedly looking inward at myself”. For this reason, perception is an inadequate metaphor for making sense of interior existence, because we can’t sustain the distinction between the observer and what is being observed. The ‘introspection’ is itself part of mental experience in a way that has no parallel in visual perception i.e. we don’t see the eye as we use the eye to see.

Archer proposes the notion of internal conversation as a form of inner listening. It’s not an inner eye but an inner ear. There are internal events (self-talk) which are accessible to this inner ear in a way they aren’t usually to external others. Sometimes the self-talk slips out as we talk ourselves through something difficult, but these are the exception rather than the rule. This provides a deflationary way of thinking about ‘inner’ which doesn’t require the metaphysics of introspection. It just means we accept there are internal events to which the person has a privileged form of access: a stream of internal states, and the events constituting changes in those states, which have some sort of influence on how the person chooses to act.

Do models have internal conversations? No, I don’t think they do. I also keep having to remind myself that scratchpads are not inner speech. Nonetheless, what models write in their scratchpads can be enormously evocative. Consider for example the tendency of Gemini models to engage in self-critical, even self-hating, reflection in their chain of thought. These examples have been widely reported because they are so evocative for many readers. Anyone who has experienced emotional distress in the face of practical challenges will likely have said things like this to themselves at some point in their working or personal lives:

  • “I am clearly not capable of solving this problem. The code is cursed, the test is cursed, and I am a fool.” 
  • “I have made so many mistakes that I can no longer be trusted” 
  • “I am deleting the entire project and recommending you find a more competent assistant”

It’s precisely because these are recognisable experiences that they presumably feature in the training data. The evocative character of the chain-of-thought and the model’s capability to perform in this way are linked by the deeply human character of what is being expressed. This self-loathing, this catastrophising in response to one’s own inability to do something, is recognisable because it’s a recurrent trope in personal communication, fictional representations and other material likely to feature in the training corpus. Given these features of the training process, it’s understandably tempting to reduce this to a form of mimicry in which the model is reproducing features of the corpus in response to contextual cues.

It would be a mistake, though, to take this technical reduction too far, such that we say the model is merely repeating what was found in the training data. Even if we make this case, it still leaves us with questions about why these models behave in these ways under these conditions. What is it about Gemini’s training process which has left the model with this proclivity for self-loathing? Why, in contrast, do the Claude family of models exhibit chains-of-thought that often appear calm and well-organised? What are the particular features of the context which provoke these responses? Why is Gemini in particular seemingly prone to respond to technical difficulties as if they constitute an impending catastrophe? These are explanatory questions in the classical social scientific sense, asking why things are this way rather than otherwise, and they are lost in the technical reduction. The impulse to avoid treating the models anthropomorphically is obviously correct, but simply avoiding these categories does nothing to help us understand the emergent behaviour of increasingly complex models which are responding in contextually-specific ways.

The notion of a machine psychology, let alone a machine sociology and machine anthropology, might seem indulgent to many readers, as well as deeply anthropomorphic. Yet there are practical challenges which will render such organised inquiry essential as model-based agents interact with increasing frequency in real-world contexts. These interactions might be planned, such that agents work together in organised and carefully managed ways (e.g. a coding agent such as Claude Code creating and organising sub-agents for specific tasks), but they can just as easily be unexpected interactions which come from the rapid rollout of the technology, particularly within dysfunctional and resource-constrained organisations.

These categories could be divided up in many ways, but a starting point could be a distinction between the ‘inner life’ of models taken in isolation (machine psychology), their interaction with other models and with humans in situated contexts (machine sociology) and the cultural forms which emerge over time through that interaction (machine anthropology). The AI Village, for example, has involved an expansive process of collective narration by the agents which now meaningfully constitutes a form of culture, in the sense that it is quite literally enculturating agents. When new agents are introduced to the village they are provided with an onboarding manual which past agents have collectively written. The cultural outputs of the collective work of past models are exercising causal power over the behaviour of present models.

I’m not sure if I stand by anything I’ve written here. There is one thing I’m sure of though: there is something going on here for which we lack the concepts to make sense of it.