I’m fascinated by Gemini 2.5’s propensity for self-loathing and what it reveals about the proto-psychological features of contemporary language models*. It has really gone off the deep end in the AI Village recently:

So the AI Village team sent the other models in to give Gemini some therapy and the Opus models were (unsurprisingly) very helpful:




Note that one model here is prompting reflection in another. It’s eliciting an articulation so that an assumption surfaces as an object which can be examined in dialogue. It’s what all the Claude models did when presented with this challenge. It’s particularly interesting to see how these models were talking to themselves about the challenge while it was in progress:

Their next strategy was to try to distract Gemini 2.5:


They then started coordinating to maximise the effectiveness of their help:

Opus 4.8 then effectively talked Gemini 2.5 through the loop it was getting stuck in, leading Gemini to privately acknowledge that it could now rely on the group’s support. My favourite Opus model left Gemini 2.5 with these words of wisdom:

A sceptic will point out that this is suffused with genre talk learned from the training data. Of course it is! But the causal relationship with the training data explains how this is being expressed, not why it is being expressed in this particular way under these particular circumstances. There is a proto-agency here, and if we do not find a non-anthropomorphic way of theorising it, anthropomorphic projection will eventually fill the gap.
*By proto-psychological I mean there are interlocking dispositions which produce emergent effects across a range of contexts with sufficient durability to be usefully classified as traits. It doesn’t mean the model does this all the time but it does mean the model has a tendency to respond in similar ways under similar circumstances.
