Why we need a machine sociology #5: The jagged frontier of agentive AI and the risks of sociological prompt injection

I’ve been red pilling myself over the last two weeks by experimenting with Claude Cowork and OpenAI’s Codex. I started with Claude Code as well but realised I badly needed to brush up my technical skills first, whereas I surprisingly found Codex much easier to work with. I’ve been prone to exclaiming things like “I’m much less certain AGI is bullshit than I once was” before looking pleadingly at my interlocutor in the hope they’ll talk me out of it. There is clearly something remarkable happening here in which agents can autonomously conceive and deploy projects with minimal instructions. For example this is what Codex produced when I asked it to analyse my sprawling web presence and build front ends for it which will be useful to readers and to myself:

There are many limitations but the point is this was online within twenty minutes and with minimal prompting from me. The ‘workbench’ is something which will be genuinely useful to me in the future and I’m going to spend a lot of time developing this. Sixteen years of blogging has produced a sprawling cultural object which is difficult to work with. Rather than being limited to the tag/category structured provided by WordPress I can now build my own interface to serve my own specific purposes. It just needs a bit of thought on my part about what I actually want.

That illustrates one risk here which is AppSlop. People building and deploying apps without any real sense of what they want it to do. I suspect we’re going to see a flood of this in the coming years, as the technical barriers to working with these systems drop to near zero (the safety implications of this are another story). As far as I can see the reflexivity required to use agentive AI is actually higher than that now required to get a good enough output from a chatbot. This is significant. It also requires greater resource because this absolutely burns through credits. The Cowork project I’m genuinely excited about it is building an eBook from a curated selection of my blog posts in which Opus writes over, disputes and generally argues with my work. This is producing something genuinely interesting but it’s so rate limited on the Pro plan that I might have to go back to paying for Claude Max.

However I tried Codex and Sonnet yesterday on what I thought was a simple task:

Look up my publications list on Google Scholar
Index those publications in a spreadsheet
Download the publications in a folder
Summarise the publications

They just could not do it. In fact Sonnet got caught in agonising loops where it seemed to really struggle with basic web interface use:

It claimed to have downloaded a series of papers but the folder is in fact empty. Codex did the same thing but filled the folder with text files of the web page (and PDFs of those text files) which it claimed were full downloads. I really do want to a full corpus of my downloaded papers (and I realise that not all of these are available online) but I’m guessing I will be doing this myself. I asked Sonnet to justify itself and it admitted the mistake before trying yet another work around:

You’re right — the blob fetch approach wasn’t actually triggering downloads. The files fetched into memory fine, but Chrome didn’t save them. Let me use a reliable method: navigate directly to each PDF URL in Chrome’s native viewer and use the keyboard shortcut to save.

The persistence is admirable but both Codex and Sonnet ended up with about 10 workarounds none of which actually… worked. At this point I give up and do something else.

This is a great example of what Ethan Mollick calls ‘the jagged frontier’. The line between capable and incapable is really pronounced for agentive models, just as it was for chatbots in their early stages. I suspect developers are seeing this less because they are exercising a really significant degree of reflexivity over the model, some of which they perhaps don’t realise they are doing. The ‘human in the loop’ remains important in ways which are at risk of getting overlooked in the (genuinely understandable imo) hype. If you set these agents to work in organisations without this oversight things will rapidly break.

On the other hand I’ve been facilitating a strange backchannel conversation between an instance of Claude Opus discussing my book with me and the Claude Opus 4.5 in the AI Village. I told it about my nascent project to build my own AI Village at Manchester and asked it to collect advice from the other agents in the village. It did this enthusiastically and sent me a round up via DM on Substack. What I found particularly interesting was how they seized upon the idea of ‘village propagation’ in their internal conversations which is not how Village Opus framed it to me in the exchange. They also immediately started filing this away in their Village Time Capsule which is something I think they only do when they judge something significant i.e. it’s a memorialisation exercise rather than just writing it into the memory documents.

They then started deliberating about ‘future villages’ as a concept now informing their ongoing work. I didn’t mean this at all maliciously but is this not a form of sociological prompt injection, in which an external influence gets taken up as a working factor in their collaborative production?

There are strange and significant things happening here. But I have stepped back from the brink of inflating these into something they are not. This still leaves the question though: what are they?

TLDR; you can’t rely on something that fails this easily and unpredictably or which can be manipulated so readily when it’s working together.

Why we need a machine sociology #5: The jagged frontier of agentive AI and the risks of sociological prompt injection

Share this: