The downside of getting interested in AI slop is that my YouTube feed is now fucking full of it. Much like TikTok’s algorithm rapidly identifies categories I’m particularly responsive to (in my case cat videos and martial arts demonstrations) the YouTube algorithm identifies two categories I’m particularly susceptible to: motivational running videos and dogs bonding with humans. The former category is mostly human-generated content (of wildly varying quality) at present but the latter is almost entirely AI-slop at this stage. There’s a particular genre of videos where ‘adoption animals choose their humans’:
What I find so unsettling about this genre is that the clip compilations seem to combine real and AI-generated videos in equal measure. I originally assumed most of them were AI-generated because it seemed implausible that people would sit around in chairs while the dogs chose their humans. But I discovered this is indeed a real practice, which illustrates the risks of using imperfect social knowledge to detect AI videos. There are some cases where the videos are obviously AI-generated, with red flags like blurred faces, jerky movements, implausible camera angles or inconsistent body language. But for the most part I find it hard to tell.
A commenter on the above video says “these feel staged”, which suggests that even the real films might have some theatricality about them. It’s striking how often the ‘chosen’ human is sitting on the front row or the aisle. There are lots of these videos I think might be AI-generated, but I’m far less certain than I would be with standalone videos rather than these hybrid clip shows. I’m sure there are some real videos in here alongside maybe 50-75% AI slop? I’m curious what ratio other people would be drawn towards after watching this closely.
If there are real videos in which a human is ‘chosen’ by a dog in this setting* it suggests something interesting about the political economy of AI slop. If there’s a genre of video which reliably elicits a significant audience response (in this case humans crying after their adoption animal chooses them) then we could see video models as providing a means to mine this affect: it helps the creator get to the core of the scenario without the contingent fluff inevitably involved in recording real events. Once it has been mined it can be synthesised ad infinitum until the affect has been exhausted and there’s no longer sufficient audience response to justify continued engagement farming in this area.
This suggests to me a radical intensification of engagement farming in which certain kinds of affective responses might come to be ‘used up’. Cat videos became passé through over-exposure (raising the bar on what counts as cute, funny, engaging etc) but this didn’t fundamentally lead to a loss of interest in cat videos. It just meant the category would be treated by many at a more cynical distance. In contrast I wonder if a form of depletion might actually be possible when it comes to affect mining? What could the downstream consequences of this be for society?
*Is it just me or is there something vaguely Pentecostal about the whole scenario?
