

The AI video generators are superficially stunning but deeply useless

I’ve been playing around with them occasionally and I just can’t get them to produce what I’m actually asking for. For example, this is Luma’s response to the prompt “A deeply content capybara luxuriating in my suburban back garden with a fluffy black cat reclining on his belly”:

Is this a capybara, or is it a cross between a pig and a warthog? Where’s the black cat? Perhaps it’s a bad prompt, but it’s a pattern I’ve noticed across these systems. This is Runway’s response to the prompt “We want to make a video about a capybara in a tuxedo singing a song about his life, while holding an umbrella because it’s raining. His capybara wife has got tears in her eyes as she hears him sing because his singing is terrible. It hurts her ears. remember the umbrella and sound” (I’m doing this with kids, in case you’re wondering…).

On one level, stunning. On another, more practical level, kind of useless. I’ve yet to put any significant thought into how I’m prompting video systems, so I suspect there might be at least some user failure here. But I’m confident there’s a broader weakness in the reliability of video systems, because you just can’t infer robustly for multimodal content in the way you can for text.

Will these always be superficially spectacular but deeply useless in practice? If so, is a lot of money being wasted? Or do we risk a future in which a much smaller number of designers are employed to clean up after chronically dysfunctional video models?