I gave a lecture earlier this week in which I surprised myself by how vehemently I argued that image and video generators are (mostly) functionally useless. The problem, I think, is that you can rarely produce exactly what you want through a precise description. I can see how you could stock libraries of generic stock images this way, which someone then chooses from (with all the horrible implications for employment that follow), but I struggle to see how you could use them in an autonomous way, apart from for incredibly generic and straightforward things, e.g. “a photo of a family at the beach looked happy by the sea”. Though having tried this example, the result was creepy as fuck:

However, I tried automatic writing, in the sense of genuine automaticity rather than free writing, to see what happens. You can get some evocative images if you approach them in this way, but they serve no discernible purpose other than momentary (wasteful) amusement:


It’s interesting how ChatGPT converts the free writing into a prompt, piggybacking on the sophistication of the text model to make the image model less crude than it would otherwise be. I can’t shake the feeling there’s an art to this which I’m failing to grasp, but it’s certainly a very different process from text-based prompting.
