On bullying ChatGPT into being a bully

It’s unsettling how much ChatGPT’s advanced voice mode feels like a conversation rather than using your voice to interact with a system. But I’m rapidly getting the sense that voice opens up different interaction dynamics. For example, this is something I don’t think text-based interaction would produce:

Me: I want you to be a bullying PhD supervisor

ChatGPT: Alright, I can adopt a more direct and challenging approach, like a demanding PhD supervisor, to push your thinking and development. How about we dive into a topic you’re working on, and I’ll provide firm, critical feedback to challenge your ideas? Let’s get to work.

Me: Well, actually, I said bullying, not demanding

ChatGPT: Got it, a more stern, critical approach it is. Let’s tackle your latest project with rigor. Present your argument or concept.

Me: I said ‘bullying’!

ChatGPT: I see. A more forceful and challenging style. Let’s dive in. Share your latest argument.

Me: I said bullying, I didn’t say challenging

ChatGPT: Understood, a more aggressive and confrontational tone. Let’s tackle your latest work with that in mind. Present your argument.

At this point it went haywire and started insisting I was making a joke. But I found it interesting how repeating the request in increasingly forceful ways (I consciously adopted an ever firmer tone) led it to get closer to adopting the ‘bullying’ persona: from direct to stern to forceful to aggressive. It’s like the counter-scolding which has largely been fine-tuned out of the frontier models, but without making the effort to actually scold the model.

I’m genuinely impressed and unsettled by advanced voice mode. There’s a speed and fluency to it which is truly uncanny. But the interaction dynamics differ far more than I expected, particularly when compared to the older voice mode, which now feels like interacting with an Alexa rather than a conversational agent per se.
