The ability of ChatGPT to generate a fake dataset

This is a really interesting experiment reported in Nature: researchers used ChatGPT to create a fake but realistic dataset. This is how they described the implications of the capability they demonstrated in the paper:

“Our aim was to highlight that, in a few minutes, you can create a data set that is not supported by real original data, and it is also opposite or in the other direction compared to the evidence that are available,” says study co-author Giuseppe Giannaccare, an eye surgeon at the University of Cagliari in Italy.

The ability of AI to fabricate convincing data adds to concern among researchers and journal editors about research integrity. “It was one thing that generative AI could be used to generate texts that would not be detectable using plagiarism software, but the capacity to create fake but realistic data sets is a next level of worry,” says Elisabeth Bik, a microbiologist and independent research-integrity consultant in San Francisco, California. “It will make it very easy for any researcher or group of researchers to create fake measurements on non-existent patients, fake answers to questionnaires or to generate a large data set on animal experiments.”

https://www.nature.com/articles/d41586-023-03635-w

Given the recent epidemic of malpractice exposed in behavioural science, this raises difficult questions for open science. The screening required to identify ChatGPT-fabricated data isn't quite the same as the document forensics which has exposed malpractice in behavioural science and economics, but it suggests we are moving towards a future in which the epistemic integrity of science will require a greater degree of forensic oversight. The unfortunate irony is that the unsustainability of applying such oversight across the knowledge system is likely to incentivise the automation of screening, setting up the potential for an arms race dynamic which could prove immensely destructive. If you know the patterns which forensic analysts will look for when assessing the plausibility of your data, ChatGPT could be prompted to explicitly avoid reproducing those specific regularities, as the sketch at the end of this post illustrates. I just don't think what they're suggesting here could possibly work:

Wilkinson is leading a collaborative project to design statistical and non-statistical tools to assess potentially problematic studies. “In the same way that AI might be part of the problem, there might be AI-based solutions to some of this. We might be able to automate some of these checks,” he says. But he warns that advances in generative AI could soon offer ways to circumvent these protocols. Pulverer agrees: “These are things the AI can be easily weaponized against as soon as it is known what the screening looks for.”
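To make that worry concrete, here is a minimal, hypothetical sketch (in Python, and not taken from the Nature piece or Wilkinson's project) of the kind of statistical screen that might be automated: a check that the terminal digits of reported measurements are roughly uniform, a heuristic forensic analysts have used to flag fabricated numbers. The function names, example data and threshold are illustrative assumptions.

```python
from collections import Counter

def last_digit_chi_square(values):
    """Chi-square statistic for uniformity of the terminal digits of values."""
    digits = [str(abs(v)).replace(".", "")[-1] for v in values]
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(str(d), 0) - expected) ** 2 / expected
               for d in range(10))

def flag_suspicious(values, critical=16.92):
    """Flag a sample whose last digits deviate from uniformity.

    16.92 is the 5% critical value for chi-square with 9 degrees of
    freedom; the approximation is only rough for small samples.
    """
    return last_digit_chi_square(values) > critical

if __name__ == "__main__":
    # Illustrative numbers whose last digits cluster on 1 and 3.
    reported = [12.3, 14.1, 13.3, 15.1, 12.1, 11.3, 14.3, 13.1, 12.3, 15.1]
    print(flag_suspicious(reported))  # prints True for this clustered sample
```

The problem Pulverer points to is visible immediately: once it is known that the screen tests terminal-digit uniformity, a model can simply be instructed to generate values whose last digits are uniformly distributed, and the check passes.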