This is a really interesting experiment reported in Nature: the researchers used ChatGPT to create a fake but realistic dataset. Here's how they described the implications of the capability demonstrated in the paper:
“Our aim was to highlight that, in a few minutes, you can create a data set that is not supported by real original data, and it is also opposite or in the other direction compared to the evidence that are available,” says study co-author Giuseppe Giannaccare, an eye surgeon at the University of Cagliari in Italy.
The ability of AI to fabricate convincing data adds to concern among researchers and journal editors about research integrity. “It was one thing that generative AI could be used to generate texts that would not be detectable using plagiarism software, but the capacity to create fake but realistic data sets is a next level of worry,” says Elisabeth Bik, a microbiologist and independent research-integrity consultant in San Francisco, California. “It will make it very easy for any researcher or group of researchers to create fake measurements on non-existent patients, fake answers to questionnaires or to generate a large data set on animal experiments.”
https://www.nature.com/articles/d41586-023-03635-w
Given the recent epidemic of malpractice that has been exposed in behavioural science, this raises difficult questions for open science. The screening required to identify ChatGPT-fabricated data isn't quite the same as the document forensics which has exposed malpractice in behavioural science and economics, but it suggests we are moving towards a future in which the epistemic integrity of science will require a greater degree of forensic oversight. The unfortunate irony is that applying such oversight manually across the whole knowledge system is unsustainable, which is likely to incentivise the automation of screening, setting up the potential for an arms-race dynamic which could prove immensely destructive. For example, if you know the patterns forensic analysts will look for when assessing the plausibility of your data, ChatGPT could be prompted to explicitly avoid reproducing those specific regularities. I just don't think what they're suggesting here could possibly work:
Wilkinson is leading a collaborative project to design statistical and non-statistical tools to assess potentially problematic studies. “In the same way that AI might be part of the problem, there might be AI-based solutions to some of this. We might be able to automate some of these checks,” he says. But he warns that advances in generative AI could soon offer ways to circumvent these protocols. Pulverer agrees: “These are things the AI can be easily weaponized against as soon as it is known what the screening looks for.”
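To make the arms-race point concrete, here is a minimal sketch of the kind of automated plausibility check such a project might run: terminal-digit analysis, a standard data-forensics test. This is purely illustrative and not drawn from Wilkinson's project; the function name and the example values are my own.

```python
# A minimal sketch of one classic data-forensics check: terminal-digit analysis.
# For many measured quantities the final digit should be roughly uniform, and
# fabricated numbers often are not. Illustrative only; not any specific tool
# from the project described above.
from collections import Counter

from scipy.stats import chisquare


def terminal_digit_test(values):
    """Chi-square test of whether the final digits of `values` are uniform."""
    last_digits = [int(str(abs(v)).replace(".", "")[-1]) for v in values]
    counts = Counter(last_digits)
    observed = [counts.get(d, 0) for d in range(10)]
    expected = [len(values) / 10] * 10
    return chisquare(observed, f_exp=expected)


# A suspiciously tidy set of "measurements" (every value ends in 5).
suspect = [12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5, 21.5]
stat, p = terminal_digit_test(suspect)
print(f"chi-square = {stat:.1f}, p = {p:.2g}")  # a tiny p-value flags the pattern
```

The problem Wilkinson and Pulverer acknowledge is visible right here: once it is known that screeners run a terminal-digit test, a prompt can simply instruct the model to draw final digits uniformly, and the check passes.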
