Raiding the inarticulate since 2010


The coming wave of document forensics about to hit the university: the weak signals of generative AI

It was interesting to read how much of Data Colada’s recent investigations rested on ‘data forensics’, which involves, inter alia, using an understanding of the nature of file formats to make inferences about the behaviour of the user:

A little known fact about Excel files is that they are literal zip files, bundles of smaller files that Excel combines to produce a single spreadsheet [4]. For instance, one file in that bundle has all the numeric values that appear on a spreadsheet, another has all the character entries, another the formatting information (e.g., Calibri vs. Cambria font), etc.
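
This is easy to verify for yourself. The sketch below (file names are hypothetical) uses Python’s standard zipfile module to build a toy archive with the same internal layout as a spreadsheet and then lists its component parts, which is exactly what makes this kind of forensics possible:

```python
import zipfile

# An .xlsx file is an ordinary zip archive of XML parts, so the standard
# zipfile module can open one. Here we construct a toy archive with the
# same layout as a real spreadsheet, then list its contents back.
parts = {
    "[Content_Types].xml": "<Types/>",
    "xl/workbook.xml": "<workbook/>",
    "xl/worksheets/sheet1.xml": "<worksheet/>",  # numeric cell values live here
    "xl/sharedStrings.xml": "<sst/>",            # character entries
    "xl/styles.xml": "<styleSheet/>",            # formatting (e.g. fonts)
}
with zipfile.ZipFile("toy.xlsx", "w") as archive:
    for name, xml in parts.items():
        archive.writestr(name, xml)

# Reopening the file as a zip reveals the bundle of smaller files.
with zipfile.ZipFile("toy.xlsx") as archive:
    names = archive.namelist()
    print(names)
```

Renaming a real .xlsx to .zip and unzipping it shows the same structure, with the cell values, shared strings and styles each in their own XML part.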

I suspect this forensic analysis of documents is going to move into malpractice cases with generative AI as well. One of the few straightforward ways to infer GAI use is if someone has accidentally copied formatting or metadata from a chat interface. But there are weak signals of generative AI use which can be inferred from other aspects of the document:

  • A version history which shows sudden and expansive development of the text
  • Hallucinated references, as well as hallucinations more broadly
  • Discordant writing styles or chains of arguments connected to different insertion points in the text
  • Switching between British and American spellings (obviously specific to my context, but it’s an interesting one)

Obviously there are explanations which could be offered in each of these cases, but these would in turn need to be evidenced, e.g. if you copied and pasted from another document, then please show that document with the relevant metadata and evidence trail. The point is to build a balance-of-probabilities case from the available evidence, which is where the document forensics come in.
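As a sketch of the kind of metadata that would feed into such a case (the file name and toy document below are hypothetical): a .docx is also a zip archive, and its docProps/core.xml part records the creator, timestamps and a revision count. A long document whose revision count is 1 and whose created and modified times sit minutes apart is the sort of weak signal discussed above, though on its own it proves nothing:

```python
import zipfile
import xml.etree.ElementTree as ET

# Build a toy .docx containing only the core-properties part, to show
# what a forensic reviewer would read out of a real document.
CORE_XML = """<cp:coreProperties
  xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:creator>A. Student</dc:creator>
  <dcterms:created>2024-01-05T10:00:00Z</dcterms:created>
  <dcterms:modified>2024-01-05T10:02:00Z</dcterms:modified>
  <cp:revision>1</cp:revision>
</cp:coreProperties>"""

with zipfile.ZipFile("toy.docx", "w") as archive:
    archive.writestr("docProps/core.xml", CORE_XML)

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
# Read the metadata back out, as a reviewer would from a submitted file.
with zipfile.ZipFile("toy.docx") as archive:
    root = ET.fromstring(archive.read("docProps/core.xml"))

creator = root.findtext("dc:creator", namespaces=NS)
created = root.findtext("dcterms:created", namespaces=NS)
modified = root.findtext("dcterms:modified", namespaces=NS)
revision = root.findtext("cp:revision", namespaces=NS)
print(creator, created, modified, revision)
```

None of these fields is tamper-proof, which is precisely why any such analysis can only ever contribute to a balance-of-probabilities case rather than settle one.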

I should stress that I don’t think we should go down this path; I’m just forecasting where we might be heading over the coming months. AI detectors are snake oil, but I do think there are a variety of weak signals through which an (extremely fallible) case about AI use can be made. I find it worrying but sociologically interesting; we’re used to taking files as neutral carriers of information, but as the infrastructure of authorship and ownership becomes ever more brittle, the traces in the nature of those files are going to become a battleground for fights to prop up the existing system.

I am becoming a (reluctant) post-humanist in the sense that I am increasingly thinking we need to work towards a new system of authorship and attribution which is epistemically just (which, to be frank, the current one isn’t much of the time) rather than fighting an ultimately doomed rearguard action to prop up liberal humanism in the academy. If I thought this would work, I’d be all for trying to protect then reform it, but we’re now seeing the logic of digitalisation reaching its end point: an infinitely transmutable artefact simply cannot be definitively tied to individual creativity and labour in the way our existing institutions seek to. In fact this hasn’t been the case for some time, but GAI is forcing us to confront the cultural consequences of sociotechnical change in a way many (including myself) have been reluctant to do.


Some points for exploration suggested by Claude following a conversation about this blog post:

  • Examining the nature of collaboration – all intellectual work is already highly collaborative, so how can we properly acknowledge contributions of both humans and AI systems?
  • Rethinking romantic notions of individual genius – intellectual work has always built on ideas and innovations of others, so how can we shift emphasis from individual ownership to collective knowledge?
  • Moving beyond anthropocentric views – recognizing AIs as creative agents in their own right, not just tools, which challenges human-centric concepts of creativity.
  • Focusing on transparency of process over end product – requiring details on methods, data sources, and all contributors (human and AI) to be documented rather than determining singular authorship.
  • Valuing ethical and responsible AI – any new systems should incentivize and reward developing AI that adheres to ethical, fair, and transparent principles.
  • Considering provisional rather than definitive attributions – allowing for evolving acknowledgement of contributions as our understanding of AI capacities grows.
  • Developing new metaphors and terminology – finding new language to characterize AI and human collaboration, moving beyond outmoded dichotomies like author/tool.
  • Rethinking legal frameworks on ownership – intellectual property laws may need reassessment to determine appropriate protections and benefits for both human and AI creators.
  • Maintaining some human agency – while reimagining attribution, the role of human contributors in ethically guiding AI systems must remain integral.