
The GenAI debate is being filtered through social media in problematic ways

I’ve recently seen repeated references to an Australian government study which purportedly shows that “Artificial intelligence is worse than humans in every way at summarising documents”, confidently used to dismiss the mundane value of LLMs within organisations. Intrigued by how obviously wrong the popular framing is (they’re clearly not worse in every way, given they can work with more information at greater speed), I went looking for the original report. It’s available to download here: https://www.aph.gov.au/DocumentStore.ashx?id=b4fd6043-6626-4cbe-b8ee-a5c7319e94a0

This isn’t an investigation of a frontier model! It used Llama2-70B, which was released in July 2023 and has a 4k context window. In contrast Claude 3 Opus, released nearly a year later, has a 200k context window and is reportedly a far larger model (the oft-cited 137B parameter figure has never been officially disclosed by Anthropic). Furthermore, a huge amount of fine-tuning has gone into Claude 3 and Claude 3.5 in particular, which directly shapes their capacity to summarise in context-sensitive, comprehensive and nuanced ways. It also strikes me that the prompting was weak, particularly in relation to the lack of context provided in this kind of example:

Provide a summary of mentions to ASIC (Australian Securities and Investments Commission) with brief context and page numbers, be concise, without quoting the original query or original/intermediate answers, only provide the final answer in a human-like response.
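For contrast, here’s a minimal sketch of what a more context-rich version of that prompt might look like. To be clear, this is my own illustrative reworking, not anything from the report; the template fields and wording are assumptions about what the task needed.

```python
# A hypothetical reworking of the report's ASIC prompt, supplying the context
# the original lacks: who the summary is for, what the document is, and what
# "brief context" should mean. Field names are illustrative assumptions.

PROMPT_TEMPLATE = """You are assisting an analyst at ASIC (the Australian
Securities and Investments Commission) who is reviewing a public submission.

Document: {title} ({page_count} pages)

Task: list every mention of ASIC in the document. For each mention give:
- the page number
- one sentence of surrounding context
- why the mention might matter to a regulator, if apparent

Be concise. Do not quote this prompt or any intermediate answers;
return only the final list."""

prompt = PROMPT_TEMPLATE.format(title="Submission 47", page_count=30)
```

Even small additions like role and audience tend to change what a model treats as relevant, which is exactly the kind of context the benchmarked prompt withheld.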
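The context window point matters practically too. A 4k-token window can’t hold a long submission, so summarisation has to happen in chunks which are then recombined, with each step losing information. Here is a minimal sketch of that map-reduce workaround, assuming a placeholder call_model() function standing in for whatever inference API is actually being used:

```python
# Sketch of the map-reduce summarisation a small context window forces.
# call_model() is a placeholder, not a real client; swap in an actual API.

def call_model(prompt: str) -> str:
    """Stub for an LLM call -- returns truncated input so the sketch runs."""
    return prompt[:500]

def approx_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic: ~4 characters per English token

def chunk(text: str, max_tokens: int = 3000) -> list[str]:
    """Split on paragraph breaks into pieces that fit the context window.
    (Oversized single paragraphs are not split further in this sketch.)"""
    pieces, current = [], ""
    for para in text.split("\n\n"):
        if current and approx_tokens(current + para) > max_tokens:
            pieces.append(current)
            current = ""
        current += para + "\n\n"
    if current.strip():
        pieces.append(current)
    return pieces

def summarise(document: str) -> str:
    # Map: summarise each chunk; Reduce: summarise the partial summaries.
    partials = [call_model(f"Summarise this excerpt:\n\n{c}")
                for c in chunk(document)]
    return call_model("Combine these partial summaries into one:\n\n"
                      + "\n\n".join(partials))
```

A 200k window makes most of this machinery unnecessary: the whole document fits in a single prompt, so nothing gets lost between the map and reduce steps.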

This is not to deny there are challenges and limitations. But it’s frustrating how headline findings get circulated in ways which don’t speak to the nuances of the issues involved. This is undoubtedly true: “In the final assessment ASIC assessors generally agreed that AI outputs could potentially create more work if used (in current state), due to the need to fact check outputs, or because the original source material actually presented information better” (emphasis added).
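That fact-checking burden is real, but parts of it can be narrowed programmatically. As a sketch (assuming pages holds per-page text extracted from the PDF, e.g. with pypdf), one could at least verify that every page number the model cites for an ASIC mention actually contains one:

```python
import re

def check_citations(summary: str, pages: list[str]) -> list[int]:
    """Return cited page numbers where no ASIC mention is actually found."""
    cited = {int(n) for n in re.findall(r"page (\d+)", summary, re.IGNORECASE)}
    return sorted(p for p in cited
                  if p < 1 or p > len(pages) or "ASIC" not in pages[p - 1])
```

Anything this returns is a candidate for manual review: it narrows the checking rather than replacing it, which bears directly on the question of conditions below.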

The question then becomes: under what conditions is it likelier than not that these tools reduce work rather than create it? How do we create those conditions? What are the costs and unintended consequences of that work? The framing of the news article (“Artificial intelligence is worse than humans in every way at summarising documents”) is emblematic of a broader tendency in this debate, stemming I think from how stories are being filtered through the attention economies of social media, which is exactly what I was trying to explore in this talk a year ago: