
Is the energy consumption of AI being overestimated?

Thanks to Susan Brown for this link to a post by David Mytton:

The big red flag is extrapolation from current public data. Something like:

  1. A Google search consumes x energy.
  2. Google has said that an AI query will cost 10x more than a normal search query.
  3. Therefore AI energy will be (Current Search Volume) x 10.

Or you might see:

  1. OpenAI consumes x energy today.
  2. OpenAI has 100 million users.
  3. Allocate x energy across 100 million users.
  4. If OpenAI grows to a billion users, that will be (Per user energy allocation) x 1 billion.

These are arguments from extrapolation and they are always wrong. You can’t trust any prediction about a complex system more than a few months out. Technology changes too rapidly.

https://davidmytton.blog/expect-more-overestimates-of-ai-energy-consumption/
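
To see why this kind of argument is so brittle, here's a minimal sketch of both extrapolations in Python. Every number in it is invented purely for illustration; the point is only that the projections scale linearly on inputs that won't stay fixed:

```python
# Toy versions of the two extrapolations, with invented numbers.
# None of these figures are real measurements; they only show the
# shape of the argument and why it is fragile.

# --- Extrapolation 1: search-based ---
search_energy_wh = 0.3               # hypothetical energy per search
ai_multiplier = 10                   # "an AI query costs 10x a search"
annual_searches = 2_000_000_000_000  # hypothetical annual search volume

ai_energy_wh = search_energy_wh * ai_multiplier * annual_searches
print(f"Search-based estimate: {ai_energy_wh / 1e12:.1f} TWh/year")

# --- Extrapolation 2: per-user allocation ---
openai_energy_twh = 1.0       # hypothetical: OpenAI's energy use today
users_now = 100_000_000
users_projected = 1_000_000_000

per_user = openai_energy_twh / users_now
naive = per_user * users_projected
print(f"Per-user estimate at 1B users: {naive:.1f} TWh/year")

# Both estimates hold energy-per-query constant. If efficiency improves
# 5x over the forecast horizon (well within the factors he goes on to
# list), the projections shrink accordingly:
efficiency_gain = 5
print(f"Adjusted: {ai_energy_wh / 1e12 / efficiency_gain:.1f} and "
      f"{naive / efficiency_gain:.1f} TWh/year")
```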

Here’s his account of the developments on the horizon which undermine this kind of extrapolation:

  • New models with fewer parameters, but higher quality. For example, the Mixtral of Experts model “outperforms Llama 2 70B on most benchmarks with 6x faster inference”.
  • More energy efficient models. Google reported the choice of model can impact the amount of computing power required by a factor of 5-10. Different tasks (even different search query types) will be given to different models.
  • Different data center hardware. NVIDIA currently has a near-monopoly on GPUs, which is the strongest incentive a market can offer for competitors to emerge. Google Gemini was trained entirely on TPUs which “compared to the unoptimized P100s from 2017, the ML-optimized TPU v2 in 2019 and TPU v4 in 2021 reduced energy consumption by 5.7x and 13.7x, respectively.” (The sketch after this list works through how factors like these compound.)
  • Different client hardware. Apple builds its Neural Engine into all of its current computers and mobile devices. Transformers are already running on macOS 14 to give you predictions as you type, entirely locally. The M-series chips are probably the most power-efficient chips in the world and “the M3 GPU is able to deliver the same performance as M1 using nearly half the power, and up to 65 percent more performance at its peak” (Apple).
  • What to measure? Measuring “AI” is not the same as measuring the energy consumption of a network switch or a server, because it’s all software. GPUs (and TPUs, etc.) are a more easily measurable component, but AI also uses parts of other systems in the data center. Accounting for training and/or inference on client devices will also be difficult.
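
Several of the factors above are multiplicative, which is what makes point-in-time extrapolation so unreliable. Here is a rough sketch using the figures quoted in the list, plus Mixtral's roughly 13B-of-47B active parameter count from Mistral's announcement; combining the gains multiplicatively is an assumption for illustration, not a measured result:

```python
# Sketch of how the quoted efficiency factors interact. The factors
# come from the list above; multiplying them together is an
# illustrative assumption, not a measurement.

# Mixture of Experts: Mixtral activates 2 of 8 expert blocks per token,
# so only ~13B of its ~47B parameters are used for any given token,
# yet it reportedly outperforms the dense 70B-parameter Llama 2.
mixtral_active_params = 13e9
llama2_params = 70e9
print(f"Active params vs dense baseline: "
      f"{mixtral_active_params / llama2_params:.0%}")

# Model choice: Google reports a 5-10x spread in compute between models.
model_choice_gain = 5  # conservative end of the quoted 5-10x range

# Accelerators: TPU v4 reportedly cut energy 13.7x vs unoptimized P100s.
accelerator_gain = 13.7

# If both gains applied to the same workload (an assumption), a query
# costing 1 unit of energy on the old stack would cost:
old_cost = 1.0
new_cost = old_cost / (model_choice_gain * accelerator_gain)
print(f"Energy per query: {new_cost:.3f} units "
      f"({old_cost / new_cost:.1f}x less)")
```

Whether such gains actually stack on any given workload is an open question, which is exactly Mytton's point: every input to the extrapolation is a moving target.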