Thanks to Susan Brown for this link to a post by David Mytton:
The big red flag is extrapolation from current public data. Something like:
- A Google search consumes x energy.
- Google has said that an AI query will cost 10x more than a normal search query.
- Therefore AI energy will be (Current Search Volume) x 10.
Or you might see:
- OpenAI consumes x energy today.
- OpenAI has 100 million users.
- Allocate x energy across 100 million users.
- If OpenAI grows to a billion users, that will be (Per user energy allocation) x 1 billion.
These are arguments from extrapolation, and they are always wrong. You can’t trust any prediction about a complex system more than a few months out. Technology changes too rapidly.
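Part of why these projections spread is that the arithmetic is trivially simple. A sketch of both flawed arguments, using made-up placeholder numbers (none of these figures are real measurements):

```python
# Naive extrapolation arithmetic behind the flawed estimates above.
# Every number here is an illustrative placeholder, not a real measurement.

SEARCH_ENERGY_WH = 0.3     # assumed energy per ordinary search, in Wh
AI_MULTIPLIER = 10         # "an AI query costs 10x a normal search query"
DAILY_SEARCHES = 8.5e9     # assumed current daily search volume

# Argument 1: scale per-query cost by current search volume.
ai_daily_wh = SEARCH_ENERGY_WH * AI_MULTIPLIER * DAILY_SEARCHES

# Argument 2: divide today's total by today's users,
# then multiply by a projected future user count.
openai_total_wh = 1e12               # assumed current total consumption
per_user_wh = openai_total_wh / 100e6
projected_wh = per_user_wh * 1e9     # "if OpenAI grows to a billion users"

# Both projections silently assume that per-query (or per-user) cost
# is a constant -- exactly the assumption that rapid changes in models
# and hardware break.
print(f"{ai_daily_wh:.3g} Wh/day, {projected_wh:.3g} Wh projected")
```

The bug in both arguments is the same line of code: a constant multiplied straight through, with no term for efficiency changing over time.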
https://davidmytton.blog/expect-more-overestimates-of-ai-energy-consumption/
Here’s his account of the developments on the horizon that undercut this kind of extrapolation:
- New models with fewer parameters, but higher quality. For example, the Mixtral of Experts model “outperforms Llama 2 70B on most benchmarks with 6x faster inference”.
- More energy efficient models. Google reported the choice of model can impact the amount of computing power required by a factor of 5-10. Different tasks (even different search query types) will be given to different models.
- Different data center hardware. NVIDIA has a near-monopoly on GPUs, and monopoly profits are about the strongest incentive a market can offer to attract competition. Google Gemini was trained entirely on TPUs which “compared to the unoptimized P100s from 2017, the ML-optimized TPU v2 in 2019 and TPU v4 in 2021 reduced energy consumption by 5.7x and 13.7x, respectively.”
- Different client hardware. Apple has neural cores built into all their current computers and mobile devices. Transformers are already running on macOS 14 to give you predictions as you type. This happens locally. The M-series chips are probably the most power efficient chips in the world and “the M3 GPU is able to deliver the same performance as M1 using nearly half the power, and up to 65 percent more performance at its peak” (Apple).
- What to measure? Measuring “AI” is not the same as measuring the energy consumption of a network switch or a server because it’s all software. GPUs (and TPUs, etc) are a more easily measurable component, but AI also uses parts of other systems in the data center. How to account for training and/or inference on client devices will also be difficult.
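These factors compound multiplicatively, which is why a straight-line extrapolation of today’s per-query cost overshoots so badly. A rough sketch using the improvement factors quoted above — treating them as fully independent, which is itself a simplification, since model and hardware gains partly overlap:

```python
# Rough illustration: roughly independent efficiency gains multiply.
# Factors come from the quotes above; assuming full independence
# is a simplification for illustration only.

gains = {
    "smaller models (Mixtral vs Llama 2 70B inference)": 6.0,
    "model choice per task (Google, low end of 5-10x)": 5.0,
    "TPU v4 vs unoptimized P100 (Google)": 13.7,
}

combined = 1.0
for name, factor in gains.items():
    combined *= factor
    print(f"{name}: {factor}x")

# Even ignoring on-device inference entirely, a per-query cost frozen
# at today's models and hardware could be off by a few hundred x.
print(f"combined efficiency factor: {combined:.0f}x")
```

The exact product is not the point — the point is that any one of these factors alone is enough to sink a projection built on a constant per-query cost.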