The debate about the energy appetite of large AI models is raging. Recently, an AI ethics researcher at Google was dismissed after she had called out the upward spiral of ever-larger training data sets. The fact is that the numbers make one’s head swim. In 2018, the BERT model made headlines by achieving best-in-class NLP performance with a training data set of 3 billion words. Two years later, AI researchers were no longer working with billions of words, but with hundreds of billions: in 2020, OpenAI presented GPT-3 – acclaimed as the largest AI model ever built, with a data set of 500 billion words!
Model size seems to have become the ultimate goal of AI research teams at large companies and well-funded universities (others cannot finance such efforts) – far removed from economic reality. When Google announced in January 2021 that it had trained an AI language model with one trillion parameters (yes, that’s twelve 0s after the 1), a large portion of the press cheered again, but some voices began to rise, criticizing the tremendous costs of training such large models – both in dollars and in environmental impact.
Researchers from the University of Massachusetts Amherst calculated back in 2019 that training a single AI model can emit more than 626,000 pounds of carbon dioxide equivalent, nearly five times the lifetime emissions of the average American car. Note that they used a transformer language model with a mere 213 million parameters, a ludicrously small model compared to those now commonly used.
The race for larger AI models goes hand in hand with a race for more computing power – leading to the creation of more and more powerful supercomputers. These giant machines not only need a lot of space, but also require millions of gallons of water for cooling and consume tremendous quantities of power: overall, data centres use an estimated 200 terawatt-hours (TWh) each year. That is more than the national energy consumption of some countries, such as Iran, and is likely to increase about 15-fold by 2030, to 8% of projected global demand.
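To put these figures side by side, here is a quick back-of-the-envelope check in Python. It uses only the three numbers quoted above; the variable names are illustrative, and the implied global demand is simply derived from those numbers, not a sourced forecast.

```python
# Figures quoted in the text above
current_twh = 200     # estimated annual data-centre consumption today (TWh)
growth_factor = 15    # projected increase by 2030
share_percent = 8     # projected share of global demand in 2030 (%)

# Data-centre consumption in 2030 if the 15-fold increase materializes
projected_twh = current_twh * growth_factor

# Global demand implied by "3,000 TWh is 8% of the total"
implied_global_twh = projected_twh * 100 / share_percent

print(projected_twh)       # 3000
print(implied_global_twh)  # 37500.0
```

In other words, the projection amounts to roughly 3,000 TWh a year for data centres alone, out of an implied global demand of about 37,500 TWh.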
In a world where countries adhere to the Kyoto Protocol, where corporations get fined when they do not comply with environmental regulations and consumers are urged to turn to local food producers to reduce their carbon footprint, how come large AI companies continue on this road to environmental perdition? What are the alternatives to make AI more environmentally friendly?
Quantum computing, hardware acceleration, reverse-engineering the brain: I have explored the different approaches to improve the computing efficiency of AI in an earlier post and give more details in a webinar.
The obvious conclusion: if you consider efficiency as a function of speed of analysis and amount of data to be crunched, then tackling the problem of training data inflation requires a change of paradigm, away from data-centric models towards more efficient algorithms. And what is the most efficient system when it comes to processing information? The brain, of course, which needs a mere 20 watts to function, where a light bulb requires 60 and a supercomputer more than 80,000.
So far, despite all the research efforts made, we only have assumptions about how the brain processes information. One of them, put forth by Jeff Hawkins, posits that all sensory input – be it vision, sound or language – is converted into a single representation format, called a Sparse Distributed Representation (SDR). In an SDR, only a small fraction of a large number of bits is active at any time (sparsity), and meaning is spread across those active bits rather than held by any single one (distribution). According to this theory, these two attributes would explain the incomparable efficiency of the brain.
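To make the idea more concrete, here is a minimal Python sketch of how an SDR can be handled in software. It is an illustration under stated assumptions, not the brain's mechanism nor any vendor's implementation: the sizes (2,048 bits, 40 of them active) and the helper names `random_sdr`, `overlap` and `corrupt` are hypothetical choices made for this example. It shows two properties usually attributed to SDRs: similarity can be measured by counting shared active bits, and a representation stays recognizable even when part of it is corrupted.

```python
import random

N_BITS = 2048   # total bits in the representation (illustrative assumption)
N_ACTIVE = 40   # active bits, about 2% sparsity (illustrative assumption)


def random_sdr(rng, n_bits=N_BITS, n_active=N_ACTIVE):
    """Return an SDR as the frozenset of its active bit positions."""
    return frozenset(rng.sample(range(n_bits), n_active))


def overlap(a, b):
    """Count the active bits two SDRs share; more shared bits means more similar."""
    return len(a & b)


def corrupt(sdr, n_flips, rng, n_bits=N_BITS):
    """Move n_flips active bits to positions that were previously inactive."""
    dropped = set(rng.sample(sorted(sdr), n_flips))
    inactive = [i for i in range(n_bits) if i not in sdr]
    added = set(rng.sample(inactive, n_flips))
    return frozenset((sdr - dropped) | added)


rng = random.Random(0)
a = random_sdr(rng)
b = random_sdr(rng)
noisy_a = corrupt(a, 10, rng)

print(overlap(a, a))        # 40: an SDR fully overlaps with itself
print(overlap(a, noisy_a))  # 30: 10 of the 40 active bits were moved
print(overlap(a, b))        # small: unrelated SDRs share almost no active bits
```

Because only the positions of the active bits are stored, each representation takes 40 integers instead of 2,048 bits, and comparing two of them is a simple set intersection. Two unrelated random SDRs of this size share fewer than one active bit on average (40 × 40 / 2,048), which is why a noisy copy that still shares 30 bits remains easy to tell apart from chance.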
But what are theories without validation? When applied to text, this SDR approach delivers astonishing results, both in terms of precision and efficiency. AI language models based on SDRs need a fraction of the training data required by current state-of-the-art approaches like BERT and GPT-3. And these models are not confined to laboratory experiments: they are already implemented in business solutions that accurately and efficiently process high volumes of unstructured text data.
Why has the AI world not embraced this revolutionary approach yet? Well, after having invested billions of dollars in neural networks and data-centric models, it is difficult to admit that one has reached a dead end.
The good news is: paradigm changes always take time, but there is no way around them.