Today, all large AI companies are placing their bets on a brute-force approach. Yet throwing huge amounts of data at machine learning algorithms and deploying massive processing power is neither efficient nor future-proof. AI needs to become much smarter and orders of magnitude more efficient if we want to avoid another winter.
I am sure you have all seen charts comparing the energy consumption of Bitcoin to that of smaller nations. Currently, transactions on this most popular of blockchain projects consume more energy than Greece, a country of 10 million people. The outcry on social media is huge, and rightly so, at a time when Russia’s attack on Ukraine is creating great uncertainty on the energy markets and the climate crisis demands that we use energy sustainably.
To a certain extent, the AI industry can be thankful for the distraction Bitcoin is providing, because its own energy balance might be even worse. Research carried out at the University of Massachusetts, Amherst, indicates that “training a single AI model can emit as much carbon as five cars in their lifetimes”. That is one model. The MIT Technology Review article that reported on the study adds that “final, paper-worthy models require training almost 5,000 models in total”. Now do the math on climate impact.
The race for larger AI models goes hand in hand with a race for more computing power – leading to the creation of ever more powerful supercomputers. These machines not only need a lot of space; they also require millions of gallons of water for cooling and consume tremendous amounts of power. According to a 2018 article in Nature, data centers use an estimated 200 terawatt-hours (TWh) of electricity each year, and that figure has almost certainly grown since then.
Yes, advances made by OpenAI with GPT-3 or by Google with BERT are impressive – but theirs is not an approach we can sustain. So what now? The modern dream of creating “intelligent” machines has been around since the 1950s. We have seen phases of progress followed by AI winters whenever we hit dead ends. Is the current brute-force push leading us into another AI winter?
Short answer: Yes. We need a different approach to building and training AI models in the age of sustainability. One that is not just clever but genuinely intelligent, the way evolution’s solutions tend to be. This means that, this time, we should design learning systems that actually work like the human brain, not like a naive over-simplification of it.
An inspiration for what this paradigm shift could look like can be found in the history of Big Pharma. Fifty years ago, developing new medication was not unlike today’s approach to AI: pharmaceutical companies were testing the influence of vast numbers of plant samples, gathered in the rainforests of the world, on a host of different pathologies – blood pressure, cholesterol levels, infections, inflammations and even cancer cells – to find out which natural molecule might produce a statistically relevant improvement of the condition. Researchers were used to “brute-forcing” medical progress, often without necessarily understanding the theoretical functioning of the underlying biological mechanisms. Today, molecular biology has clarified most of the important metabolic pathways and relevant cellular receptors to an unimaginable level of detail, allowing modern pharmaceutical research to replace millions of cross-matching experiments with molecular CAD systems able to model, simulate and synthesize virtually any substance, efficiently targeting a specific clinical goal by design.
Over the last decades, neuroscience has given us a much better understanding of the principles behind the human brain. No, we still cannot map it out in every detail, and we have not understood all of its fine mechanics, but by now we know enough about the computational principles of the neocortex to try to create a better AI. Why should we? Because the brain needs a mere 20 watts to outperform the best AI models.
So far, efforts to produce useful AI models have mainly focused on improving their precision relative to human performance. But anyone who has tried to put such an AI system into practice will quickly have discovered that the actual limiting factor is efficiency rather than accuracy. So the question we need to ask is: can we actually afford the required precision, given its costs in energy, training data sourcing and computation?
Cortical.io is working on just that – better, more efficient (in all ways) natural language understanding (NLU) models inspired by actual neuroscience. We turned our first breakthroughs into business products – but there is still a whole universe of neuro-semantics to explore, to transfer to the AI community, and a long path ahead to educate the market and to generalize the use of efficient semantic models for sustainable language-AI software.
Where we are today:
We have formulated the “Semantic Folding” theory, a machine learning methodology for creating semantic models with unsupervised training on very limited amounts of reference data. The methodology aims to functionally replicate the computational principles discovered in the human neocortex. Semantic Folding introduces a new way of representing information based on sparse distributed representations (SDRs), called a Semantic Fingerprint. Semantic fingerprints capture the semantics (actual meanings) of words, sentences and paragraphs in context and make it possible to reach high levels of accuracy and efficiency when applying computational operators to text.
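To make the idea of overlap-based semantic fingerprints more concrete, here is a minimal, purely illustrative Python sketch. The fingerprint size, sparsity, word positions and helper functions are invented for demonstration; this is not Cortical.io’s actual implementation of Semantic Folding.

```python
# Purely illustrative sketch of sparse distributed representations (SDRs):
# a "semantic fingerprint" is modeled as the set of active positions on a
# flattened semantic map. All sizes, positions and words below are invented.

FINGERPRINT_SIZE = 16_384   # e.g. a 128 x 128 semantic map, flattened
TARGET_SPARSITY = 0.02      # only ~2% of positions are active at once

# Toy word fingerprints: sets of active map positions.
apple  = {12, 87, 450, 451, 902, 3004, 7781}
orange = {12, 87, 450, 515, 902, 4188, 9900}
car    = {33, 1290, 2004, 6100, 8800, 15002}

def similarity(a: set, b: set) -> float:
    """Semantic similarity as the normalized overlap of active positions."""
    return len(a & b) / min(len(a), len(b))

def text_fingerprint(word_fps: list) -> set:
    """Aggregate word fingerprints into a sentence/paragraph fingerprint by
    taking their union (a real system would re-sparsify the result)."""
    result = set()
    for fp in word_fps:
        result |= fp
    return result

print(similarity(apple, orange))  # high overlap -> semantically close
print(similarity(apple, car))     # no overlap   -> semantically distant
```

Part of the efficiency appeal of such representations is that comparing meanings reduces to cheap set operations on sparse binary structures instead of dense floating-point arithmetic.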
We have created a product based on Semantic Folding called “SemanticPro”, which showcases the quality of our approach to intelligent document processing. SemanticPro can analyze high volumes of messages and complex documents in much the same way humans do – but incomparably faster. Thanks to the underlying technology, SemanticPro requires an order of magnitude less training data than other deep learning solutions: 100 reference documents – e.g. contracts, data sheets or emails – are sufficient to train SemanticPro for a new use case. This product is already live in large enterprises around the world.
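To give an intuition for why fingerprint-based representations can get by with so few labeled examples, here is a toy nearest-centroid classifier built on the sketch above. It is not the SemanticPro API; the function names, the 300-position cutoff and the data format are assumptions made purely for demonstration.

```python
# Toy document classifier over semantic fingerprints (sets of active
# positions). NOT the SemanticPro API: names, the 300-position cutoff and
# the (label, fingerprint) data format are invented for illustration.

from collections import Counter, defaultdict

def train(labeled_docs):
    """labeled_docs: iterable of (label, fingerprint) pairs, e.g. ~100
    annotated contracts or emails. Builds one 'class fingerprint' per label
    from the positions active in most examples of that class."""
    counts = defaultdict(Counter)
    for label, fp in labeled_docs:
        counts[label].update(fp)
    return {label: {pos for pos, _ in counter.most_common(300)}
            for label, counter in counts.items()}

def classify(fp, class_fps):
    """Assign the label whose class fingerprint overlaps the document most."""
    return max(class_fps, key=lambda label: len(fp & class_fps[label]))
```

Because every training document contributes only a small set of active positions, a class profile of this kind stabilizes after comparatively few examples – which is the intuition behind training on roughly a hundred reference documents rather than millions of samples.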
We are working on combining our Semantic Folding-based algorithms with dedicated high-performance hardware in order to speed up the processing of large volumes of text by orders of magnitude, thus reducing the computing resources needed to perform intelligent document processing at scale.
Semantic Folding can be applied to any language. Given the small amount of training data needed, models of equal quality can be developed for languages spoken by smaller communities.
These are first steps that need to find an echo in the AI ecosystem to truly make a difference.
There are so many areas where consumers and businesses alike will benefit from efficient NLU models – machine translation and speech recognition, to name just two areas where current solutions consume far too many computing resources while delivering mixed results (or are you happy with the way Siri and Alexa understand your requests?). Given the vast amounts of data needed – again, brute-force approaches are based on millions or billions of samples and still require thousands of samples to fine-tune for any given application – and the even vaster energy demand, mass adoption cannot and must not be attempted with current brute-force approaches. The AI community needs to embrace more efficient approaches like Semantic Folding. If we continue down the current path, an AI winter might not even be one of our biggest problems.