As conversations about artificial intelligence move to the intricacies of the relationship between AI and the environment, it’s becoming increasingly important to understand the challenges and opportunities that AI presents for a sustainable future.
In this blog, Head of Service Architecture, Tristan Watkins, shares his insights on sustainability and AI. Read on as Tristan explores the multifaceted difficulties of integrating AI with sustainable practices, from the ethical considerations of AI deployment to the introduction of Financial Operations (FinOps) practices for more responsible AI usage.
What makes generative AI carbon intensive?
In 2020, OpenAI posited empirical scaling laws for model performance based on “model size, dataset size, and the amount of compute used for training”. All three of these scale points have been pushed upwards at the bleeding edge of Generative AI capability. This resource hunger is the root of concern about Generative AI working against sustainability goals.
Certainly, we’ve seen these demands put a dent in progress against Microsoft’s ambitious Net Zero goals, as confirmed in their most recent sustainability report. But is this as bleak as we might fear, and what is likely to help in future? How will the size of this dent change, and how can you measure and control these impacts?
Efficiency improvements in generative AI
To give a sense of the ‘large’ in ‘large language model’ (LLM), let’s compare the most powerful LLM at the end of 2023 to a small version of one of the most powerful LLMs today:
- GPT-3.5: ~175 billion parameters
- GPT-4o mini: ~8 billion parameters
GPT-4o mini has effectively replaced all uses of GPT-3.5 today (and exceeds its capabilities in every measure) but runs at a considerably smaller scale. This is possible because the least relevant connections can be trimmed from the ‘long tail’ of a model trained on a dataset of trillions of tokens. The initial, larger training run confers more benefit than is lost when you keep everything above that tail, which is how GPT-4o mini can be over 20 times smaller than GPT-3.5. OpenAI has introduced other improvements too, like the efficiency gains in GPT-3.5 Turbo, which have been inherited by later models.
It would be hard to accurately infer the total efficiency gains achieved, but we can treat the relationship between running costs and energy consumption as a reasonable gauge of what’s been achieved in terms of reducing carbon emissions from any given request.
Sam Altman provided us with a useful comparison relative to Moore’s Law: “You can see this in the token cost from GPT-4 in early 2023 to GPT-4o in mid-2024, where the price per token dropped about 150x in that time period. Moore’s law changed the world at 2x every 18 months; this is unbelievably stronger.”
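To put that comparison on a common scale, a quick back-of-the-envelope calculation shows how many Moore’s Law doublings a 150x price drop represents over roughly the same 18-month window (the figures come from the quote above; the calculation itself is just arithmetic):

```python
import math

# Moore's law: a 2x improvement every 18 months.
# Altman's figure: GPT-4 (early 2023) -> GPT-4o (mid-2024),
# price per token dropped ~150x in roughly that same window.
moores_law_factor = 2
token_cost_factor = 150

# How many Moore's-law doublings does a 150x drop represent?
equivalent_doublings = math.log2(token_cost_factor)
print(f"{equivalent_doublings:.1f} doublings")  # ~7.2 doublings in one 18-month window
```

In other words, token prices fell as far in 18 months as Moore’s Law would take over a decade to deliver.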
How resource intensive is generative AI?
It’s important to acknowledge that the resources used to interpret or create text are considerably different for image, audio or video modes. As models have become multi-modal, some forms of generation like speech-to-text, text-to-speech and visual input capabilities consume more resources, and contribute to higher carbon emissions from Generative AI workloads in turn.
Although used less commonly, image and video processing are even bigger consumers during generation because the inputs and outputs are so large relative to text. Mercifully, they carry a high price tag, so they are used less indiscriminately, but these uses do add to the carbon intensity of AI on the whole.
How do we make sense of AI resource consumption in the aggregate?
These remarkable efficiency improvements only address one part of the picture, and they have not been focal enough in discussions about sustainability and AI. Nevertheless, demands in other areas have grown enough to dwarf these gains:
Widespread adoption and increased usage
As generative AI gains capability and falls in price, it gains a wider user base and more normalised usage. This means that even during inference, where each request is relatively cheap, the per-request efficiency gains are offset simply because there is such a huge amount of usage.
Increased context windows mean more processing
As context windows have increased with more capable models, it’s become more common to use more of that available size. That means more tokens in, more tokens out, and more conversation history carried along, which means more processing and higher carbon emissions in turn.
Efficiency improvements
“Shortenings” (also known as Matryoshka Representation Learning, or MRL) have given us control over some of the trade-offs between capability and resource consumption by trimming the less-relevant “long tail” of connections created during embedding (converting text or image into a vector representation). This is the primary way we integrate our own data with Generative AI.
Historically, we created embeddings at a static 1024 dimensions, and we would need to store all 1024. Now we can create embeddings at 1536 dimensions, and store only the most relevant 256 of them (or even fewer), trading an initial computational increase for a longer-term saving across an entire corpus. This improves accuracy while reducing cost and resource consumption, making generative AI capabilities emit less carbon on the whole.
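The mechanism can be sketched in a few lines. This is a minimal illustration of MRL-style shortening, assuming an embedding model trained so that the most important information sits in the earliest dimensions; the vector here is random stand-in data, not output from a real model, and the 1536/256 figures simply mirror the example above:

```python
import numpy as np

def shorten_embedding(embedding: np.ndarray, dims: int = 256) -> np.ndarray:
    """Keep only the leading dimensions of an MRL-trained embedding."""
    truncated = embedding[:dims]
    # Re-normalise so cosine similarity still behaves as expected.
    return truncated / np.linalg.norm(truncated)

# Stand-in for a 1536-dimension embedding from an MRL-trained model.
full = np.random.default_rng(0).normal(size=1536)
short = shorten_embedding(full, dims=256)
print(short.shape)  # (256,) -- a sixth of the storage per vector
```

Across a corpus of millions of documents, storing 256 dimensions instead of 1536 is where the long-term saving in storage, compute and carbon accumulates.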
Training an AI model is resource intensive
By far the most resource intensive generative AI usage is training a new model. For the largest foundation models, we know this can require multiple data centres to run in concert for months. This emits an enormous volume of carbon. These needs are so great that they originally compelled OpenAI to partner with Microsoft, when they may not have seemed a natural fit at first glance.
These computational needs are only growing, as we know that we have not yet reached an upper limit of capability gains from scale alone. This was recently proven again with the research preview of GPT-4.5. Although these ever-enlarging compute needs are a real environmental concern, we must keep in mind two significant mitigating factors:
- A larger foundation model is a big part of what imbues a more capable small model like GPT-4o mini with its improvements. The huge training resource demand is a one-time cost, whereas the improvements gained from that training are realised across all model sizes at inference. Over the life of these models, inference resource needs dwarf those of the initial training, so the emissions reductions are realised again and again.
- As other training techniques like deliberative alignment have emerged, and curated training datasets are created (like those used to train Microsoft’s Phi models), the pressure to release new, larger foundation models is easing somewhat, so these one-off events should become less frequent.
Using AI to meet sustainability goals
In some cases, generative AI can be used for sustainability objectives. Consider all the forms of dated, inefficient automation that can be improved with today’s models. As Generative AI was first blooming, Microsoft already identified three areas where it could support sustainability goals: analysis of complex systems, developing new environmental solutions like novel materials, and knowledge management. At the recent Build conference, we’ve even seen Microsoft Discovery’s prototype of a new datacentre coolant.
All told, we need to make sense of this whole picture when we conceptualise how the resource consumption needs of generative AI, and the consequent carbon emissions, are changing over time. We can be confident of three things: further efficiencies are to be expected, demand will continue growing, and these complex factors will continue to morph.
Despite some uncertainties, if we agree there is a loose relationship between Generative AI cost and resource usage, we can assess and respond to our own resource demands, and the related carbon emissions implications, accordingly.
The relationship between sustainability and AI FinOps
If we agree there is a roughly 1:1 relationship between AI cost and AI carbon emissions at inference time, we can gain comfort that our AI resource usage controls are the same as AI FinOps controls.
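This proxy is easy to operationalise. The sketch below treats per-workload spend as a relative (unitless) emissions indicator, exactly the report a FinOps team already produces; the model names and unit prices are illustrative placeholders, not real rate-card figures:

```python
# Hypothetical unit prices (USD per 1,000 tokens) -- placeholders only.
PRICE_PER_1K_TOKENS = {
    "large-model": 0.010,
    "small-model": 0.0006,
}

def relative_footprint(model: str, tokens: int) -> float:
    """Spend in USD, used as a relative emissions indicator under the
    assumption that inference cost and emissions track each other ~1:1."""
    return PRICE_PER_1K_TOKENS[model] * tokens / 1000

# Example monthly usage per workload.
monthly_usage = {"large-model": 2_000_000, "small-model": 50_000_000}
for model, tokens in monthly_usage.items():
    print(model, round(relative_footprint(model, tokens), 2))
```

Note how the small model can carry 25x the token volume of the large one for a comparable footprint, which is the kind of trade-off a cost report surfaces automatically.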
In other words, if we already manage resources well throughout their lifecycle, we aren’t doing anything fundamentally different with Generative AI. We may face pressure to provide these capabilities despite their resource demands, but we must be very clear that sustainability can mean withdrawing or withholding capability, or it can mean control and efficiency.
As an example, if you point your retrieval solution at a file server or SharePoint Online site that contains lots of bad data, you get inflated vector storage costs and less accurate results. Existing operational discipline extends directly to Generative AI, covering both the data it draws on and the resources it consumes to generate. The inherent virtues of governing well, from a data or FinOps perspective, extend to sustainability objectives.
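That data hygiene can be applied before anything is embedded at all. A minimal sketch, assuming documents carry a `text` body and a `modified` timestamp (the document shape and the two-year cutoff are illustrative, not from any particular platform):

```python
from datetime import datetime, timedelta

def select_for_embedding(docs, max_age_days=730):
    """Skip stale or duplicate documents so they never inflate
    vector storage or pollute retrieval results."""
    cutoff = datetime.now() - timedelta(days=max_age_days)
    seen_hashes = set()
    selected = []
    for doc in docs:
        content_hash = hash(doc["text"])
        if doc["modified"] < cutoff or content_hash in seen_hashes:
            continue  # stale or duplicate: not worth vectorising
        seen_hashes.add(content_hash)
        selected.append(doc)
    return selected

docs = [
    {"text": "Policy v2", "modified": datetime.now()},
    {"text": "Policy v2", "modified": datetime.now()},          # duplicate
    {"text": "Old minutes", "modified": datetime(2015, 1, 1)},  # stale
]
print(len(select_for_embedding(docs)))  # 1 -- only the fresh, unique document
```

A filter like this is ordinary content governance, yet every document it excludes is vector storage never provisioned and embedding compute never spent.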
Find out more about how to get started with FinOps using the ‘crawl, walk, run’ maturity model.
Navigating the complexities of sustainable AI practices, it’s essential to consider how the financial aspects of AI platforms can help you operate more responsibly. Adopting a FinOps mindset can help you to manage the financial impact of AI more effectively, making your AI usage not only more cost-efficient but also more environmentally responsible.
Speak to our experts about how FinOps can help you use AI more efficiently.
FAQs
What is the impact of AI on the environment?
AI has a significant impact on the environment, primarily due to the energy consumption required for training and running AI models. The computational power needed for AI processes leads to high electricity usage, which in turn results in increased carbon emissions. Data centres, where AI computations are performed, contribute to this environmental footprint. As AI models become more complex and require larger context windows, the resource processing and carbon emissions associated with AI also increase.
How can AI contribute to environmental sustainability?
Despite its environmental impact, AI has the potential to contribute positively to environmental sustainability. AI can optimise energy usage, improve resource management and enhance efficiency in various sectors. For example, AI can be used to monitor and reduce energy consumption in buildings, optimise supply chains to minimise waste, and improve agricultural practices to enhance crop yields while reducing environmental impact. By leveraging AI for these applications, we can work towards a more sustainable future.
What is FinOps and how does it relate to AI?
FinOps, short for Financial Operations, is a practice that focuses on managing the financial aspects of cloud computing and AI infrastructure. It involves optimising costs, ensuring efficient resource utilisation, and aligning financial accountability with organisational goals. In the context of AI, FinOps helps organisations manage the financial impact of AI deployments, ensuring that resources are used efficiently and costs are kept under control. This practice is essential for making AI use more responsible and sustainable.
How can FinOps help in making AI use more sustainable?
FinOps can play a crucial role in making AI use more sustainable by optimising the financial and resource aspects of AI infrastructure. By implementing FinOps practices, you can ensure that AI resources are used efficiently, reducing unnecessary energy consumption and associated carbon emissions. Through FinOps, you can better balance AI capabilities with compute demands, paving the way for a sustainable future.