Thomas Jackson, Ian Hodgkinson and Steven Lockwood from Loughborough University discuss the spiralling relationship between data and carbon dioxide production. They also offer some food for thought on how digital industries can help cut greenhouse gas emissions.

The UK government aims to decarbonise industry and reduce greenhouse gas (GHG) emissions by two thirds come 2035. However, while ‘digital’ can be a part of the solution, it too can contribute to GHG emissions.

Industrial decarbonisation efforts have often looked to technological innovations to address the issue. For instance, the UK’s Industrial Decarbonisation Strategy outlines a series of critical steps to move away ‘from fossil fuel combustion to low carbon alternatives such as hydrogen and electrification, deploying key technologies such as carbon capture, usage and storage, and supporting industrial sites to maximise their energy and resource efficiency’. While technology and decarbonisation have long been seen to go ‘hand-in-hand’, with climate tech reported to have seen a 3,750% increase in venture capital investment between 2013 and 2019, little attention has been paid to the possible dark side of digital practices.

Data centres alone are reported to have a carbon footprint bigger than that of the aviation industry. Given the unprecedented levels of digital data being generated (for example, via Internet of Things sensors), many businesses struggle to process and use this data effectively, resulting in poor data reuse, forgotten data, and data duplication that comes at a great cost in carbon emissions.

For net zero to be achieved, organisations need to account for all key contributors to GHG emissions, and yet the digital data carbon footprint does not feature in current reporting requirements. Its absence matters, since digitalisation processes already account for 4% of global GHG emissions. That share is rising as data generation grows exponentially, with predictions that the global volume of data will reach as much as 180 zettabytes by 2025.

Where is data CO₂ found?

Across the various stages of business operations and supply chains, data is generated along a spectrum, ranging from small quantities of essential, high-impact data (referred to as 'high data CO₂') to numerous, relatively insignificant datasets that might be used infrequently or even left untouched (such as scanned and digital documents or data from small-scale sensors).

However, even though these smaller datasets appear inconsequential individually, their collective impact on data-related carbon emissions, known as 'data CO₂', can be substantial. This phenomenon often leads to the emergence of a 'long tail' of diverse data types that collectively contribute to the unwarranted escalation of greenhouse gas emissions during data generation, processing, and storage.

To illustrate, if we consider the energy consumption of data (storage, processing and so on) against the number of units, meaning the physical environments in which the processors and memory are held (a hyper data centre, a local data centre, a PC, a mobile device and so forth), four distinct zones are revealed:

  • Zone 1: a small number of huge hyper/cloud data centres; Amazon, Google and Microsoft run over half the largest data centres in the world. This segment consumes a large amount of centralised energy, though some of that energy will come directly from green energy sources.
  • Zone 2: local data centres run by an operating company rather than leased as in Zone 1. This segment is still large, as some companies wish to keep significant data behind their own firewalls for reasons such as security and cost.
  • Zone 3: desktops, PCs and laptops; individuals accessing data in Zone 1 and/or Zone 2, or generating their own data using data replicated from Zone 1 and Zone 2. GHG emissions are driven by local conditions such as how a country provides energy to businesses or homes.
  • Zone 4: tablets, wearables and phones; very small individual energy use but a massive number of units given the sheer global population size. Again, as in Zone 3, GHG emissions are driven by local conditions such as country energy policies.
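
The four zones can be compared with some simple arithmetic: a zone's annual emissions are roughly the number of units, multiplied by the energy each unit uses, multiplied by the carbon intensity of the electricity supplying it. The Python sketch below is only an illustration of that calculation. The `Zone` class, the `zone_shares` function and every figure in `ZONES` are hypothetical placeholders chosen to show the shape of the sum, not measured values.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    name: str
    units: int                # number of facilities or devices (placeholder)
    kwh_per_unit_year: float  # annual energy use per unit in kWh (placeholder)
    kg_co2e_per_kwh: float    # carbon intensity of the supply (placeholder)

    def annual_kg_co2e(self) -> float:
        """Units x energy per unit x carbon intensity."""
        return self.units * self.kwh_per_unit_year * self.kg_co2e_per_kwh


def zone_shares(zones):
    """Return each zone's fraction of the combined annual emissions."""
    totals = {zone.name: zone.annual_kg_co2e() for zone in zones}
    grand_total = sum(totals.values())
    return {name: value / grand_total for name, value in totals.items()}


# Hypothetical figures for illustration only; replace with audited data.
ZONES = [
    Zone("Zone 1: hyper/cloud data centres", 10, 1_000_000.0, 0.2),
    Zone("Zone 2: local data centres", 100, 50_000.0, 0.4),
    Zone("Zone 3: desktops, PCs and laptops", 10_000, 100.0, 0.4),
    Zone("Zone 4: tablets, wearables and phones", 1_000_000, 2.0, 0.4),
]

if __name__ == "__main__":
    for name, share in zone_shares(ZONES).items():
        print(f"{name}: {share:.1%} of combined emissions")
```

Carbon intensity is set per zone rather than globally because, as noted for Zones 3 and 4, emissions depend on local energy conditions, while a Zone 1 operator may draw part of its supply from green sources.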

The critical next steps

While the volume of greenhouse gas (GHG) emissions from data may not currently rival that of sectors like construction, the evolving landscape of data creation, capture, and consumption, driven by the growth of sensor data and unstructured data, suggests an impending shift. To foster sustainable and responsible data management within businesses, several crucial challenges must be addressed:

  • Global standardisation: a consistent approach to measuring data's environmental impact is essential. While some countries and regions have adopted various standards, a unified global standard is lacking.
  • Comprehensive cost considerations: beyond cloud data centres, we must account for GHG emissions associated with on-premise data centres (especially with the rise of hybrid and multi-cloud solutions) and even local devices that replicate and store data. These aspects need to be part of the sustainability equation.
  • Supply chain data sharing: facilitating the exchange of data across the supply chain requires establishing data standards and metadata to maintain data context. This includes addressing issues of data security, data veracity, and the need for a common logical data model.
  • Unstructured data utilisation: the role and potential of unstructured data, such as shipping documents, understanding documents, and certificates of source authenticity, remain unclear. Exploring approaches like Natural Language Processing (NLP) to extract valuable insights from these documents for subsequent analysis could be a valuable avenue to pursue.