George Gerring, River Deep Mountain AI Programme Lead, and Chris Dawson, R&D Lead at Xylem, speak to Grant Powell MBCS about an innovative project to analyse water pollution with a view to improving water quality and benefiting public health.
The River Deep Mountain AI project is a winner of the Ofwat Innovation Fund’s fourth Water Breakthrough Challenge. The project focuses on leveraging AI and machine learning techniques to develop open source, scalable, digital models that track river health trends and patterns associated with point and diffuse sources of pollution. It is being led by Northumbrian Water, and supported by a number of partners: ADAS, Anglian Water, Cognizant, Dŵr Cymru Welsh Water, Northern Ireland Water, South West Water, Stream, The Rivers Trust, Tidal, Google LLC, Uisce Éireann, Water Research Centre Limited, Wessex Water, and Xylem Inc.
The project aims to address two priority challenges: outdated and fragmented processes for monitoring and understanding river health, and siloed, often underutilised data, which the project brings together through the integration of datasets to provide a more holistic view. With only 16% of England’s surface water bodies currently at good ecological status, it’s clear that ‘business as usual’ isn’t working, necessitating a new and innovative approach, as George and Chris explain.
Can you introduce the project and its purpose?
Chris: This project was born at the Northumbrian Water Innovation Festival in 2023. We launched it in 2024 and showcased it again in 2025. There’s a real appetite for this kind of work, especially among younger professionals who want to be part of the solution but worry about being left behind. We’re not just trying to improve monitoring; we’re trying to rethink it. We started with a discovery phase, interviewing practitioners to understand their pain points. From that, we identified three priority areas:
- Modelling high-priority pollutants like phosphate
- Designing optimised monitoring plans to reduce uncertainty and improve investment decisions
- Getting more value from continuous monitoring, shifting from asset-based to catchment-based insights
George: We’re also facing a regulatory shift. With Section 82 of the Environment Act detailing very specific requirements, we’re looking at the introduction of potentially over a billion new data points per year from tens of thousands of monitors. That’s not something humans can process efficiently. So, we asked: can AI and machine learning help us detect patterns and draw insights from this data?
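A rough back-of-envelope calculation shows how the numbers reach that scale. Assuming the 30,000 monitors mentioned later in this piece each log a reading every 15 minutes (an illustrative interval, not one the project specifies), the annual total passes a billion:

```python
# Back-of-envelope estimate of annual readings from continuous monitors.
# Assumptions are illustrative, not from the project: 30,000 monitors,
# one reading every 15 minutes, a single parameter per reading.
monitors = 30_000
readings_per_day = 24 * 60 // 15   # 96 readings per monitor per day
annual_readings = monitors * readings_per_day * 365
print(f"{annual_readings:,}")      # 1,051,200,000 -- over a billion a year
```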
Can you elaborate on how machine learning and remote sensing were used?
Chris: Our approach is systems-based rather than point-based. Traditionally, monitoring focuses on specific pollutants at specific locations. We wanted to build general models that could be applied anywhere, whether it’s a river in Essex or Yorkshire. The models don’t care what the pollutant is; they care about how parameters behave together.
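As a loose illustration of what ‘how parameters behave together’ can mean in practice, the sketch below computes the correlation structure across a set of routinely monitored parameters. The file and column names are hypothetical, and this is not the project’s actual model:

```python
import pandas as pd

# Hypothetical readings table: one row per timestamped sample at a site;
# columns are routinely monitored parameters (names are illustrative).
readings = pd.read_csv("site_readings.csv", parse_dates=["timestamp"])

params = ["turbidity", "conductivity", "dissolved_oxygen", "ph", "temperature"]

# A pollutant-agnostic view: the correlation structure among parameters,
# rather than absolute values of any one pollutant at any one point.
co_behaviour = readings[params].corr(method="spearman")
print(co_behaviour.round(2))
```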
We’ve also built a translation layer between data scientists and end users, which could be wastewater operations teams, citizen scientists (members of the public without formal scientific training who volunteer to help with scientific research), and/or regulators. It’s about making complex models interpretable and useful.
George: It’s also very important to mention that we’ve kept everything open source. That’s a key differentiator. Anyone can access the code behind the models, see how they work, and re-create or adapt them for their own purposes. It’s not an impenetrable black box. That transparency builds trust, especially in a sector where decisions impact public health and ecosystems.
Chris: And we’ve used new ways to collect novel data. We use remote-controlled drone boats to go out into rivers and gather environmental readings that feed into our machine learning models. It’s a way of expanding the data landscape beyond traditional methods.
What kinds of data did you gather or repurpose for the models?
Chris: Initially, we thought we’d focus on a few pilot catchments. But we quickly realised that environmental data is hard to collect at scale. So, we shifted to a national-level approach, ingesting datasets from the Environment Agency, Natural Resources Wales, and the Scottish Environment Protection Agency.
This gave us broader coverage and ensured our models weren’t overfitted to specific locations. Think of England as one big catchment. Then, if someone wants to run the model locally, they can input finer-resolution data.
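A minimal sketch of this kind of national-scale ingestion, assuming each agency’s samples have been exported to CSV (file and column names are hypothetical; the project’s real pipeline is in its open source code):

```python
import pandas as pd

# Hypothetical CSV exports from the three national archives.
sources = {
    "EA":   "environment_agency_samples.csv",
    "NRW":  "natural_resources_wales_samples.csv",
    "SEPA": "sepa_samples.csv",
}

frames = []
for agency, path in sources.items():
    df = pd.read_csv(path, parse_dates=["sample_date"])
    df["agency"] = agency            # keep provenance for later filtering
    frames.append(df)

# One national table: broader coverage guards against overfitting
# the models to any single catchment.
national = pd.concat(frames, ignore_index=True)
print(national.groupby("agency").size())
```

Keeping the agency label on each row makes it straightforward to filter back down to a single region when someone wants to run the model locally with finer-resolution data.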
George: We also learned that just because there’s a lot of data doesn’t mean it’s all usable. For example, phosphate data is often sparse and infrequent, yet big decisions are made based on it. We built models that estimate orthophosphate using proxy parameters — hundreds or thousands of water quality and environmental indicators that are routinely monitored.
There are three types of data we use (see the sketch after this list):
- Training data to build the model
- Validation data the model hasn’t seen before
- Benchmarking data to compare performance
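A minimal sketch of the proxy-parameter idea combined with that split, assuming scikit-learn’s gradient boosting and hypothetical feature and file names; for brevity it benchmarks against a naive mean predictor rather than a separate benchmark dataset. The project’s real models are published in its open source repositories:

```python
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical table: routinely monitored proxy parameters plus sparse
# lab measurements of the target determinand, orthophosphate.
data = pd.read_csv("samples.csv").dropna(subset=["orthophosphate"])
proxies = ["turbidity", "conductivity", "dissolved_oxygen", "ph",
           "temperature", "flow", "rainfall_24h"]

# Training data to build the model; held-out validation data the
# model has not seen before.
X_train, X_val, y_train, y_val = train_test_split(
    data[proxies], data["orthophosphate"], test_size=0.2, random_state=0)

model = HistGradientBoostingRegressor().fit(X_train, y_train)

# Benchmark: compare against a naive baseline that predicts the mean.
print("model MAE:   ", mean_absolute_error(y_val, model.predict(X_val)))
print("baseline MAE:", mean_absolute_error(y_val, [y_train.mean()] * len(y_val)))
```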
Chris: We also worked with citizen scientists, building kits and distributing them to volunteers across 15 sites. They collected phosphate data alongside existing continuous monitoring, which helped us validate our models and test the efficacy of citizen science methods.
George: Of the data we use, about 80% is archived national data and 20% is newly acquired. That mix helps ensure diversity and relevance across different geographies.
What outcomes or insights have emerged from the project, and how are stakeholders responding?
George: One of the biggest outcomes is demonstrating the art of the possible — how regulatory, water company and citizen science data can be brought together to generate new insights. AI can be regarded as the vehicle, but the real value is in the integration.
Chris: Stakeholder response has been incredibly positive, especially to the open-source aspect. Even if we gave the models away for free but kept them closed, there would be mistrust. People want to see how the model got to an answer, especially when investment decisions are riding on it, so the transparency that open source creates is hugely important to foster trust and encourage continued collaboration.
George: We’ve worked closely with the Rivers Trust and Stream to engage stakeholders. These organisations already have trust and national coverage, which helps us reach end users effectively.
Chris: One exciting development is around real-time event detection. Section 82 means that we can expect 30,000 new monitors to be streaming data live to public dashboards. We’re wrapping AI agents around that data to classify events, whether it’s a pollution incident or just rainfall. It’s like a guardian angel watching the rivers in real time.
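A heavily simplified sketch of that classification step; the thresholds, field names and rules below are invented for illustration, and the project wraps far more capable models around the live feeds:

```python
from dataclasses import dataclass

@dataclass
class Reading:
    site: str
    turbidity: float      # NTU
    ammonia: float        # mg/l
    rainfall_mm: float    # rainfall in the preceding hour

def classify(event_window: list[Reading]) -> str:
    """Crude rule-of-thumb triage of a burst of unusual readings."""
    mean_ammonia = sum(r.ammonia for r in event_window) / len(event_window)
    mean_rain = sum(r.rainfall_mm for r in event_window) / len(event_window)
    # Elevated ammonia with little rain points at a possible pollution
    # incident; spikes during heavy rain are often just rainfall-driven.
    if mean_ammonia > 0.5 and mean_rain < 1.0:
        return "possible pollution incident"
    if mean_rain >= 1.0:
        return "likely rainfall-driven"
    return "unclassified"

# Usage: feed a window of recent readings from one monitor.
window = [Reading("ST-0042", turbidity=35.0, ammonia=0.9, rainfall_mm=0.2)]
print(classify(window))   # -> possible pollution incident
```

The real system would sit behind streaming infrastructure and use learned models rather than fixed thresholds; the sketch only shows the shape of the classification step.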
George: We’re currently in the second build phase of the models, validating them against unseen data. Final models will be published from mid-November. We’re already planning follow-on phases: ‘Open Catchment Intelligence’, which will combine our risk models with citizen science and open data advocacy to create a usable web-based tool, and ‘Project 82’, which utilises the advanced capabilities of our AI model combined with cross-sector expertise to analyse, cluster and label anomalies in continuous water quality data.
What skills or collaborations have been most critical to the success of the project?
Chris: Cross-disciplinary education was key. Early on we ran internal training sessions to help everyone understand each other’s domains and key areas of focus, whether that was wastewater networks or machine learning techniques. Our core team includes data scientists, water quality experts, UX designers, sensor technologists and a wealth of other tech roles, so we had to learn to speak a common language.
George: Upskilling is important, but collaboration is even more so. Not every organisation has the capacity to hire data scientists. That’s where partnerships come in. The Environment Agency, for example, doesn’t need to build everything in-house if it can collaborate effectively with our partnership. We also established a Technical Advisory Group (TAG), and TAG Academy, which is an initiative for early-career professionals to get mentorship and hands-on experience with the models. They help us develop use cases and take the models back into their organisations.
Chris: It also gives us advocates inside those organisations — people who are already familiar with the models and can help drive adoption.
George: And it’s about maintaining momentum. Interest doesn’t end when the project ends. We’re exploring how to sustain that community of practice beyond the funding lifecycle. You don’t need to be a data scientist to collaborate in this space. But understanding the possibilities and limitations of data science helps foster a better exchange of ideas.
The challenges are too vast for any one organisation to solve alone. That’s why we’ve worked openly and transparently, sharing what’s worked and what hasn’t. It’s as important to understand the failures as it is to celebrate the successes.