A team of researchers from the University of Edinburgh has demonstrated new software for an innovative ‘wafer‑scale’ processor which, it claims, enables large language models to run up to 10 times faster. Martin Cooper MBCS reports.

Researchers at the University of Edinburgh say they have developed a system that enables large language models (LLMs) to run up to 10 times faster than current AI systems, an advance they believe could mark the beginning of a new generation of ultra‑fast AI infrastructure.

The advance combines new software with what is believed to be the world’s largest computer chip — a ‘wafer‑scale’ processor about the size of a dinner plate — and could significantly accelerate the speed at which AI systems respond to queries.

Large language models — the technology behind AI chatbots, search tools and automated analysis — typically require powerful specialist chips known as graphics processing units (GPUs) to run day‑to‑day tasks. These include inference, the process through which an already‑trained model analyses fresh information to make predictions or generate responses.
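To make that concrete, the snippet below is a minimal sketch of what inference looks like in practice, assuming the open‑source Hugging Face transformers library and a small, publicly available model; the model name and prompt are illustrative only and are not those used in the Edinburgh work.

```python
# Minimal inference sketch: load an already-trained model and generate a
# response to a prompt. Model name and prompt are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # any small causal LM would do here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Explain what a wafer-scale processor is in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

# Inference: the trained model analyses the new input and predicts the
# next tokens; no weights are updated.
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because each call like this generates its response one token at a time, the latency of the underlying hardware has a direct bearing on how quickly the system answers.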

However, the Edinburgh team has shown that wafer‑scale chips, which are far larger and capable of running vast numbers of operations simultaneously, can provide significant performance gains when paired with the right software.

New software for a new class of AI hardware

The software system, called WaferLLM, was developed specifically to unlock the potential of these unusually large processors. Wafer‑scale chips can contain hundreds of thousands of computing cores and vast on‑chip memory, allowing data to move extremely quickly across the processor rather than between separate chips over a network.

While this architecture is well suited to the intense, parallel calculations that power modern neural networks, it has until now been difficult to use in mainstream AI because it requires software designed from scratch.

Researchers at the University of Edinburgh’s School of Informatics say WaferLLM is intended to bridge that gap by coordinating data movement and simultaneous computations across the enormous chip. The system aims to leverage the processor’s scale, memory and low‑latency communication.
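Purely as a conceptual illustration of the kind of decomposition such a system has to coordinate, and not WaferLLM’s actual code, the sketch below tiles a single layer’s matrix–vector product across a small grid of simulated ‘cores’ using NumPy.

```python
import numpy as np

# Conceptual illustration only: tile one layer's matrix-vector product
# across a small grid of simulated "cores". This is NOT WaferLLM's code;
# it simply shows the sort of decomposition a wafer-scale runtime coordinates.

GRID_ROWS, GRID_COLS = 4, 4     # a real wafer has hundreds of thousands of cores
D_OUT, D_IN = 1024, 1024        # dimensions of one layer's weight matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((D_OUT, D_IN))   # the layer's weights
x = rng.standard_normal(D_IN)            # the incoming activations

# Partition the weight matrix into tiles; on a wafer, each tile would sit
# in one core's local memory.
row_blocks = np.array_split(np.arange(D_OUT), GRID_ROWS)
col_blocks = np.array_split(np.arange(D_IN), GRID_COLS)

y = np.zeros(D_OUT)
for rows in row_blocks:
    for cols in col_blocks:
        # Each core multiplies its tile by its slice of the input...
        partial = W[np.ix_(rows, cols)] @ x[cols]
        # ...and the partial results are reduced across each row of cores.
        y[rows] += partial

# The tiled result matches the untiled computation.
assert np.allclose(y, W @ x)
```

On real hardware, each tile would live in a core’s local memory and the partial sums would travel over the on‑chip fabric rather than between separate chips, which is where the latency savings described above come from.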

Testing at the UK supercomputing centre

The research was evaluated at EPCC, the UK’s national supercomputing centre based at the University of Edinburgh. The facility operates Europe’s largest cluster of Cerebras Systems’ third‑generation Wafer Scale Engine chips as part of the Edinburgh International Data Facility.

During testing, the team measured how the wafer‑scale processors performed when running several well‑known large language models, including LLaMA and Qwen. They compared the results against a cluster of 16 GPUs — a typical configuration for running such models today.

They reported a tenfold improvement in latency, the time taken for the system to respond to a query, along with up to twice the energy efficiency when running at scale.

The findings were peer reviewed and presented at the 2025 USENIX Symposium on Operating Systems Design and Implementation (OSDI), a leading international conference on computing systems.

Real‑time intelligence

Dr Luo Mai, lead researcher and a Reader at the University of Edinburgh, said wafer‑scale computing had long shown promise but had been held back by software limitations.

‘Wafer‑scale computing has shown remarkable potential, but software has been the key barrier to putting it to work’, he said. ‘With WaferLLM, we show that the right software design can unlock that potential, delivering real gains in speed and energy efficiency for large language models. This is a step toward a new generation of AI infrastructure — one that can support real‑time intelligence in science, healthcare, education and everyday life.’

Professor Mark Parsons, Director of EPCC and Dean of Research Computing, described the results as ‘groundbreaking’.

‘The Cerebras CS‑3 systems are a unique resource at Edinburgh to allow researchers to explore novel approaches to AI’, he said. ‘Dr Mai’s work is truly groundbreaking and shows how the cost of inference can be massively reduced.’

WaferLLM has been released as open‑source software, with the team hoping that other researchers and developers will build on the approach to design applications that harness wafer‑scale hardware.

A research paper detailing the work is available via the USENIX OSDI 2025 conference website.