Dr Lydia Heck of Durham University’s Institute for Computational Cosmology explains the research work at COSMA (aka the Cosmology Machine) and the national Distributed Research utilising Advanced Computing facility (DiRAC), and what has been achieved so far.

Co-author of almost 30 academic papers and instrumental in the work of COSMA and DiRAC, Dr Lydia Heck tells us about her work and career.

How did your relationship with high performance computing begin?

I have worked in high performance computing (HPC) for about 30 years and started as the manager of COSMA (short for the Cosmology Machine) in July 2001. For the first 11 years of COSMA, working through the generations COSMA1 to COSMA4, I was the manager, the system administrator and the user support. In 2012, COSMA5 was installed as part of DiRAC, which is made up of five installations across four universities: Durham, Edinburgh, Cambridge and Leicester.

In 2016, we received the Blue Wonder cluster from the Hartree Centre as a gift. We installed it in partnership with OCF and DataDirect Networks, and its transfer and reconfiguration was funded by the Science and Technology Facilities Council (STFC). This incarnation became COSMA6, adding more than 8,000 Intel Sandy Bridge cores and 2.5 PBytes of storage to the DiRAC@Durham facility.

In 2017, I led the team who procured COSMA7 – a system with 120 TByte of RAM and an I/O system with a theoretical peak performance of 185 GByte/sec write and read speed. The actual performance achieved for the SWIFT application was 180 GByte/sec. This shows both the speed of the hardware and the value of clever research software engineering (RSE) support. In the same year, I was promoted to Technical Director of HPC at the Institute for Computational Cosmology.

From 2015, I was the resource allocation committee secretary, which added to my duties, and in 2017 I also took over the duties of the DiRAC technical manager.

In 2018, I led the team who procured an upgrade to COSMA7, doubling the amount of compute power and RAM and adding another 20 GByte/sec of performance to the I/O system.

In August 2018, I passed the management of COSMA to a new colleague and my duties are now those of the DiRAC technical manager, although I am still involved in COSMA.

Tell us more about DiRAC

DiRAC began life as a loose federation (DiRAC1) in 2010, with about £13M of funding, which led to 13 HPC installations across the UK. DiRAC is run by scientists for their science, and it caters for the HPC needs of astrophysics, cosmology, particle physics and nuclear physics.

In 2011, funds were made available for a more concentrated installation. This involved four institutions: Cambridge (Cambridge HPC and Cosmos), Durham (the Institute for Computational Cosmology), Edinburgh (UK Quantum Chromodynamics) and Leicester. We made a bid, five systems were procured, and DiRAC became organised with a proper management structure.

We now have a director, a deputy director and project scientist, a technical director and deputy technical director in charge of the RSEs, and a technical manager. The systems are the Data Intensive System (Cambridge and Leicester), the Extreme Scaling System (Edinburgh) and the Memory Intensive System (Durham). Each of these has a service management board with external academic representation, which meets every two months, while the programme board meets annually.

Our research highlights are exhibited every year on DiRAC Day, in early September.

Here’s what the DiRAC facility has achieved so far:

The numerical and computational efforts of the team in Cardiff have helped to identify the first observations of gravitational waves by LIGO.

Particle physics simulations helped to interpret results from the Large Hadron Collider (LHC) where the Higgs Boson was identified. These simulations required millions of central processing unit (CPU) hours.

What are the fundamental differences in architecture between an x86 PC and DiRAC?

Usually the processors used in laptops or standard PCs have fewer cores, less cache (on-chip, very fast memory) and a smaller instruction set than those used in an HPC system.

The individual computers are also connected to each other by a very fast network, in our case 100 Gbit/s, whereas on your home network you would achieve a theoretical peak of a few hundred Mbit/s, which is roughly 1,000 times slower. Also, the latency of a zero-length message (the time it takes for a message transfer to start) is considerably shorter than on a desktop computer, again by a factor of about 1,000 or more.

The programming for an HPC system needs to include instructions on how to pass information from one process to another within a single large run; these messages are passed either internally over memory or externally over the very fast network. In an HPC system, many CPUs will participate in one single run.

The largest run done on the DiRAC@Durham system was a 4,096-core run which ran for 47 days non-stop and produced several hundred TBytes of data. This simulation created a universe with galaxies like our Milky Way. Such simulations allow data to be compared with observations, assisting with our ever-increasing understanding of our universe.

Such a system also requires large data space optimally connected to the compute nodes via the fast interconnect. Local disk, internal to each node, is usually used only for the operating system.

Where are DiRAC’s bottlenecks?

The operating system has to be aware of the fast network, which is enabled by installing the appropriate drivers. The programming for these many-core, many-node systems is more complicated and needs to be done explicitly.

Programming languages such as C, C++ and Fortran support programming multi-core systems with shared memory, although specific programming instructions have to be used to tell the compiler to parallelise the work. The compilers then translate these instructions into the machine instructions that carry them out.
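
To make this concrete, here is a minimal shared-memory sketch in C using OpenMP directives (OpenMP is one common approach; this is an illustrative example, not code taken from a DiRAC application). The pragma is the kind of specific instruction that tells the compiler to spread the loop across the cores of a single node:

    /* shared_sum.c - sum an array using all the cores of one node.
       Compile with, for example: gcc -fopenmp shared_sum.c -o shared_sum */
    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void)
    {
        static double a[N];
        double sum = 0.0;

        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        /* Split the loop across threads; 'reduction' combines the
           per-thread partial sums safely at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %.0f using up to %d threads\n", sum, omp_get_max_threads());
        return 0;
    }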

To program multi-node systems, a message passing library needs to be used. On the DiRAC systems we exclusively employ MPI (the Message Passing Interface), development of which started in 1991. Here explicit instructions have to be given to set up a parallel environment, to send and receive messages, to broadcast information, and so on.

There are calls such as ‘MPI_Init’, ‘MPI_Recv’ and ‘MPI_Send’. You can send synchronously (wait for messages to be received and acknowledged) or asynchronously (send and hope for the best), wait, broadcast and set a barrier. This creates new challenges; one that many newcomers hit is deadlock – messages have been sent but never arrive, because the recipient might not be waiting, not be prepared to receive, or not even know that a message is coming. This is usually the programmer’s ‘fault’.
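
As an illustration – a minimal sketch in C, not one of the production codes run on DiRAC – a two-process MPI program in which rank 0 sends a single number to rank 1 might look like this. Note that the send only completes safely because the matching receive exists:

    /* ping.c - rank 0 sends one number to rank 1.
       Compile with: mpicc ping.c -o ping
       Run with:     mpirun -np 2 ./ping */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        double value = 3.14;

        MPI_Init(&argc, &argv);                 /* set up the parallel environment */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I? */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in total? */

        if (size < 2) {
            if (rank == 0)
                printf("Run with at least two processes.\n");
        } else if (rank == 0) {
            MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Without this matching receive, the program could deadlock. */
            MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %f\n", value);
        }

        MPI_Finalize();
        return 0;
    }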

Do different scientific jobs, projects or tasks need specific architectures?

Most codes will work on the standard x86 (Intel-type) architectures. There are codes that, when properly ported, will work extremely well on GPUs (graphics processing units). However, to port a code to an architecture such as a GPU, some effort has to be made to exploit its specific design, which is very much ‘single instruction, multiple data’: doing the same action(s) repeatedly on different numbers.
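
A generic example of that pattern (an illustrative sketch, not a DiRAC code) is the classic ‘saxpy’ loop, where every array element undergoes exactly the same multiply-and-add. On a GPU each element would typically be handled by its own lightweight thread, while on a CPU the compiler can vectorise the loop with SIMD instructions:

    /* saxpy.c - apply the same operation to every element: y[i] = a*x[i] + y[i] */
    void saxpy(int n, float a, const float *x, float *y)
    {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }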

However, it is important to realise that current x86 architectures are very complicated and, to get the best out of them, the instruction sets and features of these architectures have to be exploited optimally. Code development and the devising of appropriate new algorithms that can exploit them are ongoing and forever trying to catch up with the fast development of new technology.

DiRAC is running four x86 clusters, but each of them is running a different Intel Skylake SKU (stock keeping unit). DiRAC@Edinburgh is running a total of 35,232 Intel Skylake 4116 processor cores; DiRAC@Durham is running 12,656 Intel Skylake 5120 processor cores and 9,184 Intel Sandy Bridge processor cores; DiRAC@Cambridge is running 15,484 Intel Skylake 6142 processor cores for DiRAC; and DiRAC@Leicester is running 14,688 Intel Skylake 6140 processor cores. DiRAC@Cambridge also runs an Nvidia GPU cluster and an Intel KNL cluster.

We consider the type of jobs which will run on each of our systems and then make the choice of the type of processor – the system layout and configuration is rather different for each of the installations.

Has DiRAC been upgraded or evolved over time?

DiRAC has been upgraded from DiRAC1 in 2010 to DiRAC2 in 2011/12, then again in 2016, 2017 and 2018, and even this year, 2019. We are currently working on plans for DiRAC3.

What’s the biggest expense in the life of a high performance computer – buying it or powering it? Maintenance? Cooling?

A system like COSMA7 uses approximately 150 kW more or less continuously throughout the year. At a rate of 10p/kWh, this comes to about £130k a year. And COSMA7 is only one part of the Durham installation, which also hosts COSMA6.
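
(As a rough check, assuming the full 150 kW is drawn continuously: 150 kW × 8,760 hours in a year × £0.10 per kWh ≈ £131,000 a year, which is where the £130k figure comes from.)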

There are different ways of taking the heat away from the systems. In Durham, we have passive and active rear door heat exchangers (RDHx). The passive RDHx rely on the fans of each of the compute nodes being strong enough to push the hot air through the cooling grilles and the specifically laid-out compute rack. The active RDHx have fans which actively extract the hot air from the racks and can be used with compute clusters as well as storage racks.

Electricity is also used to cool and power the UPS (uninterruptible power supplies), which ensure the system is not affected by general power dips and brown-outs. The system is also covered by generators that back up the UPS.

All of these components require electricity to function. Adding that electricity to the electricity required to run the computer systems and dividing the total by the latter gives the power usage effectiveness (PUE) coefficient. In the ideal case, this would be one; however, in practice it is greater. At DiRAC, we aim to keep the PUE as close to one as is physically possible. We invest in optimal cooling solutions and choose UPS which have high efficiency.
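
As a purely illustrative example, with made-up numbers rather than measured Durham values: if the computer systems draw 150 kW and the cooling and UPS losses add a further 30 kW, the PUE is (150 + 30) / 150 = 1.2.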

How are projects selected for time on DiRAC?

The researchers have to prepare a scientific case describing the scientific projects they want to model, the resources they require and how they will use those physical resources to achieve their goal. They also have to submit a technical case to explain the efficiency of the codes they will run, including details of the compute resources they require: the system, the number of cores per run, the amount of memory per process, and the work and data space needed to store their data. Projects are then selected on scientific merit and technical feasibility. This vital process is overseen by STFC.

Is the HPC market a competitive one? Do HPC sites / universities compete to attract academic projects?

There are quite large national and international facilities – in Britain, Archer (EPSRC), DiRAC (UKRI-STFC), the Met Office, the Hartree Centre and JASMIN; in Europe, PRACE. Individual universities might have their own HPC facilities, which are much smaller but might serve as springboards to the larger Tier 2 and Tier 1 facilities. Most of the facilities are used very intensively, as there is hardly any research now that does not rely on computing, and a lot of it requires some form of HPC.

Applicants have to work quite hard to be allocated time on the larger facilities, and they have to prove that they can use resources effectively.

Finally, we read a lot about quantum computing. Is it likely to revolutionise HPC? Or will quantum be useful only in certain (deterministic?) situations?

Quantum computing will come, but it might be another 10 years or so before it enters the mainstream high-performance market. It will be a challenge to program these systems, as it is with all novel architectures. There might be problems which are not suited to this architecture, and there will be problems which will be able to make full use of it.

What is clear is that a lot of effort will be required to exploit such a system and new algorithms will have to be devised for existing problems to do that.

Currently, there are simulators of quantum computing out there, on which algorithms can be programmed and tested and I believe there are some real quantum computers running.

To prepare for commercial availability, industry and research should and will use the simulators to get to the point where quantum computing will be fully exploitable.

Image credit: ESO/L. Calçada/M. Kornmesser with the image courtesy of Creative Commons.