UK and EU governments set timelines for exascale

The pursuit of exascale HPC systems has been a target of the HPC community since the first petaflop system broke into the Top500 in the June 2008 edition of the biannual list of the fastest supercomputers based on the Linpack benchmark.

The Roadrunner system at the US Department of Energy’s Los Alamos National Laboratory (LANL), built by IBM, was the first system to break through the petaflop ceiling with a Linpack score of 1.026 petaflop/s. Fourteen years later, in 2022, the first exascale system was confirmed at the US Oak Ridge National Laboratory. Frontier is currently the fastest recorded supercomputer and is driving new opportunities for research with a staggering 1.102 exaflop/s (equal to 1,102 petaflop/s).

Creating the technology stack for exascale has taken years of innovation beyond the simple iterative improvement of technology. In many cases, new computing architectures, networking systems, accelerators and processors are being constructed and optimised to deliver efficient exascale computing.

The drive towards exascale has often focused on delivering the highest possible raw computational power. The standard measure of exascale has generally been an exaflop, or the ability to perform 10^18 floating-point operations per second, equal to 1,000 petaflop/s. But this just scratches the surface of what is required to support an exascale system. Scientists require sustained application performance on real scientific codes.
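To make the units concrete, the short Python sketch below converts between exaflop/s and petaflop/s and compares the Linpack scores quoted in this article for Frontier and Roadrunner. The function and variable names are illustrative only, and Linpack figures say nothing about sustained application performance.

```python
# SI prefixes used when quoting supercomputer performance in FLOP/s.
PETA = 10**15
EXA = 10**18

def exaflops_to_petaflops(exaflops: float) -> float:
    """Convert a figure in exaflop/s to petaflop/s."""
    return exaflops * (EXA // PETA)

# Linpack scores as quoted in the article.
FRONTIER_EXAFLOPS = 1.102
ROADRUNNER_PETAFLOPS = 1.026

frontier_petaflops = exaflops_to_petaflops(FRONTIER_EXAFLOPS)
speedup = frontier_petaflops / ROADRUNNER_PETAFLOPS

print(f"Frontier: {frontier_petaflops:,.0f} petaflop/s")
print(f"Frontier vs Roadrunner: roughly {speedup:,.0f}x")
```

On these figures, the jump from the first petaflop system to the first exaflop system is a factor of a little over a thousand.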

Driving application performance at exascale requires a combination of computational power, I/O and memory bandwidth, and increases in energy efficiency that can make these future systems viable.

Exascale in the UK?

In the most recent budget from the UK Government, the Chancellor, Jeremy Hunt, announced almost £3.5bn to support the government’s ambitions to make the UK a scientific and technological superpower.

The Chancellor also confirmed about £900m in investment into a new exascale supercomputer and a dedicated AI research resource. A statement from the UK Government said: “This funding will provide a significant uplift in the UK’s computing capacity and will allow researchers to understand climate change, power the discovery of new drugs and maximise potential in AI – making breakthroughs that will benefit everyone in society and the economy.”

Furthermore, it states that the “UK will become one of only a handful of countries in the world to host an exascale computer, attracting the best talent and ensuring researchers have access to the best infrastructure in the world”.

The EU has delivered several pre-exascale supercomputers and is gearing up to deliver its own exascale systems in 2024-2025. In recent years, the UK has lagged behind European countries such as Italy, France and Germany, all of which have multi-petaflop systems in the top 20 of the most recent Top500 list.

In the current iteration of the Top500, published in November 2022, the UK’s highest-ranking system is ARCHER2, hosted by the University of Edinburgh jointly with the EPSRC, at #28 on the list. Prior to the delivery of ARCHER2, the UK’s most powerful supercomputer was found at #58 on the Top500 list published in June 2021. This entry is based on the Cray XC40 and is used by the United Kingdom Meteorological Office. The UK Government’s announcement suggests that it has had a significant change of heart with regard to HPC and the value of large-scale computational research for HPC and AI.

It is not yet clear how much of this new £900m investment will be spent on the exascale system. The total is more than 10 times the cost of the UK’s fastest supercomputer: the contract for ARCHER2 was reported to be worth £79m in 2019. ARCHER2 was reported to be around 11x faster than its predecessor, ARCHER. However, any new system close to exascale would be more than an order of magnitude faster than anything seen in the UK.

In an interview with Edinburgh University, Professor Mark Parsons, Director of EPCC, the supercomputing centre at the university, commented on the importance of supporting UK research with its own exascale HPC systems: “The US and China already have exascale computers, Germany is fast developing its capabilities, and within a few years, all major economies will have one.

“Building and installing an exascale computer will cost around £500m, but it is essential to maintain the UK’s competitive edge as a science superpower, especially after Brexit. We should be at the vanguard of innovation.”

When asked for a comment regarding the budget announcement and the importance of delivering an exascale system in the UK, Parsons said: “EPCC’s very pleased that the Future of Compute review recommended investment in an exascale supercomputer for the UK.

“This represents the culmination of over a decade’s work and will allow the UK’s computational science community to compete with their international peers”, Parsons continued. “It’s been a very tortuous process, and the Budget 2023 announcement is the first step along a complicated road. However, with a commitment to deliver full exascale by 2026, we’re looking forward to the challenges of installing this system in Edinburgh.”

The exascale computer will be housed at the existing data centre used for ARCHER2. Parsons also noted that Edinburgh now has 38MW of power to its data centre, of which 30MW is specifically allocated for the exascale system.

The EU unveils its exascale plans

In January 2023, the European High Performance Computing Joint Undertaking (EuroHPC JU) launched a call for tender to select a vendor for the acquisition, delivery, installation and maintenance of JUPITER, the first European exascale supercomputer.

This system will be the first in Europe to surpass the exaflop threshold. This will have a major impact on European scientific excellence and help pave the way for new possibilities, with larger and more complex scientific models becoming viable for study due to the huge increase in computational power.

The system, which will be installed on the campus of Forschungszentrum Jülich, will be owned by the EuroHPC JU and operated by the Jülich Supercomputing Centre (JSC). This new system will support the development of high-precision models of complex systems and help to solve key societal questions regarding, for example, climate change, pandemics, and sustainable energy production, while also enabling the intensive use of artificial intelligence and the analysis of large volumes of data.

JUPITER will be available to serve a wide range of European users, no matter where they are located, in the scientific community, industry, and the public sector. The EuroHPC JU and Germany will jointly manage access to the computing resources of the new machine in proportion to their investment.

The tender document issued by the EuroHPC JU highlights several key characteristics in which this new system should outpace existing HPC systems in the EU. These are performance/TCO; programmability and usability; versatility; system stability; power and energy efficiency; and computing density.

The estimated total value of the call for this new system is €273m (about £241m). The EuroHPC JU will fund 50 per cent of the total cost of the new machine, and the other 50 per cent will be funded in equal parts by the German Federal Ministry of Education and Research (BMBF) and the Ministry of Culture and Science of the State of North Rhine-Westphalia (MKW NRW).
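As a rough illustration of that funding split, the Python sketch below applies the stated shares to the estimated €273m total. It assumes the shares apply directly to the estimated call value, which may differ from the final contract. The names used are illustrative.

```python
from fractions import Fraction

# Estimated total value of the call, in millions of euros (from the article).
TOTAL_EUR_M = 273

# Shares as described: the EuroHPC JU funds half, and the two German
# funders split the remaining half equally.
SHARES = {
    "EuroHPC JU": Fraction(1, 2),
    "BMBF": Fraction(1, 4),
    "MKW NRW": Fraction(1, 4),
}

def funding_breakdown(total_m: int, shares: dict) -> dict:
    """Return each funder's contribution in millions, given fractional shares."""
    if sum(shares.values()) != 1:
        raise ValueError("shares must sum to exactly 1")
    return {name: float(total_m * share) for name, share in shares.items()}

breakdown = funding_breakdown(TOTAL_EUR_M, SHARES)
for name, amount in breakdown.items():
    print(f"{name}: EUR {amount:.2f}m")
```

Under that assumption, the EuroHPC JU would contribute about €136.5m and each of the two German funders about €68.25m.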

Dr Thomas Eickermann, Deputy Director of the JSC, says: “JUPITER will be based on the dynamic, modular supercomputing architecture, which Forschungszentrum Jülich has developed together with European and international partners in the DEEP projects funded by the European Commission and EuroHPC JU.

“It will be the first system in Europe to surpass the threshold of one exaflop. This next-generation European supercomputer represents a significant technological milestone for Europe and will have a major impact on European scientific excellence. JUPITER will support the development of high-precision models of complex systems and help to solve key challenges facing society, for example, climate change, pandemics, and sustainable energy production, while also enabling the intensive use of artificial intelligence and the analysis of large data volumes,” says Eickermann.

Few things in the world are certain, but here are a couple of cold, hard facts: one, computing power will continue to advance as industry leaders push the envelope of Moore’s Law; and two, the earth is getting hotter and more crowded, as climate change and pollution continue to take effect. The critical question for our generation is: how do we use the momentum of the first trend to delay or reverse the course of the second?

Revered institutes around the world are focusing on this million-dollar question. One such hallowed hall is Spain’s Institute for Cross-Disciplinary Physics and Complex Systems (Instituto de Fisica Interdisciplinar y Sistemas Complejos; abbreviated as IFISC), a joint research facility founded by the University of the Balearic Islands (UIB) and the Spanish National Research Council (CSIC). Located on the island of Mallorca in the Mediterranean Sea, the IFISC is engaged in interdisciplinary research in various fields of study, including quantum technologies, photonics, environmental sciences, biosystems engineering, and sociotechnical systems.

The IFISC is devoting considerable resources to the most pressing issues of the day. Regarding the environment, there is the “SuMaECO” project, which studies the impact of climate change on aquatic plants in the Mediterranean Sea; and the “Xylella” project, which utilizes machine learning to detect Xylella fastidiosa (Xf), a devastating plant pathogen that’s spreading rapidly, especially during the hotter summers caused by global warming.

These projects run into a common hurdle: incredible computing power is needed to carry out high performance computing (HPC), numerical simulations, artificial intelligence (AI) development, and big data management. Not just any servers can do the job. The IFISC needs servers that can satisfy the full gamut of computational needs, because the issues faced by humanity are multifaceted, and there are different approaches to solving the world’s problems.

Parallel computing is vital for both of these projects. In the case of “SuMaECO”, highly detailed models are used to simulate the growth of Posidonia under different circumstances. In the case of “Xylella”, the algorithm must sift through a massive library of images before it can learn how to differentiate between healthy and stricken plants.

Through Sistemas Informáticos Europeos (SIE), a Spanish company specializing in servers, workstations, and other network and communications systems, the IFISC chose GIGABYTE’s advanced server solutions as the best answer to its computational needs. The IFISC built computing clusters with three types of GIGABYTE servers: one type is the 4U 8-node G482-Z54 G-Series GPU Server.

AMD CPUs + High-density NVIDIA GPU Accelerators

One thing that’s undeniable about GIGABYTE servers is their prodigious processing power. In particular, the G482-Z54 can bolster the speed of the CPUs with a dense array of NVIDIA GPU accelerators, which is a key feature of GIGABYTE’s NVIDIA-based GPU servers.

The G482-Z54 has a 4U chassis that can house up to 8 GPU cards connected over PCIe Gen 4.0, which offers a maximum bandwidth of 64GB/s and is twice as fast as PCIe Gen 3.0. The G482-Z54 is a natural choice for parallel computing, high performance computing (HPC), cloud computing, and many other data-intensive applications.
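The 64GB/s figure can be reproduced with some back-of-the-envelope arithmetic. The Python sketch below assumes the figure refers to a x16 slot counted in both directions and ignoring encoding overhead. The article does not state how it was derived, so that reading is an assumption. The per-lane rates (8 GT/s for Gen 3.0, 16 GT/s for Gen 4.0) and the 128b/130b encoding are the published PCIe values, and the function name is illustrative.

```python
# Per-lane signalling rates in gigatransfers per second (GT/s).
GT_PER_LANE = {3: 8.0, 4: 16.0}

# PCIe Gen 3.0 and Gen 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130

def pcie_bandwidth_gb_s(gen: int, lanes: int = 16,
                        bidirectional: bool = False,
                        raw: bool = False) -> float:
    """Approximate PCIe bandwidth in GB/s.

    raw=True ignores encoding overhead (the usual headline figure);
    bidirectional=True counts both directions together.
    """
    efficiency = 1.0 if raw else ENCODING_EFFICIENCY
    per_direction = GT_PER_LANE[gen] * efficiency * lanes / 8  # bits to bytes
    return per_direction * (2 if bidirectional else 1)

headline_gen4 = pcie_bandwidth_gb_s(4, bidirectional=True, raw=True)
usable_gen4 = pcie_bandwidth_gb_s(4)
usable_gen3 = pcie_bandwidth_gb_s(3)

print(f"Gen 4.0 x16 headline (both directions): {headline_gen4:.0f} GB/s")
print(f"Gen 4.0 x16 per direction after encoding: {usable_gen4:.1f} GB/s")
print(f"Gen 4.0 vs Gen 3.0: {usable_gen4 / usable_gen3:.1f}x")
```

Under this reading, each GPU in a x16 slot sees roughly 31.5GB/s in each direction once encoding overhead is accounted for, which is still double the Gen 3.0 figure.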

For its G482-Z54, the IFISC chose dual AMD EPYC Rome 7282 processors with 16 cores and 32 threads per CPU. They supplemented the powerful processors with 6 NVIDIA RTX A6000 GPU cards. This set-up is exactly what the “SuMaECO” and “Xylella” projects needed. The Tensor Float 32 (TF32) precision of the NVIDIA RTX A6000 provides up to 5X the training throughput of the previous generation, accelerating AI and data science model training without requiring any code changes. Hardware support for structural sparsity doubles the throughput for inferencing. Now, the IFISC can precisely simulate the growth of the Mediterranean’s Posidonia meadows, and an AI model that detects signs of Xylella fastidiosa in satellite photos is in the works.

Following the successful implementation of the G482-Z54, GIGABYTE suggests that the IFISC consider deploying its next-generation successor: the new G493-ZB3-AAP1 GPU supercomputing solution. With PCIe Gen5 bandwidth for GPU devices, the G493-ZB3-AAP1 would further augment computational power for the IFISC research team by adopting the new NVIDIA L40 GPU. The NVIDIA L40 is powered by the Ada Lovelace architecture and features 4th-gen Tensor cores (with hardware support for structural sparsity and an optimized TF32 format), enhanced 16-bit math capabilities (BF16), and 48GB of ultra-fast GDDR6 memory.

“At the end of the day, most of us here are scientists, not computer engineers,” says Dr. Pere Colet. “Fortunately, GIGABYTE and SIE were able to understand our specific computational requirements, and offer us the best combination of GIGABYTE servers to solve our problems. We are very happy to be working with them.”

For more GIGABYTE success cases check: https://www.gigabyte.com/Article/Success-Case
For sales inquiries contact: server.grp@gigacomputing.com

Link G482-Z54: https://bit.ly/3JMl9ti

Link G493-ZB3-AAP1: https://bit.ly/3G2djLj

Link NVIDIA RTX™ A6000: https://bit.ly/3KerfTR

Link NVIDIA L40: https://bit.ly/3zlnlmI