
Exascale: expect poor performance

Supercomputers deliver only a few per cent of their theoretical peak performance, so people should reset their expectations of exascale, Tom Wilkie heard at ISC High Performance earlier this month.

Exascale computers are going to deliver only one or two per cent of their theoretical peak performance when they run real applications, and both the people paying for such machines and the people using them need realistic expectations of just how low a percentage of peak performance they will obtain.

In fact, Jack Dongarra from the University of Tennessee told ISC High Performance in Frankfurt: ‘You’re not going to get anywhere close to peak performance’ on the exascale machines. Speaking on the last day of the full conference, earlier this month, he went on: ‘Some people are shocked by the low percentage of theoretical peak, but I think we all understand that actual applications will not achieve their theoretical peak.’

Dongarra’s judgement was echoed in a workshop session the following day by Professor Michael Resch, director of the HLRS supercomputer in Stuttgart, Germany. The HLRS was not targeting exascale per se, he said, because its focus was on the compute power it could actually deliver to its users.

According to Resch, simple arithmetic meant that if an exascale machine achieved a sustained performance of only 1 to 3 per cent of its peak, it would deliver 10 to 30 Petaflops. So buying a 100 Petaflop machine that was 30 per cent efficient – which should be achievable, he claimed – would deliver the same compute power, at a much lower capital cost and about one tenth of the energy cost of an exascale machine.
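The arithmetic Resch described can be written out as a back-of-the-envelope calculation; the sketch below is purely illustrative and simply re-uses the peak and efficiency figures quoted above:

```python
# Illustrative sketch of the arithmetic quoted above: delivered performance
# is theoretical peak multiplied by the sustained-efficiency fraction.

def sustained_pflops(peak_pflops, efficiency):
    """Delivered Petaflops for a given peak and efficiency fraction."""
    return peak_pflops * efficiency

# An exascale machine (1,000 Petaflops peak) at 1-3 per cent sustained:
print(sustained_pflops(1000, 0.01))   # 10 Petaflops
print(sustained_pflops(1000, 0.03))   # 30 Petaflops

# A 100 Petaflop machine running at 30 per cent efficiency:
print(sustained_pflops(100, 0.30))    # 30 Petaflops
```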

Different benchmarks for different workloads

Dongarra’s stark warning about how poor the performance of supercomputers actually is, compared to the theoretical performance as measured in the Top500 list, came as he and his colleagues presented the results of a different benchmark for measuring the performance of supercomputers: the High Performance Conjugate Gradients (HPCG) Benchmark. As had been widely expected, when the Top500 list was announced on the opening day of the conference, Tianhe-2, the supercomputer at China’s National University of Defence Technology, had retained its position as the world’s No. 1 system for the fifth consecutive time. Tianhe-2 also took first place in the alternative metric for measuring the speed of supercomputers, the HPCG, announced on the last day of the full conference.

However, top place on yet another benchmark, the Graph500, which measures the performance of supercomputers on data-intensive loads, went to a collaboration between Riken, the Tokyo Institute of Technology, University College Dublin, Kyushu University, and Fujitsu for the K computer. Because this metric uses a radically different approach – graph algorithms – its measure of performance is in ‘billions of edges [of a graph] traversed per second’. It is therefore not possible to read across from the K computer’s score of 38,621.4 billion edges traversed per second and compare it with the metric used in the Top500.

Linpack benchmark

The bi-annual Top500 list uses the widely accepted Linpack benchmark to rank the performance of the fastest supercomputer systems. Linpack measures how fast a computer can solve a dense n-by-n system of linear equations.
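For illustration, the essence of what Linpack measures can be sketched in a few lines of Python with NumPy. This is not the official HPL benchmark code, merely the same basic idea: time a dense solve and convert the standard Linpack operation count into flop/s.

```python
# Illustrative sketch only - not the official HPL (Linpack) benchmark.
# Time the solution of a dense n-by-n linear system and report flop/s.
import time
import numpy as np

n = 2000
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

start = time.perf_counter()
x = np.linalg.solve(A, b)                  # LU factorisation plus triangular solves
elapsed = time.perf_counter() - start

flops = (2.0 / 3.0) * n**3 + 2.0 * n**2    # standard Linpack operation count
print(f"{flops / elapsed / 1e9:.1f} Gflop/s on a {n}x{n} dense system")
```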

But the tasks for which high-performance computers are being used are changing, and future computations may not be done with floating-point arithmetic alone. Consequently, the practical needs of computing vendors and the supercomputing community have shifted, leading to the emergence of other benchmarks, as discussed by Adrian Giordani in How do you measure a supercomputer's speed?

Rankings according to the HPCG benchmark

The HPCG (High Performance Conjugate Gradients) Benchmark project is one effort to create a more relevant metric for ranking HPC systems. HPCG is designed to exercise computational and data access patterns that more closely match a broad set of important applications, and to give incentive to computer system designers to invest in capabilities that will have impact on the collective performance of these applications.
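The flavour of the workload can be illustrated with a plain conjugate-gradient loop on a sparse matrix. The sketch below is not the official HPCG reference code; it simply shows the sparse matrix-vector products, vector updates and dot products that make this kind of computation bound by memory bandwidth rather than by dense floating-point throughput.

```python
# Illustrative conjugate-gradient kernel (not the official HPCG code).
# The dominant cost is the sparse matrix-vector product, which is memory-bound.
import numpy as np
import scipy.sparse as sp

n = 100_000
# A simple sparse, symmetric positive-definite test matrix (1-D Laplacian).
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x = np.zeros(n)
r = b - A @ x
p = r.copy()
rs_old = r @ r

for _ in range(200):                 # fixed iteration count for illustration
    Ap = A @ p                       # sparse matrix-vector product
    alpha = rs_old / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    rs_new = r @ r
    if np.sqrt(rs_new) < 1e-8:
        break
    p = r + (rs_new / rs_old) * p
    rs_old = rs_new

print("residual norm:", np.linalg.norm(b - A @ x))
```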

The architects of the new measure – Jack Dongarra from the University of Tennessee, Michael Heroux from the US Sandia National Laboratories, and Piotr Luszczek, also from the University of Tennessee – explained the rationale and the history of the metric. Dongarra’s involvement in developing an alternative to Linpack is significant, as he was the man who introduced the original fixed-size Linpack benchmark and was one of the first contributors to the Top500 list.

Heroux told the meeting that the Linpack measure and HPCG might best be seen as ‘bookends’ of a spectrum: ‘Between the two is the likely speed of an application. The closer the two are, the more balanced the system.’

At the top end, the rankings are very similar to those of the Top500. Riken’s K computer, which came fourth in the Top500, comes second in the HPCG list, while the US Oak Ridge National Laboratory’s ‘Titan’ Cray machine, which was second in the Top500, is third in this metric. However, SuperMUC from the Leibniz Rechenzentrum in Germany advances from 20th place in the Top500 to ninth.

The UK’s Archer machine at the EPCC in Edinburgh, which is also a Cray, moves from position 34 in the Top500 to tenth place under the HPCG method of ranking performance. Of the top ten, it achieves the highest HPCG performance as a percentage of the peak performance measured by Linpack. Even so, this is just under five per cent.

Placings in the Graph500 list

Tianhe-2 did not take top place in the Graph500 rankings, but still managed a very respectable sixth place. After Riken’s K computer came Sequoia at the US Lawrence Livermore National Laboratory and then Mira from the Argonne National Laboratory, with Juqueen at the Forschungszentrum Juelich (FZJ) in Germany fourth.

The Graph500 list has been going since 2010 and is intended to provide a new set of benchmarks to guide the design of hardware architectures and software for data-intensive applications. Run by a steering committee of more than 50 international HPC experts from academia, industry, and national laboratories, it uses graph algorithms that are a core part of many analytics workloads. But the aim is to go further and specifically address five graph-related business areas: cybersecurity; medical informatics; data enrichment; social networks; and symbolic networks.
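The kernel at the heart of the benchmark is a breadth-first search, scored in traversed edges per second (TEPS). A minimal, purely illustrative sketch – using a small random graph rather than the benchmark’s specified Kronecker graph generator, and counting edges examined during the search – might look like this:

```python
# Illustrative sketch of the Graph500 idea (not the official benchmark code):
# run a breadth-first search and report traversed edges per second (TEPS).
import time
import random
from collections import deque

def random_graph(num_vertices, num_edges, seed=0):
    """Build a small undirected graph as an adjacency list."""
    rng = random.Random(seed)
    adj = [[] for _ in range(num_vertices)]
    for _ in range(num_edges):
        u = rng.randrange(num_vertices)
        v = rng.randrange(num_vertices)
        adj[u].append(v)
        adj[v].append(u)
    return adj

def bfs_teps(adj, root):
    """Breadth-first search from root; return edges examined per second."""
    visited = [False] * len(adj)
    visited[root] = True
    queue = deque([root])
    edges_traversed = 0
    start = time.perf_counter()
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            edges_traversed += 1
            if not visited[v]:
                visited[v] = True
                queue.append(v)
    elapsed = time.perf_counter() - start
    return edges_traversed / elapsed

adj = random_graph(100_000, 1_000_000)
print(f"{bfs_teps(adj, root=0) / 1e6:.1f} million TEPS on one core")
```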

Other reports from ISC High Performance have explored Why do smaller companies shun HPC? together with ways of Easing access to HPC for the SME, have described how A portal opens to German HPC centres, and asked Does the path to HPC for SMEs lie in the Cloud?

Robert Roe offers a respite from policy-related issues by examining how Computer processors evolve to fit new data intensive niches – a look at new developments in processor technologies on display at ISC High Performance – while Tom Wilkie reported on US collaboration on communications and co-design on applications.
