Expect Exascale

Robert Roe looks at advances in exascale computing and the impact of AI on HPC development

In March it was announced that a new exascale system was on the horizon, with the news that Aurora – originally scheduled for 2018 – was going to be the first exascale system in the US, launching in 2021. With plans for exascale systems now firmly in place, we can begin to see the architecture of the first exascale systems.

In most cases they rely heavily on accelerator technologies that are not only driving exascale performance, but also ushering in a new age of AI-based computing applications and analytics.

Across the pond, the EU is banding together to drive European efforts in HPC development. Projects such as EuroEXA and the EuroHPC Joint Undertaking, funded by government grants and partnerships between EU countries and commercial and research organisations, are hoping to develop home-grown HPC technology.

EuroHPC is an initiative to fund a world-class European supercomputing infrastructure to 2026 to meet the demands of European research and industry, with a jointly-funded budget of around €1.4 billion, provided by the EU, participating countries and private partners.

The EuroHPC Joint Undertaking has launched a funding call to host at least two pre-exascale systems.

Kimmo Koski, managing director of CSC – IT Center for Science in Finland, is preparing a bid put together by nine European countries for one of Europe’s two pre-exascale computing systems. Funded partly by the EU’s EuroHPC initiative, and partly by the consortium partners, this is one example of the efforts in Europe to catch up to the US and Asia. The project hopes to deliver a pre-exascale system for the EU by the end of 2020.

The consortium partners working with Finland on this project are Sweden, Denmark, Norway, Belgium, Czech Republic, Estonia, Poland and Switzerland. More countries may still join the consortium. The bid aims to put the system in the CSC datacentre in Kajaani, 550 kilometres north of Helsinki.

‘We expect to put together as competitive a bid as possible for this pre-exascale system. The idea is to combine forces and make a joint purchase of a high-end system, and also do a lot of other things on top of that, like develop competencies in scientific computing and AI,’ said Koski.

‘The expertise covers a lot of areas, there is a lot of experience and we have been able to agree that everybody can use this resource that will be outside their country’s borders,’ Koski added.

GPUs forge the path to exascale

The US Department of Energy recently announced the first exascale supercomputer and, at first glance, it does not feature GPUs. However, GPU technology has dominated the surge in the use of accelerators, so it would be strange for this system not to use this technology.

The $500 million contract will be delivered to Argonne National Laboratory by Intel and sub-contractor Cray in 2021. At the time of the announcement, US Secretary of Energy Rick Perry noted: ‘Achieving Exascale is imperative, not only to better the scientific community, but also to better the lives of everyday Americans. Aurora and the next-generation of exascale supercomputers will apply HPC and AI technologies to areas such as cancer research, climate modelling, and veterans’ health treatments. The innovative advancements that will be made with exascale will have an incredibly significant impact on our society.’

Intel has not previously been known for its accelerator technologies. While it did launch the Xeon Phi products in 2016, they were discontinued last year. It was also the Xeon Phi accelerators or coprocessors that were initially planned to be used in Aurora.

It appears that Intel will be using its own discrete GPU in this system, although Intel has not yet confirmed this. The announcement from the DOE and Intel stated that the system will be based on new Intel technologies, designed specifically for the convergence of AI and high-performance computing at extreme scale.

These include a future generation of Intel Xeon Scalable processor, a future generation of Intel Optane DC Persistent Memory, Intel’s Xe compute architecture and Intel’s One API software. Aurora will use Cray’s next-generation Shasta family, which includes Cray’s high-performance, scalable switch fabric, codenamed ‘Slingshot’.

Bob Swan, Intel CEO, said: ‘The convergence of AI and high-performance computing is an enormous opportunity to address some of the world’s biggest challenges and an important catalyst for economic opportunity.’

If the Xe is a discrete GPU that will deliver on AI workloads, then it appears that there could be a rival to Nvidia’s dominant GPU products.

The impact of AI

As the pre-exascale systems of Summit and Sierra in the US and similar systems now planned in Europe have shown, the largest supercomputers in the world rely on GPUs to generate much of the performance.

But the drive towards AI and machine learning has pushed vendors to specialise and optimise GPU technology for these workloads. Now, with the addition of Tensor Cores to the latest Nvidia GPUs, there is dedicated hardware built in to accelerate AI applications.

David Yip, OCF’s HPC and storage business development manager, thinks the rise of GPU technology means that HPC and AI development go hand in hand. The increase in AI provides added benefit to the HPC ecosystem.

‘There is a lot of co-development, AI and HPC are not mutually exclusive. They both need high-speed interconnects and very fast storage. It just so happens that AI functions better on GPUs. HPC has GPUs in abundance, so they mix very well.’

He also noted that AI is bringing new users to HPC systems who would not typically be using HPC technology. ‘Some of our customers in universities are seeing take-up by departments that were previously non-HPC orientated, because of that interest in AI. English, economics and humanities – they want to use the facilities that our customers have. We see non-traditional HPC users, so in some ways the development of AI has done HPC a service,’ added Yip.

Yip noted that the fastest system on the Top500 in an academic setting is based in Cambridge. ‘It is just over 2.2 Pflops and you have to go back to about 2010 to get that kind of performance at the top of the Top500. It is almost a decade ago, so there is a difference in these very large systems, but we do eventually see this kind of performance come down.’

Koski agreed AI was having an impact on development of HPC, but also noted this was largely positive – leading to more expertise and use of GPU technology.

‘A couple of years ago there was a lot of discussion on how it is difficult to re-programme scientific codes for GPU resources. It might be difficult but now we have this AI boom everywhere. Everybody is interested, so it has created a bigger market for the GPUs. Nvidia and the others have been successful developing those machines, so I think it has changed exascale development because it has made GPU resources much more attractive,’ stated Koski.
