Robert Roe explores efforts to diversify the HPC processor market
With the arrival of Arm and now the reintroduction of AMD to HPC, there are signs of new life in an HPC processor market that has been dominated by Intel Xeon processors for a number of years.
AMD has generated a fair amount of interest since demonstrating performance that, it hopes, can rival Intel’s Skylake processors. Arm, meanwhile, has been quietly building steam in the HPC space and has now begun to announce large-scale projects such as Riken’s Post-K computer. Arm has also already delivered a production-scale cluster to GW4, a collaboration of the UK universities of Bath, Bristol, Cardiff and Exeter, which will all share the Arm system, delivered in partnership with Cray.
Arm, along with partners such as Cavium and Qualcomm and competitors such as IBM and AMD, sees potential to disrupt the server market, which is currently dominated by Intel. In the most recent Top500 list of the most powerful supercomputers, published in November, a total of 471 systems used Intel processors. This represents 94.2 per cent of the 500 systems listed.
While this is not necessarily indicative of the whole server market, it is fair to say that Intel has retained significant market share for a number of years. While the HPC community has continued to buy Intel processors, less power-hungry CPUs – such as those developed by Arm – are becoming increasingly attractive to HPC users.
Even if energy efficiency is taken out of the equation there is still the notion that increased competition in the server CPU space would ultimately lead to more innovation and better prices or performance for end users.
Greg Gibby, senior product manager of data center products at AMD, notes that when AMD first decided to get back into the server market it was not specifically designing a chip for HPC. ‘We wanted to go make a platform that would be good for both integer and floating point workloads. We knew we had to have a high performance core and so when we looked at designing the Zen core we looked at some new things that we had not done in previous generations.’
Gibby explained that the company wanted to ensure it could get all the possible performance out of the core. ‘We needed to make sure that it wasn’t just increasing the IPC (instructions per cycle), it was making sure that we could get data into those cores so they could do useful work.’
‘At the core level we started with a Micro-Op Cache, which allows us to use pre-fetch algorithms and branch prediction algorithms to go off and bring in data and have it available and waiting to reduce idle cycles in the cores.’
‘Another thing that we have done that is a bit different from our competitors is that we have separate integer and floating point engines. We have two dedicated floating point engines within the core itself,’ Gibby continued.
Each company has its own strategy for success in the server market but, while most server or HPC CPU developers have kept their cards close to their chests, Arm has chosen to approach the market with a very different business model.
Arm does not make processors; the company develops an architecture that it then licenses to partners, who develop silicon based on the company’s standard architecture. This allows Arm to retain control over the development of the architecture while encouraging its partners to compete with other developers.
This means that Arm’s partners, companies such as Cavium and Qualcomm with different backgrounds and expertise, are helping to drive forward the ecosystem with their own IP and innovations.
Darren Cepulis, principal architect, server and HPC, noted: ‘We like to enable an overall ecosystem so we work with a variety of silicon partners – be it Cavium, with their ThunderX2 which they were showing at SC17, or Qualcomm, with their Centriq.
‘We work with silicon partners and the open source software community to drive a software ecosystem in support of the Arm architecture and enabling optimisations for the Arm IP, and its partners’ IP, for a particular market vertical such as HPC.’
This approach means that Arm must share success with select partners, but it also limits the initial risk while cultivating an open source software ecosystem. If successful, this kind of software ecosystem can help to spur further application development, because of the reduced costs.
To generate the initial interest, Arm went one step further by engaging with key application developers and researchers. ‘We also aim to engage with strategic end-customers to understand what the workloads are, and what they are looking for both in terms of hardware and software; how we can collaborate with them, often through a research arm within the company,’ commented Cepulis.
While AMD was designing the next generation of its server-based CPU line, it took clear steps to design a processor that could meet the demands of modern workloads. Gibby noted that the CPU was not just designed to increase floating point performance as there were key bottlenecks that the company identified, such as memory bandwidth, that needed to be addressed.
‘Memory bandwidth was one of the key topics we looked at, so we put in eight memory channels on each socket,’ said Gibby. ‘So in a dual socket system you have 16 channels of memory, which gives really good memory bandwidth to keep the data moving in and out of the core.’
‘The other thing is on the I/O side. When you look at HPC specifically you are looking at clusters with a lot of dependency on interconnects, whether it be InfiniBand or some other fabric. A lot of the time you have GPU acceleration in there as well so we wanted to make sure that we had the I/O bandwidth to support this.’
‘Again this wasn’t specific for HPC; we wanted to make this for both integer and floating point but what really came out of that is that the high-performance solution has done really well on the benchmarking that we have done,’ Gibby concluded.
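The memory-channel figures Gibby cites can be turned into a rough theoretical peak-bandwidth estimate. The sketch below is illustrative only: the channel counts come from the article, but the memory speed (DDR4-2666) and the 8-byte channel width are assumptions introduced here, and sustained bandwidth in real applications will be lower than this peak.

```python
# Rough theoretical peak memory bandwidth for the channel counts quoted above.
# ASSUMPTIONS (not from the article): DDR4-2666 memory and a 64-bit (8-byte)
# data path per channel. Real, sustained bandwidth will be lower.


def peak_bandwidth_gb_s(channels: int, transfers_per_sec: float,
                        bytes_per_transfer: int = 8) -> float:
    """Return theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * transfers_per_sec * bytes_per_transfer / 1e9


CHANNELS_PER_SOCKET = 8        # from the article
SOCKETS = 2                    # dual-socket system, from the article
ASSUMED_TRANSFERS = 2666e6     # assumption: DDR4-2666, 2666 million transfers/s

per_socket = peak_bandwidth_gb_s(CHANNELS_PER_SOCKET, ASSUMED_TRANSFERS)
dual_socket = peak_bandwidth_gb_s(CHANNELS_PER_SOCKET * SOCKETS, ASSUMED_TRANSFERS)

print(f"Per socket:  {per_socket:.1f} GB/s")
print(f"Dual socket: {dual_socket:.1f} GB/s")
```

Under these assumptions, doubling the channel count from 8 to 16 doubles the theoretical peak, which is the point of the dual-socket figure in the quote.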
While there are several rivals to Intel’s crown as the major CPU supplier for HPC, the company has its own ideas on how to diversify the server market. With the acquisition of Altera in 2015, Intel acquired one of the largest FPGA developers. While the acquisition was focused on the development of products for ‘edge computing’ and the IoT, Intel’s Programmable Solutions Group (PSG) has developed a PCIe-based accelerator that it expects will be successful in HPC and other markets.
Mike Strickland, director, solutions architect, Intel Programmable Solutions Group, explained that ‘one of the key enablers is this Intel programmable accelerator card that we announced a few months ago.’
‘It is going to be an Intel-branded, low-profile, PCI Express card with an Arria 10 FPGA, so now you can trust the hardware. Beyond that, Intel is providing a framework to take care of end-to-end security from the FPGA to the Xeon, or virtualisation so that it can support different virtual machine operating systems.’
Strickland noted that he expects this new card will help with adoption of FPGA technology. The company is now working with ‘IP partners to accelerate specific application areas such as bioinformatics or data analytics’.
‘You have got an Intel-branded card and a framework to go with it; we even contributed the driver interface side to the open standards movement. It is on GitHub; it’s the Open Programmable Acceleration Engine (OPAE),’ said Strickland. ‘Anybody can use this software for their accelerator cards.’
Here you can also see the effects of trying to generate an ecosystem around the software base that can help to sustain the technology as it matures, much in the way that Nvidia did with CUDA or that Arm is attempting with its own open source software development efforts.
‘I would say that we are taking a complete ecosystem approach to this, enabling us to take the hardware risk out,’ noted Strickland.
Strickland explained that Intel PSG’s strategy for increasing adoption of reprogrammable accelerator cards – a PCIe-based FPGA – focuses on making the technology more accessible and easy to use but the company is also aiming to completely hide the FPGA behind APIs for some of its potential use cases.
‘Intel has a lot of frameworks that can completely hide the FPGA so Intel will invest in Spark implementation. What Intel is doing is optimising the instruction set for the Xeon, but now we have this very nice code base that we can use to hide the FPGA underneath,’ said Strickland.
‘We can accelerate compression on Hadoop, and Spark has a deep learning interface called BigDL; we can hide the FPGA underneath and accelerate deep learning.
‘Intel has a deployment toolkit for people doing machine learning on Xeons, working in a Caffe or TensorFlow environment, and again here we can hide the FPGA,’ added Strickland. ‘I did a number of presentations last year for data analytics, and what I realised was “just don’t even say it is an FPGA”’.
‘Intel is introducing a programmable accelerator card and it is going to run your analytics faster and you will not need to change your applications – what could be easier than that?’, concluded Strickland.
Driving new innovation
With the launch of its new EPYC processor line, AMD is trying to ensure that the software ecosystem matures quickly. To accelerate this process, the company is working with key ISVs to ensure that their codes are optimised for EPYC.
‘We are working with the major ISVs out there to have EPYC tested to see how we can further optimise and get the best performance on each of the individual applications as well,’ said Gibby.
‘When you look at some of the major players in fluid dynamics, crash simulation and things like that, we are out talking with those partners and making sure that we have a robust ecosystem.’
Only time will tell which strategy will be the most successful, but what is clear is that there are now a variety of companies developing processing technology for HPC. This is already starting to generate competition in the market.
Ultimately this competition must be a good thing, as it will drive new innovation and provide more performance and better efficiency to HPC users.