Accelerator-based computation is increasingly seen as a way of boosting HPC performance for a wider variety of jobs, e.g. CFD, deep learning, AI, life sciences, etc. Previously, the range of suitable jobs was limited, but it now seems to be opening up as research and experience in these ‘new’ technologies mature.
Is this the dawning of a new age?
The daddy of the accelerators is clearly the GPU. NVIDIA did a great job of creating an ecosystem for development, and its marketing campaign has been relentless (sorry, I don’t discount AMD; however, the adoption of CUDA over OpenCL has been a factor). The Xeon Phi, the other mainstream accelerator, from Intel, has arguably been less successful in adoption. However, that might be changing.
Knights Landing (KNL), Intel’s latest generation of Xeon Phi, is slowly but surely rolling out now, initially as a self-hosting system and later in the more familiar add-in card form factor. As with NVIDIA, there is a flavour of the chip, the ‘F’ variant, that brings the interconnect on to the die: a similar idea, but more tightly integrated into the system. Arguably, the Intel Scalable System Framework, in which KNL and Omni-Path are pieces of the same jigsaw, offers a more complete solution than the NVIDIA offering.
This is always going to be a David and Goliath story: NVIDIA, the OpenPOWER Foundation et al. versus Intel. There are always going to be pros and cons for each camp. NVIDIA has done a great job of supporting CUDA over the last decade. Yes, you read that correctly; CUDA has been around for nearly ten years! They have created an ecosystem of support, research, and development that they should be proud of: the teaching centres, research institutions, and GPU centres, all building on the base of GeForce gaming cards.
Every gamer on the planet who has an NVIDIA GPU has access to HPC resources that, 20 years ago, could only be provided by Field Programmable Gate Arrays (FPGAs). Intel, on the other hand, has an ecosystem that practically all mainstream programmers have used: the Intel compiler tools. Intel has been intelligent enough to reuse this vast array of experience and build upon the x86 legacy in KNL, so programmers already familiar with x86, specifically in HPC, have less of a ‘journey’ in programming for the KNL system.
Intel can bridge the gap by providing the complete ecosystem for HPC, and this is what it is doing. However, it needs to do more to build momentum behind the technology: developers need to be wooed away from NVIDIA, and mainstream applications need to be ported to take advantage.
I haven’t forgotten FPGAs: if GPUs are the daddy of accelerators, FPGAs are the granddaddy of them all. Intel’s acquisition of Altera will put the cat amongst the pigeons, as FPGAs are arguably as powerful as GPUs while consuming less power. The catch is the development environment. FPGAs have long been seen as difficult to program, but can Intel bring its development platform skills to this arena?
One of the main drawbacks of any accelerator application is the bottleneck of actually getting the application data onto the GPU, Phi, etc. NVLink goes some way towards solving this problem. Traditionally, data has to come over the PCIe bus to the accelerator at 32 GB/s; NVLink provides 80 GB/s between the GPU and CPU. OK, that is not quite the 115 GB/s between the POWER8 socket and the DDR4 memory, but it is getting there (who knows what NVLink 2.0 will bring?).
This performance allows faster communication between the GPU and CPU, increasing the ability of the GPU to work on more data and push the results back to the CPU. We have already seen the memory bandwidth of the POWER8 system lead to better performance than x86 for memory-intensive applications; the addition of NVLink can only lead to further performance gains over a comparable x86 system.
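To put the bandwidth figures quoted above in perspective, here is a rough back-of-the-envelope sketch. It takes the 32, 80 and 115 GB/s numbers at face value as peak rates and assumes a hypothetical 16 GB working set; the names `BANDWIDTH_GB_PER_S`, `transfer_time_seconds` and `DATA_GB` are made up for illustration. Real transfers will be slower once latency and protocol overhead are counted.

```python
# Peak bandwidths as quoted in the article, in GB/s.
# These are headline figures, not sustained measurements.
BANDWIDTH_GB_PER_S = {
    "PCIe": 32.0,
    "NVLink": 80.0,
    "POWER8 to DDR4": 115.0,
}


def transfer_time_seconds(data_gb, bandwidth_gb_per_s):
    """Idealised time to move data_gb gigabytes at a given peak bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return data_gb / bandwidth_gb_per_s


# Hypothetical working set size (an assumption, not from the article).
DATA_GB = 16.0

for link, bandwidth in BANDWIDTH_GB_PER_S.items():
    seconds = transfer_time_seconds(DATA_GB, bandwidth)
    print(f"{link:>15}: {seconds * 1000:7.1f} ms to move {DATA_GB:.0f} GB")
```

On these idealised numbers the same data set crosses NVLink in well under half the time it takes over PCIe, which is the gap the accelerator otherwise spends waiting for data.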
Not forgetting the Coherent Accelerator Processor Interface (CAPI): also part of the Power platform, it allows accelerators to connect to the memory subsystems and underlying architecture of the POWER8 system, much like NVLink. CAPI has been extensively used for FPGA integration into the Power platform.
Intel’s approach of putting the Omni-Path connector on to the KNL die seems the logical one and, in a way, is similar to the NVLink approach but on a more expansive scale, as you will be able to extend the reach further with the use of Omni-Path switches.
With such disparate technologies aiming at the same goal, competition can only intensify. So, is the dominance of Intel at an end?
One can never discount ARM in the dawning of a new age. The train seems to have slowed slightly, but with the Japanese building a Post-K supercomputer based on ARM (which I like to call ‘Super K J’), it will be stepping up again. Many manufacturers are talking about ARM, especially since the introduction of 64-bit support. Will it be a matter of time before someone introduces an NVLink ARM system?
Together with IBM’s POWER8 systems, there is a whole host of systems gaining in popularity. With the increased popularity of accelerator-based computation and the adoption of NVLink and Omni-Path, the future is certainly looking bright for HPC performance. It will be fascinating to watch the progression of ARM in the market over the next few months, and the titanic battle between Intel and NVIDIA/OpenPOWER et al.
David Yip is the HPC and Storage business development manager at OCF.