The upcoming accelerator war
Since the dawn of computers, scientific users have pushed the performance envelope. Today, HPC systems discover new wonder drugs, map genomes, simulate airplane wings and perform a million other important tasks.
The need for compute cycles is almost unbounded. As demand grows, we run into physical and economic limits. The largest supercomputers draw several megawatts, enough to power thousands of homes. Reaching exascale requires 1,000 times the compute capability of today's petascale systems, but since gigawatt supercomputers aren't feasible, we can't simply use 1,000 times more power to get there. We need more compute per Watt.
Whereas CPUs are general-purpose computers that can run any task, accelerators are more specialised. CPUs can run operating systems, GUIs, scientific calculations or play music. Accelerators exploit the fact that scientific computing is largely number-crunching. They deliver as much number-crunching horsepower as possible within a cost and power budget.
What are the options?
Field Programmable Gate Arrays (FPGAs) consist of large arrays of configurable silicon building blocks on a chip and offer a powerful way to accelerate HPC operations. These building blocks can be connected in myriad ways to create specialised devices that perform their tasks well and with low power. However, configuring and programming FPGA-based accelerators requires skills most programmers do not have, so they have had limited impact on the broader HPC market.
FPGA Advantages: Very flexible
FPGA Disadvantages: Very hard to program
Graphics Processing Units (GPUs) were designed to handle massive numbers of polygons in an image, processing them in real time. To do this, GPUs utilise hundreds of simple processing cores in parallel.
In the early 2000s, HPC users realised these GPU cores could be used for HPC workloads. Early attempts at leveraging GPUs for HPC struggled: with only graphics interfaces available, programmers had to recast their problems to look like graphics processing. Much changed in 2006 when Nvidia launched CUDA, a software development kit and interface for non-graphical processing on a GPU.
Because they were developed to run inside desktop PCs, GPUs have always had a limited power budget. This forced designers to add low-power circuitry into their GPU designs. GPUs are roughly 10 times more power-efficient than traditional CPUs. As always, this efficiency comes at a cost. GPUs are far more limited than CPUs in terms of the workloads they can efficiently execute. They are, however, very good at executing sequential streams of SIMD (single instruction, multiple data) operations and work best when the same operation is performed on many data points simultaneously. Many massively parallel HPC workloads fit this model well. However, if unique operations need to be performed on each data point, or if there is a high degree of branching or looping in the code, GPUs do not perform well.
Programming GPUs is much easier now than in the early 2000s, but still more difficult than programming a traditional CPU. To reap the benefits of the GPU architecture, programmers need to decompose their problems into individual kernels that can be applied to large amounts of data in parallel. This is not a skill commonly taught and it takes significant experience to do well.
GPU Advantages: Low cost and low power
GPU Disadvantages: Difficult to program
Intel recently announced its MIC (Many Integrated Core) architecture. MIC aims to achieve high efficiency while retaining conventional CPU programming models. The first commercial MIC release will be in late 2012 or early 2013.
MIC integrates 50 or more processing cores on a single chip, tightly linked by an on-chip mesh network. To achieve greater power efficiency than traditional CPUs, MIC takes a step into the past: its cores are a variant of the Pentium processor from the 1990s. Intel pairs this low-power core with a wide vector unit, delivering excellent performance per Watt.
Each of the MIC cores uses the industry-standard x86 instruction set, allowing existing programming models and compilers to be used instead of specialised tools. While effort is needed to recompile and optimise for MIC, this effort is less than that required for GPUs. Oak Ridge National Laboratory in the US was able to port many scientific codes to run on MIC in less than one day each, porting some five million lines of code in less than three months. Many of these codes will never be ported to GPUs due to the cost and complexity involved.
MIC Advantages: Low power and simpler programming than other accelerators
MIC Disadvantages: Not yet generally available and not as low power as GPUs
And the winner is?
At Adaptive Computing, we believe accelerators are here to stay and have invested accordingly. Our Moab workload management software supports GPUs as well as the upcoming Intel MIC products, but I am interested to see how this war plays out.