John Barr asks what standard approach the industry can agree on to make next-generation HPC systems easier to program.
When the architecture of high-performance computing (HPC) systems changes, the tools and programming paradigms used to develop applications may also have to change. We have seen several such evolutions in recent decades, including vector processors, the introduction of multiprocessors, cluster computing, and the use of heterogeneous processors to accelerate applications.
These changes, while offering the potential to deliver higher performance at lower price points, give the HPC industry a big headache. That headache is brought on by the need for application portability, which in turn depends on standard development tools that support a range of platforms. Without software standards supporting emerging platforms, independent software vendors (ISVs) are slow to target these new systems, and without a broad base of software the industry pauses while the software catches up.
An example of this was the transition from complex, expensive shared memory systems to proprietary distributed memory systems, and then to clusters of low-cost commodity servers. When each new system had its own message passing library, ISVs were reluctant to port major applications. Only when MPI was widely accepted as the standard message passing library did the body of applications available for clusters start to grow, and cluster computing become the norm in mainstream HPC.
So the burning question is: what standard approach can the industry agree on to make next-generation HPC systems easier to program, and therefore more attractive for ISVs to support?
One of the main drivers behind the architectures of exascale systems is the need to reduce power consumption for a given level of performance, not just a bit, but by orders of magnitude. To achieve this we can't simply do more of the same, only bigger and faster, as has often been the case when moving from one generation of system to the next; we need a different approach. And that approach will involve heterogeneity: systems will have two types of processor. Mainstream processors (most likely x86-based, but possibly ARM) will handle the processing that cannot be highly parallelised, while low-power, low-clock-speed, massively parallel chips do the number crunching.
The two most popular application accelerators today are Intel's Xeon Phi and Nvidia GPUs. Both Intel and Nvidia appreciate the need for software and development tools. Intel has invested heavily in its internal development tools teams and in buying companies with interesting compiler technology, while Nvidia has built a large community that uses and supports Cuda, utilising both third-party tools and the Nvidia Cuda compiler.
While this is good, it puts ISVs in a similar position to the one they were in during the emergence of distributed memory systems, when a single application port could not target all of the emerging mainstream HPC systems.
Although much of the excitement surrounding next-generation HPC systems is driven by the expectation of reaching exascale, it is actually far more important that the path towards exascale will produce affordable, single-chassis petascale systems.
The options for programming next-generation HPC systems are likely to be MPI plus one of the following:
OpenMP, a set of compiler directives and library routines for C, C++ and Fortran, supports shared memory parallel processing;
OpenACC uses compiler directives (annotations), in a style similar to OpenMP's, to write applications for heterogeneous systems, and is supported by CAPS, Cray and PGI;
Cuda is a parallel computing and programming model developed by Nvidia, which has produced a Cuda C compiler, while PGI also sells a Cuda Fortran compiler;
OpenCL is a framework for developing applications on heterogeneous systems, with a kernel language based on C99. It is maintained by the Khronos Group, and is supported by many companies including ARM, Intel and Nvidia.
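To make the difference between the directive-based options a little more concrete, here is a minimal sketch in C of the same SAXPY loop (y = a*x + y) annotated first for OpenMP and then for OpenACC. The function names `saxpy_omp` and `saxpy_acc` are hypothetical, chosen only for illustration. The pragmas are standard OpenMP and OpenACC syntax, but whether the OpenACC version actually runs on an accelerator depends on the compiler and its flags; a compiler without OpenMP or OpenACC support ignores both pragmas and runs the loops serially.

```c
/* Sketch only: one loop, two annotation styles.
   A compiler without OpenMP/OpenACC support ignores these pragmas
   and executes both loops serially, giving identical results. */

/* OpenMP: share the loop iterations among the threads of a
   shared memory (CPU) system. */
void saxpy_omp(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* OpenACC: ask the compiler to offload the loop to an attached
   accelerator, copying x to the device and y to and from it. */
void saxpy_acc(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

In both versions the loop body is untouched and only the annotation differs, which is why agreement on a single directive standard matters so much to code owners. With Cuda or OpenCL, by contrast, the loop body has to be rewritten as a separate kernel function.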
Cuda has been a success for Nvidia, but it is proprietary. OpenACC is promising but immature, while OpenCL has shown good, portable results but is not supported enthusiastically by everyone. Intel offers developers of parallel applications many options (some say the range is confusing), and its insistence that all you need do is add the -mmic flag when compiling Xeon codes to have them run well on Xeon Phi is unhelpful.
If Intel and Nvidia were to give their wholehearted support to a single option (OpenCL or OpenACC would be good), code owners would be more willing to embrace the accelerator model, and the market would grow quickly. Intel and Nvidia (and perhaps AMD, with its GPUs) could then slug it out in a larger market, which would be good for the vendors, third-party code owners and users alike.
A related but longer-term issue is the software requirements of exascale systems. If MPI plus OpenACC (or OpenMP with new accelerator capabilities, or something else) turns out to be the popular choice over the next few years, will that really cope with the needs of exascale systems, including handling resiliency at massive scale? Or will yet another approach be required, such as Partitioned Global Address Space (PGAS) languages?
One of the first compiler teams to work on heterogeneous architectures for HPC was at Floating Point Systems during the 1980s, where its in-house team produced compilers for the FPS 164 and 264. (Coincidentally, at the same time this author was working for UK-based company System Software Factors on compilers for the FPS AP120B and FPS5000 under contract to FPS.) The core of the FPS compiler team went on to form PGI in 1989; the company was bought by STMicroelectronics in 2000 but has continued to operate independently, and is a leading supplier of compilers to the HPC industry.
While I was researching this piece, it was announced that Nvidia had bought PGI. How will this affect the evolution of development tools for accelerated computing? Bringing PGI's skills and products in-house shows that Nvidia understands that without good software support its cool hardware products may have a limited impact. But with Nvidia and PGI being two of the proponents of OpenACC, will OpenACC start to look proprietary rather than an open standard? And where does this leave PGI's support for Intel's Xeon Phi, the major competitor to Nvidia's GPUs in the HPC accelerator space?
What route should software developers take in preparing their applications for emerging HPC platforms – Cuda, OpenMP, OpenACC, OpenCL or should they use Intel compiler tools or extensions?
I don’t know the answer to this question, but unless the HPC industry reaches a conclusion reasonably quickly, there will be a pause in the development of applications for accelerated computing platforms that will harm all stakeholders in the HPC industry.
With more than 30 years of experience in the IT industry, initially writing compilers and development tools for HPC platforms, John Barr is an independent HPC industry analyst specialising in the technology transitions towards exascale.