Compilers keep up with HPC systems evolution
Twenty years ago, the ‘Attack of the killer microprocessors’ was in full swing. Vector and custom processors for high-performance computing (HPC) systems, which up until then comprised almost 90 per cent of the Top500 list, were in decline. Because they used expensive and relatively exclusive processor technologies, vector systems were out of reach for many computational researchers.
Change came in the form of new and powerful microprocessor-based RISC/UNIX systems available from multiple vendors. PGI was the de facto compiler supplier for the Intel i860 CPU, one of those killer micros. We honed auto-vectorisation technology for i860 CPUs and Fujitsu µVP co-processors by comparing our code generation to that of the Cray C90 compilers. Unfortunately, the i860, a technical computing chip purpose-designed to complement Intel’s general-purpose x86 PC processors, was already scheduled for end-of-life after a brief run. The Fujitsu µVP showed great promise as a co-processor technology, and was the first heterogeneous target for our HPC compilers, but it never flourished because of low adoption.
This left us with a challenge: we had to deliver differentiated parallel compilers across a wide array of CPUs, including MIPS, Alpha, Power, Sparc and PA-RISC. PGI chose a translator strategy to develop parallel Fortran for scalable MPP systems. By targeting Fortran 77 as an intermediate language, we delivered parallel compilers for distributed-memory machines that leveraged the huge ongoing investments made by CPU suppliers and HPC system builders in their proprietary optimising Fortran compilers. This strategy enabled PGI to thrive as an HPC compiler supplier in the 1990s, a period in which RISC/UNIX systems would eventually comprise almost 90 per cent of the Top500.
Era of consolidation
However, this diversity of CPUs would not continue. With the launch of the Intel Pentium Pro CPU in 1995, momentum in the technical workstation market shifted almost overnight from RISC/UNIX to Wintel-based systems. When Sandia National Labs built the Pentium Pro-based ASCI Red supercomputer in 1997, the world’s first teraflops-capable HPC system was born.
The confluence of ASCI Red, the rapid growth of Linux, and the relentless pace of Intel’s process technology development fostered the Linux cluster revolution, enabling new levels of computational performance and capabilities for scientific research. PGI was the compiler supplier on ASCI Red, and adapted the resulting compilers for a rapidly growing Linux/x86 market. ASCI Red Storm, Sandia’s follow-on system in 2005, was based on the AMD Opteron x86-64 architecture. By the end of 2009, Linux/x86-64 processor-based systems had almost completely displaced RISC/UNIX HPC systems. Aside from x86, only IBM Power CPUs survived in HPC as part of the long successful run of Blue Gene systems.
While this era of homogeneity in HPC platforms made system configuration and programmability more uniform for scientists and researchers, it came at a cost. Innovation slowed, promising new architectural alternatives and compiler technologies failed to flourish, and ultimately reduced competition led to higher prices and fewer choices.
Yet, after a 10-year period of consolidation, the HPC market has shown once again that it will never stagnate. Three new technologies have emerged as positive disruptions in the HPC hardware market.
The first is the rise of graphics processing units (GPUs) for use in HPC systems. Based on the same technology that processes computationally intensive graphics in video games and professional design applications, these GPU accelerators provide massive levels of compute processing power to accelerate the complex algorithms used in scientific applications.
The second is the emergence of CPUs based on the ARM processor architecture, which has gradually climbed up the food chain from embedded systems, to mobile phones, to the wide array of tablets and portable computing devices that are now pervasive.
The third, announced by IBM in just the past year, is a server and HPC roadmap based around commoditisation of its Power CPU architecture. By opening up Power to HPC technology partners through its OpenPower initiative, IBM aims to foster the creation of new types of high-performance computing systems.
Looking ahead, it appears as though history will repeat itself, with the HPC industry returning to a focus on heterogeneous computing architectures. All signs point to another fundamental shift in the science and engineering computing market. This will likely include a diverse array of CPUs and accelerators feeding a de facto system architecture: very high-speed, latency-optimised, SIMD-capable CPU cores coupled with massively parallel, bandwidth-optimised accelerators with exposed memory hierarchies. This presents both challenges and opportunities for HPC compiler developers such as PGI, who will need to target many types of commodity computing engines to remain viable in a market where heterogeneous systems and computing environments are the norm.
Since the late 1990s, most of the innovation in HPC compiler optimisations and features has been driven by proprietary compilers. SIMD vectorisation, OpenMP and auto-parallelisation for multi-core, memory hierarchy optimisations, the CUDA languages, OpenCL, OpenACC and the OpenMP accelerator extensions are all derived from developments initiated in proprietary compilers. Over the last few years, LLVM compiler technology has emerged with an open-source model that delivers the rapid language, feature and targeting developments expected from open source, and allows proprietary compiler developers to innovate around LLVM components. PGI already incorporates LLVM-based code generators for its GPU targets, and LLVM compiler technology has been adopted by AMD, Apple, ARM, Cray, Intel, Nvidia and many other commercial compiler developers who are blending proprietary compiler technology with the best that open source has to offer.
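To make the directive-based features mentioned above concrete, here is a minimal sketch in C of one loop annotated two ways: with an OpenMP `simd` directive as a vectorisation hint for CPU cores, and with an OpenACC `parallel loop` directive for offload to an accelerator. The function names `saxpy_simd` and `saxpy_acc` are illustrative and not taken from any PGI product. The sketch assumes a compiler that either honours these directives or ignores them, so without OpenMP or OpenACC support enabled both functions simply run as ordinary serial loops.

```c
#include <stddef.h>

/* Illustrative only: y[i] = a * x[i] + y[i], with a hint that the loop
   is safe to vectorise using the CPU's SIMD instructions. */
void saxpy_simd(size_t n, float a, const float *x, float *y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Illustrative only: the same loop, marked for offload. The data clauses
   ask for x to be copied to the accelerator and y to be copied there and back. */
void saxpy_acc(size_t n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

In both cases the loop body is unchanged. The directive states the programmer's intent and leaves the mapping onto a particular CPU or accelerator to the compiler, which is what allows a single source to be retargeted across heterogeneous systems.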
It is difficult to predict winners and losers in a market with so many competing technologies, but this time around it looks as though compiler developers won’t have to. The coming era of heterogeneous HPC systems is aligned with, and supported by, a new wave of infrastructure compiler technologies built around a modern open-source model that is being driven and energised by mobile and embedded market forces. A high-quality LLVM back-end code generator now seems to be a given for any viable HPC processor. It allows proprietary compiler developers to focus on innovating in higher-level optimisations, parallel programming models and productivity features, while deploying quickly and uniformly across a variety of targets.
This appears to be the perfect formula for enabling an HPC market that needs to optimise performance, programmer productivity, and system cost as we drive toward exascale.