Change bursts from the HPC crystal ball
The world of high-performance computing (HPC) is bubbling. Years of hardware deployments dominated by similar solutions are giving way to a broader spectrum of choices. Fundamental issues that have been mostly ignored, but which have been starkly obvious to anyone who cared to look, are now grabbing the community’s attention: (lack of) diversity; the ticking software debt; and software careers. Giants of the industry, such as Intel and IBM, are shuffling to new positions. Different usage models are winning acceptance, even from hardened HPC folk, including the much-hyped cloud and the possibility of something other than ‘Fortran+MPI’.
So, with all this bubbling away, what can we foresee for HPC in 2016?
Real choices in hardware
For many years now, the majority of HPC systems have been based on x86 compute nodes (most commonly featuring two Intel Xeon processors with a few cores each) strung together with commodity interconnects. There have been several perturbations – for example, enhanced capabilities using custom interconnects in Cray XC or SGI UltraViolet systems. But the dominant architecture had become a safe choice for the buyer and application developer.
Change has been coming, though. Multicore processors have ridden into near-ubiquity on the back of pressures to reduce power and energy consumption (not the same thing) and increase performance-per-dollar of HPC systems. The same pressures mean many-core processors (tens to hundreds of cores) will become hard to avoid in the near future.
Nvidia deserves much of the credit for driving many-core processing into the mainstream of HPC. Its aggressive marketing of a capable many-core processor product in Tesla, critically backed by investment in Cuda as a means to access the potential of GPUs, has had an undeniable effect on the HPC landscape in recent years. It is now hard to attend any HPC or computational science conference without falling over presentations describing applications of GPU computing. Of course, there is still plenty of debate in the corridors about how many people are using large GPU clusters as their primary workhorse for real production workloads. But that misses the point: whether just for special case use, or for experimental work, people are evidently using GPUs, so system buyers and application developers must consider this as a mainstream processor option.
Intel has its own many-core product, Xeon Phi, with the much-awaited Knights Landing generation due in 2016. ‘KNL’, as it is colloquially known, promises performance comparable to GPUs (with suitable tuning effort). The use of OpenMP rather than Cuda means developers have a shot at better code compatibility with Xeon. Thus KNL is likely to attract the attention of both system buyers and application developers as a serious option, and to make significant inroads into the market share of HPC systems in 2016 and beyond.
Alongside Xeon, GPUs, and KNL, a fistful of other processor options are pressing for ‘mind share’, either now or in the near future. IBM’s Power architecture and ARM-based processors, in particular, both have ambitious plans and credible technology roadmaps to earn a much greater share of the HPC market.
It is highly unlikely that Xeon’s dominance in HPC will be seriously reduced in the next few years. But, with an increasing choice of capable processors available this year – and roadmaps extending beyond even those of IBM and ARM – each supercomputer deployment faces a more complex decision than in recent years. That, in turn, makes for a more diverse and uncertain range of platforms that application developers should target.
Seeing the obvious
Speaking of diversity, what about people? Attend any HPC conference or similar event and you’ll see the participants showing a distinct dominant trait. Specifically, they are mostly male. Oddly, it seems not to have been important enough for people to highlight, or act on, until recently.
Thankfully, due to the efforts of a handful of strong individuals, and the increased willingness of many more to acknowledge the problem, the lack of diversity in HPC people has pushed its way into the mainstream conversation of the HPC community. I look forward to seeing measurable improvements in gender diversity during 2016. It will be a long time before the gender split at HPC workplaces and events nears parity, but every step towards that is welcome.
When discussing diversity (or the lack of it) it is also important to look beyond the glaringly obvious gender split and consider other diversity failures – for example, ethnicity, social background, and age. With regard to age, the proper value placed on experience could be combined with the willingness found in early/mid career people to challenge ‘the way things have always been done’, opening the door to innovation. Indeed, that is a key benefit of improving diversity in general – enabling better results, by considering options beyond the subconscious bias of a less diverse group.
A related longstanding issue that has now fizzed to the surface is that of research software engineers. The world of HPC is intricately linked with the world of research. One characteristic of research is the appearance of a two-class system comprising researchers at the top and support staff (including technicians, IT staff, managers and administration) at the bottom. There are many issues with this, but the one that concerns us here is research software.
Software is core to most modern research and, of course, it is essential to all research using HPC. Developing and enhancing software for effective performance on HPC platforms is an advanced skill, best performed by people who possess the rare ability to hover partly in the research space and partly in the numerical software engineering space. Dubbed ‘research software engineers’ (RSEs) in the UK, these people are usually forsaken by the career-measurement systems of the research environments they operate in; they are neither pure researchers, nor traditional support staff.
In the HPC world, RSEs need to understand HPC technology as well as software and research. RSEs are probably better funded and acknowledged in HPC than elsewhere in research, thanks to the science application support teams deployed at most supercomputer centres. However, the recognition is still well short of what is needed, and they usually still suffer the same career challenges.
Good progress has been made in the UK recently, with the EPSRC funding RSE Fellowships and the Software Sustainability Institute (funded by EPSRC, BBSRC and ESRC). This year should see more people funded or recruited for their ability as research software engineers, rather than as pseudo-researchers or computer support staff, as previously. It should also see greater international debate and co-ordination in this area.
Software: the issue in the shadows
One important reason for recognising the role of RSEs, and making HPC research software a more attractive career, is the bottled-up disruption that has been hiding in plain sight for years. The technology of HPC is changing substantially, towards many-core processors that require lots of fine grain concurrency to deliver useful performance. Complexity in the data hierarchy is burgeoning; then there are the issues of energy efficiency, system topologies, scalability, security, and reproducibility.
A disconcertingly large proportion of the software used in computational science and engineering today was written for friendlier and less complex technology. An explosion of attention is needed to drag software into a state where it can effectively deliver science using future HPC platforms.
In many cases, the issue is to address scalability, concurrency and robustness (‘software modernisation’). However, there will be a sea of applications for which the need is not only better implementation but reconsideration of the algorithms and the scientific approach, if we are to exploit the new HPC technologies effectively for science and engineering.
To achieve this, we need a legion of happy and capable HPC RSEs, including the hoped-for influx when the diversity failures get addressed. Access to a sufficient number and quality of HPC software engineers is already a competitive differentiator for commercial and academic users of HPC, and this will become more acute in the coming years.
OpenHPC vs. OpenPower
Many commentators point to the competition between Intel’s KNL and Nvidia’s Tesla as the key technology battle of 2016. However, I think the interesting one will be Intel vs. IBM. Each is doing its best to dress this up as having a community behind it, with IBM’s OpenPower initiative in the blue corner and the Intel-led OpenHPC initiative in the other blue corner.
IBM’s Power architecture combined with Nvidia GPUs, as in the US Department of Energy’s forthcoming Coral supercomputers, has several attractive characteristics and is set to compete with supercomputers based around Intel’s Xeon and Xeon Phi products.
Intel is positioning itself to take on more of the ecosystem – the Omni-Path interconnect, software stack definition, and more – perhaps even towards providing whole solutions. There are benefits to buyers and developers – assurance of compatibility, for instance. But the risk to Intel is that it awakens the inherent distrust of monopoly suppliers within the HPC community, and triggers a reaction against ‘Intel everywhere’ (whether backed by OpenHPC or not).
IBM has announced ambitions towards a much greater market share of HPC. The Nvidia tie-up will be key, as will IBM’s ability to convince buyers and developers that, in spite of having cut much of its HPC business, it is truly committed to HPC and is investing in a sustainable ecosystem. And, while the two giants IBM and Intel adjust positions, ARM is quietly but surely building up its HPC focus, with active ecosystem plans and market share ambitions.
The old ways are changing
‘Good old fashioned HPC’ is often thought, narrowly, to consist of running applications written with Fortran and MPI on supercomputers accessed either by owning the hardware or through direct allocations of time. In 2016, we are likely to see alternative models grow.
The marketing for running HPC jobs in the cloud has developed some of the same characteristics as early GPU marketing: comparisons to traditional methods are often unfairly constructed to favour the authors’ desired conclusions. GPUs matured both in terms of marketing claims and delivered performance. Cloud will go the same way, probably quite rapidly over the next year or so. Then we will see compute capacity delivered via cloud or cloud-like models being used to run real (even large) HPC workloads in companies. Applications requiring inter-process communication at scale (hundreds of nodes) and significant I/O will, however, continue to struggle in cloud environments.
Away from the compute capacity itself, there are stories emerging of Python, Spark, and C++ gaining traction for HPC application software. It will be interesting to see over the course of 2016 how much of this is just ‘let’s do something different’ and how much turns out to be well-considered choices that give clear benefits. In the core scientific computing software arena, Fortran (plus MPI or OpenMP) will dominate for a long time yet. But new choices will be used for specific gains – for example, we have already seen cases where Spark seems the more sensible choice.
An interesting year ahead
With all this bubbling away, 2016 promises to be an interesting year for HPC. SC15 in Austin was, in my view, one of the most vibrant SC conferences of recent years. I trust that is reflective of an active and positive year ahead for HPC.
But the main observation I’d call out as valuable advice for 2016 is this: access to a sufficient number and quality of HPC software engineers will be a key competitive differentiator for commercial and academic users of HPC.
Andrew Jones provides impartial advice on HPC technology, strategy, and operation for companies around the world as leader of NAG’s HPC Consulting & Services business. He is also active on Twitter as @hpcnotes.