Software speeds supercomputers?
At ISC’13, software emerged as a dominant theme. In this first discussion, Tom Wilkie considers software and supercomputing speed
Intel must be feeling pleased. Last month, it announced the latest version of its ‘accelerator’ chip – the Xeon Phi co-processor – at the International Supercomputing Conference (ISC’13) in Leipzig, Germany, on the same day that the Chinese Milky Way 2 machine was declared the fastest computer on Earth. And the Chinese machine runs on precisely these Phi chips.
But behind the headlines, software rather than hardware seemed to be exercising many minds – including those at the two main chip manufacturers, Intel and Nvidia. Better operating software will improve machine performance – both speed and energy-efficiency. Better application software will increase the attractiveness of high-performance computing to industry and encourage small- and medium-sized companies to make use of supercomputing resources.
For Bill Dally, Nvidia’s chief scientist, Moore’s Law was about processors: ‘The issue now is how to get performance out of those processors’. The future, he said, will be dominated by ‘Parallelism; Power; and Communications’. High-performance computing in future will be parallel computing; all future machines will be power-limited; and finally, moving bits uses more energy than arithmetic, so the energy cost of communication will limit future machines. In his keynote talk to ISC’13, he illustrated how designers are now concerned about the energy costs of moving bits locally, within a chip, let alone having to move off the chip. The key, he said, is to rethink the software so that calculations are recompiled to run in a more energy-efficient manner. He sees software, not hardware, as the key to progress towards the next generation of exascale machines (those that will be able to perform 10¹⁸ floating point operations each second, compared to the current peak performance of 55 × 10¹⁵ on the Chinese machine).
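To put the exascale figure in perspective, the short sketch below works out how much faster an exascale machine would be than the peak performance quoted above for the Chinese machine. The variable and function names are illustrative only, and the calculation uses just the two figures given in the text.

```python
# Figures quoted in the article, in floating point operations per second.
EXASCALE_FLOPS = 10**18            # target for an exascale machine
CURRENT_PEAK_FLOPS = 55 * 10**15   # peak quoted for the Chinese machine


def speedup_needed(target_flops, current_flops):
    """Return how many times faster the target is than the current figure."""
    return target_flops / current_flops


ratio = speedup_needed(EXASCALE_FLOPS, CURRENT_PEAK_FLOPS)
print(f"Exascale is roughly {ratio:.1f} times the quoted peak")
```

On these figures, the gap between the quoted peak and exascale is a factor of about 18.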
The US Department of Energy (DoE) clearly shares this assessment, as it is funding a project led by Peter Beckman, from the Argonne National Laboratory, to spend the next three years designing an operating system for an exascale machine. The project, called Argo, is expected to start on 1 August and will involve not just Argonne but also Lawrence Livermore and Pacific Northwest National Laboratories as well as academic partners such as the University of Illinois. Currently, Dr Beckman said, large systems are not managed as collections of nodes under a global operating system but by node operating systems stacked together. The objective, he said, is that ‘power becomes a managed resource in the same way you [the operating system] manage disk space and memory. You need goal-oriented management software where there is a trade-off between power consumption and the optimal computational environment. For exascale, you need to move to this global, goal-oriented approach where the entire system manages it actively.’
The project is being funded by the DoE, essentially the US Government’s lead department in the development of the next generation of computers. According to Dr Beckman, industry would have a hard time doing it, because high-performance computing is such a relatively small part of its business. The practical outcome would be the creation of a functioning operating system that could be ‘productised’ by a big vendor such as Cray or Intel. ‘Intel has to own more and more of the software to be successful in large machines that go to big customers,’ he continued. ‘They want fully integrated machines. The compiler is at the heart of generating good, optimised code. Intel needs to make its chip look good, so it has to have a system software stack that is optimised for that platform.’ Cray and IBM have already been moving in that direction, he said.
At an intergovernmental level, the US is looking to team up with both Japan and Europe on exascale projects. The DoE is discussing a partnership agreement on exascale with the Japanese Ministry of Education, Technology and Culture, covering system software, application software, and hardware platforms. According to Dr Beckman, system software may be the easiest issue as it could all be open source. Under the agreement, all the research will be open source, but commercial companies may make use of that to pursue their own paths and strategies, so it is not envisaged that there would be a commercial tie-up between, say, IBM and Fujitsu.