Tom Wilkie hears the views of the main processor manufacturers, as he continues this series on the role of software in supercomputing
At ISC’13, an Audi RS5 motor car was the centrepiece of the Intel stand, as the company chose to showcase its new generation of Xeon Phi processors not by displaying the chips themselves but by showing what they could accomplish.
Behind the gleaming vehicle, a 52-node cluster was running high-fidelity visualisations of a new Audi design. The car manufacturer and Autodesk had come together to adopt real-time predictive rendering of designs, thus eliminating the need to construct so many physical prototypes. The benefit for Audi is shorter design times and lower costs of developing new cars.
It was a visually striking way of illustrating what turned out to be a recurrent theme of this year’s International Supercomputing Conference (ISC’13): that processors, memory chips and interconnects are only part of the story; high-performance computing needs clever software. Some of it is needed to compile the jobs, manage the system, and run the machines in an energy-efficient manner, as discussed in the first article in this series about software. But ultimately, if it is to be useful, high-performance computing needs application software that will deliver results for scientists and engineers, a point stressed in the preceding article.
The sort of virtual prototyping being demonstrated by Audi represents a business transformation in the automotive industry, Raj Hazra, general manager of Intel’s Technical Computing Group, said in an interview. The implications go much wider than Audi itself, he said, and reach all the way down the automotive supply chain: ‘If your supply chain is not brought into this digital world, you lose efficiency – your supplier also has to be able to plug into the digital model.’ There is little point, he said, in Audi designing in silico if the supplier of a critical component still has to make physical prototypes. Thus its adoption by Audi has the effect of driving virtual modelling and engineering further down the supply chain.
But hand-in-hand with the move to parallelism has come the development of heterogeneous computer architectures, where much of the computation is hived off from CPUs to specialised processors – accelerators such as Nvidia GPUs or Intel’s Phi coprocessors. Intel announced the latest range of its Xeon Phi coprocessors at ISC’13: the 7100 family, with high performance and the most memory; the 3100 family, a more ‘mid-range’ product; and, intriguingly, the 5100 family, which comes in a high-density form factor, unpackaged, in a way, so that OEMs can integrate it into their own systems.
This shift to heterogeneous architectures presents its own problems to application programmers. Nvidia’s GPUs do not share a lineage with the traditional x86 architecture of most CPUs and instead require their own language, Cuda. However, according to Sumit Gupta, general manager of the Tesla Accelerated Computing Group at Nvidia, this should not be too much of an obstacle to application programmers: ‘the hard part is the parallel programming. Cuda is 5 per cent to 20 per cent of the effort.’ He also pointed to widespread and growing industry support for OpenACC – a standard designed to simplify parallel programming of accelerators in a heterogeneous computer. Within just a couple of days of attending a workshop on OpenACC, people were getting a doubling of speed of execution of their programs, he said. The company was continuing to focus on developers and programmability, to make the technology applicable to many more people, he added.
Nvidia too was at pains to put end-users centre-stage. Gupta cited the example of Nuance, a leader in the development of speech recognition and natural language technologies. It is using GPUs in neural networks to speed up voice recognition – training the neural network to understand human speech by using terabytes of audio data. Once the models are trained, they can recognise patterns of words spoken to them by relating them to the patterns that had been learned earlier. ‘The Kepler GPU has been a breakthrough for them – they are where they want to be,’ he said. Among the applications could be voice-enabled cars or better service to customers who, when phoning a company, are taken through an automated, voice-operated menu of options.
Raj Hazra and Intel prefer to describe systems built on the Xeon CPU with Phi coprocessor as ‘neo-heterogeneous’. Since the Phi shares the x86 tradition, ‘you do not have to spend time to port your code, you want to maximise the time to optimise it,’ Hazra said.
Hazra believes that HPC is becoming democratised: ‘It is not just Governments, but companies like Audi and small- and medium-sized companies that have a need for some HPC capability.’ But he sees legacy application software as one of the major issues impeding the wider use of HPC. End-users see the changes that have already taken place in computer architectures and worry about the risks of committing, in case architectures change again. Hazra emphasised that Intel’s approach is designed to minimise that risk of adoption. Users can take a code that runs on Xeon CPUs, he said, explore optimising it on the Phi coprocessor, and if they decide to stay on Xeon, they will find that it still runs better. It is a risk mitigation approach, he continued, because the work has not been wasted: the optimised code still performs better. ‘People need to find it acceptable to take the risk. That’s what Xeon Phi has had as its design constraint from day one.’
There may be further changes to the processors and architectures of high-performance computers in the future. The need for low energy consumption has generated interest in processors deriving from mobile phones and tablet computers which, because they are battery-powered, are designed from the outset to achieve low energy consumption, and several vendors are already offering ARM processors. To avert fears that the next technology change may require yet another re-writing of software, Sumit Gupta pointed out that Nvidia supports the technology and that Cuda 5.5 on ARM is already available.