ANALYSIS & OPINION

Getting the science done, faster

30 November 2012



Big events such as November’s supercomputing conference and trade show, SC12 in Salt Lake City, generate their own energy and excitement. Now that the adrenalin rush is over, Tom Wilkie offers a personal view of the themes and emerging trends

Something was lacking in Salt Lake City this November. Despite ill-informed anxieties about the capital city of the Latter-day Saints, it was neither coffee nor beer – both were in plentiful supply. Shockingly, the US Government was missing from the biggest supercomputing event of the year.

Both the US Department of Energy (DoE) and the Department of Defense – whose procurement policies over the past few decades have essentially created and sustained the US supercomputing industry – had abruptly cancelled travel for their employees to attend SC12, the supercomputing conference and trade show. More worryingly, they had also cancelled the exhibition stands from which they promote their work.

Jeff Hollingsworth, Chair of the 2012 conference, who had therefore spent the best part of the past two years organising the event, found some positives: SC12 had the largest ever number of commercial exhibitors as well as the largest floor area taken by commercial companies, he said. In the short term, he continued, SC is healthy. Some DoE employees, predominantly those who had been invited to present papers at the conference, had obtained permission and funding to travel. (Others were so dedicated to the technology, and the event, that they used up their own holiday allocation and, in some cases, travelled at their own expense to be there.)

However, with his background as Professor of Computer Science at the University of Maryland, Hollingsworth stressed the importance of conference attendance to science and engineering, warning the cost-cutters that the papers presented at a conference ‘are yesterday’s results. It’s the hallway conversations that are the wellspring of new ideas that generate tomorrow’s results. You don’t get that from Skype conversations.’ He pointed with pleasure to the large numbers of student papers that had been nominated for some of the prizes that SC awards each year: ‘It speaks very well for our future. We need to expand the high-tech workforce.’

It has been a mainstay of the US supercomputing shows that the major national laboratories (particularly those funded by the DoE) took a massive amount of floor space and built suitably impressive stands. Although seldom discussed explicitly, this presence (albeit renting floor space at a reduced rate) has served as a closet US Government subsidy to the conference and thus the supercomputing industry itself – in addition to the overt purpose of enhancing the profile of the labs and the work they do.

Although some attendees felt that this year’s ‘no-show’ was more a reaction to financial scandals elsewhere in the US Government, the symbolism was unsettling. Professor Hollingsworth himself admitted to ‘more than a minor concern’ that in the current tight fiscal climate ‘there are questions about whether there will be sufficient [US Government] funding for a full-scale exascale programme.’

Currently, the most powerful machine in the world is a petascale computer: Titan, a Cray XK7 system installed at Oak Ridge National Laboratory. The Top500 rankings are published twice a year, to coincide with the European ISC and the US SC conferences, and in November’s listings Titan achieved 17.59 petaflops (a petaflop being 10^15 floating-point calculations per second) on the Linpack benchmark. The system uses AMD’s Opteron 6274 16C 2.2GHz processors and Cray’s Gemini interconnect, and its 560,640 cores include 261,632 Nvidia K20x accelerator cores.

Everyone in the industry is agreed that exascale computing – 10^18 floating-point calculations per second and therefore nearly 60 times faster than Titan’s Linpack result – cannot be achieved by ‘more of the same’ (i.e. a simple extrapolation of current technology) but will require investment in new and possibly unorthodox technologies at both the hardware and software levels.
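For a sense of scale, the gap between Titan’s measured Linpack result and an exaflop can be worked out directly. The sketch below is purely illustrative: the function and variable names are assumptions made for this example, and the only figures used are those quoted above.

```python
# Illustrative arithmetic only; the names here are hypothetical,
# and the figures are the ones quoted in the text.
PETA = 10**15  # one petaflop, in floating-point calculations per second
EXA = 10**18   # one exaflop, in floating-point calculations per second

# Titan's November 2012 Linpack result, as quoted above.
titan_linpack_flops = 17.59 * PETA


def speedup_to_exascale(current_flops: float) -> float:
    """Return how many times faster a one-exaflop machine would be."""
    return EXA / current_flops


ratio = speedup_to_exascale(titan_linpack_flops)
print(f"An exaflop is about {ratio:.0f} times Titan's Linpack result")
```

On the Linpack figure, the ratio comes out at roughly 57. Measured against a machine's theoretical peak rather than its benchmark result, the multiple would be smaller.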

Nervousness about the commitment of the US Government and the Congress to this sort of long-term investment in technology could be seen at the official SC12 press conference, where Dona Crawford, associate director, Computation, at the Lawrence Livermore National Laboratory, announced that the US Council on Competitiveness is to be funded to the tune of nearly one million dollars over the next three years to develop recommendations to Congress on extreme computing. Why should this be necessary, journalists at the press conference asked, when the Department of Energy had already sent to Congress, in February, its plan to build an exascale machine before the end of the decade?

In addition, it was announced that the consultancy company IDC is to be funded to create an economic model estimating the return on investment (ROI) from high-performance computing. Since no one in the history of science and technology studies has successfully predicted ROI in advance (it’s easier to do in hindsight), one could be forgiven for thinking that these are signs of worry that the constituency favouring exascale is unravelling and thus both ‘studies’ are a disguised form of advocacy that needs to be put in place to shore up support for the exascale project in the USA.

To add to the nervousness among US proponents of supercomputing, the European computer industry appears to be getting its act together and to be working hand-in-glove with Government (in the form of the European Commission). The Europeans staged a conference session to outline the ‘European Technology Platform for High Performance Computing’ (ETP4HPC). This is a project specifically to boost the European ‘HPC value chain’, according to Jean-François Lavignon, director of HPC for French manufacturer Bull. The point is to create an industry-led framework that will define Europe’s research priorities and action plans, in particular delivering a Strategic Research Agenda by the end of this year that will form part of the European Union’s ‘Horizon 2020’ research programme. Although Europe’s HPC suppliers have only about 4.3 per cent of the world market, there is a recognition in both the public and private sectors in Europe, according to Giampietro Tecchiolli, CTO of Eurotech, that ‘HPC represents a strategic tool in the development of other technologies.’

Aside from the politics, the processor technology underpinning HPC is also getting interesting – and political in its own way. Although some large manufacturers, notably IBM and Fujitsu, still make special processors, much of the industry has standardised for many years on x86 commodity chips from Intel or AMD. A few years back, Nvidia transformed the scene with its GPU accelerators and the signs are that the technology is going to become even more complicated as several new companies, such as ARM and Texas Instruments, enter the fray.

Although the ARM processor is currently available only in a 32-bit configuration, there is a roadmap through a 40-bit processor to 64 bits in the future. In mid-October, Penguin Computing announced its Ultimate Data X1 (UDX1) system, the first server platform offered by a North American system vendor that is built on the ARM-based EnergyCore System on Chip (SoC) from Calxeda. And although HP had, in the summer, launched the first phase of its Project Moonshot programme to develop extremely low-energy servers using Intel’s Atom processors, at SC12 it had examples of blades incorporating Calxeda’s realisation of ARM processors. Calxeda, based in Austin, Texas, is in the vanguard of low-energy high-performance systems using ARM.

Other manufacturers are taking an interest as well, however. ARM’s profile was further enhanced when, on the first day of the exhibition, the Mont-Blanc European project announced that it had selected the Samsung Exynos platform for its prototype low-power high-performance computer. The Samsung Exynos 5 Dual features a dual-core 1.7GHz mobile CPU built on ARM Cortex-A15 architecture plus an integrated ARM Mali-T604 GPU for increased performance and energy efficiency.

This is claimed to be the first use of an embedded mobile SoC in HPC. It will enable the Mont-Blanc project to explore the challenges and benefits of deeply integrated, energy-efficient processors and GPU accelerators, compared to traditional homogeneous multicore systems, and heterogeneous CPU + external GPU architectures. The system already runs in consumer and mobile devices such as the Samsung Chromebook and Google’s Nexus 10.

Ironically, this pioneering step into a new world of HPC was invisible to anyone passing by the Samsung stand at SC12, which concentrated largely on memory systems. It appears that corporate decisions had been made on what would be promoted at SC12 before the Mont Blanc decision was reached, and the design was not changed to accommodate it.

The rise of embedded and mobile chip makers was exemplified further by Texas Instruments, at SC for a second year, which demonstrated systems that combine multiple ARM Cortex-A15 MPCore processors and its own digital signal processors (DSPs) on the same chip. According to Arnon Friedman, business manager, Multicore DSP, for TI: ‘The message we are trying to get out is that it’s a full processor, not just an accelerator. We’re running high-performance Linpack on DSPs.’ It’s potentially a better way to cloud computing, he said, and ‘one of the ways we can do heterogeneous architectures.’ According to TI, the cloud’s first-generation general-purpose servers sometimes struggle with big data applications, including not only high-performance computing but also video processing. Based on TI’s KeyStone architecture, the processors integrate security processing, networking and switching, and so reduce system cost and power consumption in workloads such as high-performance computing, video delivery, and media and image processing.

At SC12 both Intel and Nvidia made high-profile announcements about the availability of the Phi coprocessor and the Tesla K20 respectively. In a mirror image of the US Government’s decision to withdraw, Intel provided a highly visible symbol of its commitment to HPC by expanding its presence to create a massive stand on the show floor, taking over space that had previously been allocated to US national laboratories before the cancellation. Intel’s Diane Bryant, general manager of the company’s Data Center and Connected Systems Group, was at pains to stress that the Phi is not an accelerator but is a coprocessor. It has the same architecture as the mainstream Xeon family and therefore there is no need to develop a new programming model to make use of the Phi’s capabilities. Although the name ‘Nvidia’ was never uttered during any of Intel’s press briefings about the Phi, this represents Intel’s rejoinder to the inroads that the Nvidia GPUs have made in high-performance computing. Nvidia has developed Cuda, its own parallel programming model for GPUs.

But although Intel and Nvidia have locked antlers, the system vendors appear to be agnostic and, in the course of SC12, many of them duly announced support for both the Phi and the Tesla K20. Among those releasing press announcements of support for both technologies at SC12 were SGI, HP, Penguin and Silicon Mechanics. Somewhat ahead of the pack, Cray had announced in June that its next-generation Cascade supercomputer, the XC30, would be available with the Intel Phi coprocessors, and duly followed that in August with a similar announcement regarding the Nvidia Kepler GPU. Brian Payne, director of marketing for Dell’s PowerEdge servers, appeared at the Intel press conference to endorse the Phi coprocessor.

AMD, somehow, could not quite shake off the impression that it is falling behind, despite the fact that the Titan machine, the fastest in the world, uses its Opteron processors. Moreover, the SANAM system from the King Abdulaziz City for Science and Technology came second in the Green500 listings for energy efficiency. SANAM has an Intel CPU system that uses AMD's new FirePro S10000 GPU accelerators. AMD decided to time the promotion of its FirePro Workstation Graphics at SC12. It had announced in October that it will design 64-bit ARM processors in addition to its x86 processors to create a highly-integrated, 64-bit multicore System-on-a-Chip (SoC) for cloud and data centre servers. The first ARM-based AMD Opteron processor is targeted for production in 2014 and will integrate the AMD SeaMicro Freedom.

Rumours have been circulating for some time that AMD might be a candidate for merger or acquisition (http://www.top500.org/blog/about-assumptions-and-acquisitions/) and the company’s future was a subject of speculation on the show floor, even by companies that use AMD components. In a sign of how the times have shifted in favour of mobile platforms – phones, tablets etc. – Qualcomm, the San Diego-based global ‘fabless’ semiconductor company that specialises in mobile and wireless chipsets, has been mentioned as a potential acquirer.

Just as the boundaries are blurring between mobile chipsets and the main processors for HPC systems, so the definition of a supercomputer is also changing. Although the Top500 list is still definitive and is eagerly examined to see who is in the Top 10 and who has fallen out, much of the emphasis at SC12 was on moving data around, rather than on fast numerical calculations. ARM processors may not yet have 64-bit capability, but they do have a significant role to play in data processing. The divide between the number crunchers and the data processors is most evident in the way in which companies want to access cloud computing. For many organisations, their own internal compute resources will sometimes not be enough. But users who burst out into the cloud for low-latency number crunching want access to a ‘bare metal’ cloud rather than to the virtual machines offered by Amazon Web Services (AWS).

On the other hand, those whose compute tasks require high throughput, but for whom low latency is less of an issue, may be perfectly happy to burst out into a virtual machine. Thus many companies on the show floor were advertising ‘bare metal’ while others were happy to offer software that configured jobs for running on virtual machines such as those of AWS. Arend Dittmer, director of Product Marketing for Penguin Computing, remarked in an interview that while there had initially been a low uptake of the company’s on-demand offering, ‘the train has now left the station and a lot more companies will leverage the benefits of the cloud.’ He pointed out that Hadoop applications use lots of I/O, so that flop rate was less important. Penguin itself offered ‘bare metal’ cloud services to its customers. The big question, he concluded, was the pricing model of the independent software vendors (ISVs), for that would determine how fast organisations were able to take up cloud computing.

Cloud computing, and the appropriate pricing model, is exercising the minds of ISVs, according to Silvina Grad-Freilich of The MathWorks. The company is one of very few ISVs that attend SC and take a booth on the exhibition floor. It is important to The MathWorks, however, she said, because it allows the company to keep its finger on the pulse of new developments in the industry and to hear customers’ views. Before cloud computing, people used to delay purchasing software until they had the hardware, she commented. But often they did not know how much hardware they would need to run particular jobs. This translated to frustration for the end-user scientist or engineer – and loss of revenue for The MathWorks. But the cloud is solving that, she said, and the company had been successful in developing pricing models that allowed users to go into the cloud if they needed to. After all, she concluded: ‘What is important to us is to provide access to our software.’

It was an unconscious echo of the fundamental point made by Jeff Hollingsworth, the Chair of SC12: ‘HPC exists for the applications. We can build rooms full of blinking lights, but that’s not what it is about. It’s not about the latest in silicon, but about getting the science done.’