In his second report from the International Supercomputing Conference (ISC’14), which took place in Leipzig at the end of June, Tom Wilkie discusses different approaches to energy efficiency in high-performance computing.
A casual observer at this year’s International Supercomputing Conference (ISC’14), held in Leipzig at the end of June, could be forgiven for thinking that the most popular engineering discipline in high-performance computing (HPC) today is a throwback to Victorian times – plumbing. Liquid cooling of supercomputers has finally come of age, with more than 21 of the exhibition stands at ISC’14 featuring liquid cooling in some form or other.
But, as Jean-Pierre Panziera, Chief Technology Director of Extreme Computing for Bull, pointed out, this is anything but Victorian technology. Bull is using its own supercomputers to work out the complex computational fluid dynamics involved in designing an efficient system, he said. In the search for energy efficiency, Bull is already looking beyond what Panziera referred to as ‘passive options’, such as cooling, to the more active approach of monitoring the energy usage of its servers and components and optimising the way in which the computation proceeds so as to be more energy efficient.
HP also announced its entry in the liquid cooling stakes at ISC’14, with a theatrical unveiling of a new liquid-cooled system, the Apollo 8000, on its stand. This too uses warm water, although the heat is conducted from the components by hot plates and heat pipes to be removed by the water cooling at the edge of the board. HP claims the design has advantages in terms of ‘dry disconnect’. It is also hedging its bets, in that part of the newly announced Apollo series, the 6000, is a more traditional, air-cooled version.
But alcohol rather than water generated a lot of interest on the show floor, with a highly ingenious two-phase approach being pioneered by the Belgian company Calyos. By combining a two-phase latent heat effect with a passive capillary pump, the Calyos system allows the heat to be removed from the components without any need for external energy to pump the fluid round.
Calyos can use methanol as the working fluid but its designs also include the use of a non-toxic hydrofluorocarbon, pentafluoropropane (R245fa). According to Maxime Vuckovic, head of engineering development, thanks to their very low thermal resistance, these passive solutions make it possible to use highly dissipative solutions for rejecting the heat to the environment and they can work with hot water (> 50°C) or natural air cooling.
Because it is a sealed system, maintenance should not be required and, because no power is consumed in pumping the working fluid, energy costs are also low. Calyos has close ties with the European aircraft manufacturer Airbus Group (formerly EADS) because the need to remove heat from electronic components is a requirement in satellites and space probes, as well as in some military applications. Calyos’ technology, therefore, has a wider range of applicability than just high-performance computing.
It was noteworthy that one of the teams participating in the Student Cluster Competition at ISC’14 was using a liquid-cooled cluster. The team from the Edinburgh Parallel Computing Centre (EPCC) was the fastest in the LINPACK test using a liquid-cooled GPU-accelerated system provided by the British-based company, Boston. It was the first public demonstration of this liquid-cooled technology from Boston.
If proof were needed that energy efficiency and liquid cooling had truly come of age, then one needed to look no further at ISC’14 than the Chinese company Sugon. As already reported here, it was demonstrating China’s first liquid-cooled HPC technology. An immersion system in concept, it uses working fluid provided by 3M, with a boiling point of 50°C, so that coolant closest to the processor changes phase, is cycled to a condenser, and circulated back to the system. Shen Weidong, general manager of the company’s data centres department, remarked that, to lower the power usage effectiveness (PUE) of a data centre significantly, ‘liquid cooling is the only way.’ In Beijing, he continued, it was no longer possible to build data centres due to power restrictions.
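For readers unfamiliar with the metric, PUE is the ratio of the total power drawn by a data centre to the power delivered to its IT equipment, so a value of 1.0 would mean no overhead at all for cooling or power distribution. The short Python sketch below shows the arithmetic. The function name and the power figures are hypothetical, chosen purely for illustration, and are not taken from Sugon or any other system mentioned here.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Hypothetical figures for illustration only (not measured data).
# Same 1,000 kW IT load, different cooling and distribution overheads.
air_cooled = pue(total_facility_kw=1500.0, it_equipment_kw=1000.0)
liquid_cooled = pue(total_facility_kw=1100.0, it_equipment_kw=1000.0)

print(f"air-cooled PUE:    {air_cooled:.2f}")
print(f"liquid-cooled PUE: {liquid_cooled:.2f}")
```

In this made-up comparison the IT load is identical in both cases; only the overhead changes, which is the part of the bill that a change of cooling technology can address.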
According to Bull’s CTO, Jean-Pierre Panziera, it has been shipping direct liquid cooled systems, using hot water, for more than a year now. It has seen its business flip, he said, so that about 90 per cent of the blades that Bull ships are direct liquid cooled rather than the traditional air-cooled systems. Efficiency was increased, he said, by bringing the water to the component as this made it possible to employ hot water cooling at about 50 degrees C: ‘Too hot to have a shower!’ he pointed out, but no energy had to be expended in pre-cooling the liquid. Currently the only air cooling in such systems was for the power supply, and he expected liquid cooling to be extended to that area in the next generation of the company’s systems.
Bull is also a partner with Calyos in developing its novel technology, Panziera pointed out. But while that partnership may yield a new, and even more energy-efficient, way of doing the ‘plumbing’ at some point in the future, Bull is also interested in moving beyond cooling to more active ways of reducing the energy costs of computing. According to Panziera, Bull has partnered with the Technical University of Dresden in a project entitled HDEEM, for ‘high-definition energy-efficiency monitoring’. Bull is instrumenting its boards by incorporating an FPGA (field-programmable gate array) to monitor temperature (and hence energy consumption) at relatively high frequency (around one millisecond) at an accuracy of 2 per cent.
The project has developed a way in which this information can be fed back to the performance monitoring software suite known as Vampir, developed at the Technical University of Dresden. Vampir was designed to offer application developers in high-performance computing an effective tool for collecting and analysing data about the performance of their code on a given machine. The data presented by Vampir can be used to identify such issues as computational and communication bottlenecks, load imbalances and inefficient CPU utilisation. With that knowledge, developers can tune their codes to run more efficiently, both to speed up the computation and to shorten the overall time that it takes to perform a job. However, the purpose of HDEEM is to provide precise metrics not just of performance but also of power consumption, Panziera said.
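To see why fine-grained sampling matters, consider how a stream of power readings turns into an energy figure for a job: the readings are integrated over time. The Python sketch below does this with the trapezoidal rule. It is an illustration only, assuming evenly spaced samples one millisecond apart; the function name and the sample trace are hypothetical and do not represent the HDEEM or Vampir interfaces.

```python
from typing import Sequence


def energy_joules(power_watts: Sequence[float], interval_s: float) -> float:
    """Integrate evenly spaced power samples using the trapezoidal rule."""
    if len(power_watts) < 2:
        return 0.0  # a single sample spans no time, so no energy
    total = 0.0
    for a, b in zip(power_watts, power_watts[1:]):
        total += 0.5 * (a + b) * interval_s
    return total


# Hypothetical trace: five power readings (watts) taken 1 ms apart.
samples = [200.0, 210.0, 250.0, 240.0, 200.0]
interval = 0.001  # assumed sampling interval in seconds

energy = energy_joules(samples, interval)
average_power = energy / ((len(samples) - 1) * interval)

print(f"energy: {energy:.3f} J, average power: {average_power:.1f} W")
```

A short spike such as the 250 W reading here would be smoothed away by a meter that reports once a second, which is one reason millisecond-scale monitoring is attractive to developers trying to tie energy use to specific phases of their code.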
How to monitor the performance of next-generation supercomputers so that they can be optimised for energy consumption was the subject of presentations at a session on international cooperation held in Leipzig the day before ISC’14 itself formally started. The session, chaired by Pete Beckman from the US Argonne National Laboratory, heard how the Tsubame-KFC at the Tokyo Institute of Technology is highly instrumented. The AC power consumption of the compute nodes is monitored, as is the DC power consumption of the CPUs and GPUs. The temperatures of the CPUs and GPUs are also monitored. The system is immersed in an oil bath and the power consumption of the oil pump is monitored, as is that of the secondary water loop. But the problem, the session heard, was to devise a power-performance API. Japan expects to mandate such APIs as part of the official procurement for future Tsubames and thus to persuade vendors to adopt them more widely.
After ISC’14 finished, the Green500 list of the world’s most energy-efficient supercomputers was announced and Tsubame-KFC once again topped the list as the most efficient machine. The top end of the Green500 list remained largely unchanged, with Tsubame-KFC remaining the only system above the 4,000 MFLOPS/Watt mark. The ten greenest machines from the previous Green500 list all remained in the top 15 of the new list and were joined by similarly heterogeneous architectures. In addition, the ranking of the most energy-efficient heterogeneous supercomputing systems depended more on the GPU than on the CPU. For instance, the 15 greenest systems are all accelerated by NVIDIA Kepler K20 GPUs, but different Intel CPUs are sprinkled throughout the top 15, with Piz Daint at number 5 and CSIRO at number 7 having Intel Sandy Bridge CPUs and Tsubame at number 8 having Intel Westmere CPUs, both older CPU generations. The rest of the top 15 systems all use Intel Ivy Bridge, the latest generation of CPU.