Why supercomputers are becoming cool machines
Liquid cooling can take many forms, but expertise from embedded systems can also point the way to a green future for HPC, as Tom Wilkie discovers
By the time this issue of Scientific Computing World leaves the printers, Europe’s first high-performance cluster to be cooled by total immersion in mineral oil will have started operations at the University of Vienna.
The official inauguration will take place later, in the presence of local dignitaries. But the Netherlands-based integrator ClusterVision, which provided the machine, is already working on another total immersion machine for a customer in the UK. A slightly different design, this cluster is due to become operational by the end of the summer.
The energy needs of Exascale, as discussed in John Barr’s article on page 22, are driving a search for high-tech ways of reducing power consumption, including: low-power chips from mobile and embedded applications; redesigning the way data is moved on the chips and in the machine; and rescheduling jobs to be energy-efficient rather than compute-efficient.
But electricity is expensive in Europe today, and that cost will only rise in the future. Such considerations are already driving European supercomputer owners to seek out energy efficiency. Cambridge University’s Wilkes machine scored second highest in the most recent Green500, for example. The sense of urgency is perhaps not so acute in other parts of the world where electricity is cheaper, but even there data centres are being moved to sit adjacent to hydropower or other convenient and cheap sources of power.
Europe does not make its own CPUs, but it does have expertise in embedded processors, which have to draw low power, and this, according to Paul Arts, director of HPC research and development for Eurotech, has led to some fruitful cross-fertilisation. An international company based in northern Italy, Eurotech has a presence both in the embedded-systems sector and in high-performance computing, where its systems have been some of the most energy-efficient on the market. Eurotech Aurora systems took both the first and second slots in the June 2013 Green500 list.
But Eurotech’s approach is not total immersion. Instead, over the past six years it has developed a cooling system that delivers warm water to the processor. According to Arts, contact cooling is widely used within the Eurotech group for applications other than HPC, in embedded computing for example: ‘So the competence for cooling and thermal designs is far from new to the group’. But he stressed a further advantage to the technological solution Eurotech had chosen: ‘We want to build a compact machine. Through compactness we can create high speed. With immersion cooling, you have to create space between the components [for the coolant to flow between them] and you have to have something on top of them to create more heat-exchange surface. In our case, the coolant does not have to flow between the components and the cold plates are attached to the components via a thermal interface material that conducts the heat. It is efficient cooling in a smaller space.’
At present, the design point for Eurotech is to obtain ‘free cooling’ to the outside air, which means that the coolant water can be delivered at up to 50 degrees. ‘In the end, the only temperature that’s interesting is the junction temperature of the silicon itself. It depends on what components we are talking about, but if we are talking about CPU and embedded industrial silicon, we are talking about 105 degrees. For consumer silicon, it can be 100 degrees.’
The headroom can be used not so much for overclocking as to take advantage of the higher-power CPUs available from Intel. He remarked laconically: ‘Overclocking will bring warranty issues’. Eurotech is investigating energy reuse, inside a building for example, but there the water temperature may have to be higher – around 60 to 65 degrees – and ‘you may have to buy extended temperature range components for this.’
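The link between coolant temperature and usable headroom can be illustrated with the standard steady-state model, in which junction temperature equals coolant temperature plus chip power multiplied by a thermal resistance. The sketch below is only an illustration: it takes the quoted degrees to be Celsius, and the thermal resistance value is a hypothetical figure, not one supplied by Eurotech.

```python
# Steady-state thermal sketch: T_junction = T_coolant + P * R_theta.
# R_THETA_K_PER_W is a hypothetical value chosen for illustration only.

def junction_temp_c(coolant_c: float, power_w: float, r_theta_k_per_w: float) -> float:
    """Estimate junction temperature from coolant temperature and chip power."""
    return coolant_c + power_w * r_theta_k_per_w

def max_power_w(coolant_c: float, junction_limit_c: float, r_theta_k_per_w: float) -> float:
    """Largest chip power that keeps the junction at or below its limit."""
    return (junction_limit_c - coolant_c) / r_theta_k_per_w

R_THETA_K_PER_W = 0.25    # assumed junction-to-coolant thermal resistance
JUNCTION_LIMIT_C = 105.0  # figure quoted for embedded industrial silicon

# Compare the free-cooling design point with the warmer water needed for reuse.
for coolant_c in (50.0, 65.0):
    headroom_k = JUNCTION_LIMIT_C - coolant_c
    budget_w = max_power_w(coolant_c, JUNCTION_LIMIT_C, R_THETA_K_PER_W)
    print(f"coolant {coolant_c:.0f} C: headroom {headroom_k:.0f} K, power budget {budget_w:.0f} W")
```

Under these assumptions, raising the water temperature for energy reuse shrinks the power budget per chip, which is consistent with the remark that extended temperature range components may be needed.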
Arts stressed the importance of considering power conversion in assessing energy efficiency, and here again, the technology developed by the embedded systems side of the company can offer advantages: ‘Our goal is to make power conversion as efficient as possible – we share a lot of information with our colleagues in embedded. Power conversion in embedded, especially for portable devices, is critical. Many of the power conversion steps on a board – going from 48V to 12V and from 12V to 3.3V and even down to 0.9V – all these steps have quite a bit of inefficiency. So the goal is not only to keep the compute elements cool but also the power elements if we are to reach energy efficiency.’ More efficient power conversion also has the advantage of leading to more compact machines, he added.
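To see why several conversion steps matter, consider that the efficiency of stages in series is the product of the individual efficiencies. The sketch below assumes the three steps Arts mentions form a single chain, and the per-stage efficiencies are hypothetical; the article gives the voltages but not these numbers.

```python
from math import prod

# Hypothetical per-stage efficiencies, chosen only to illustrate the arithmetic.
STAGES = [
    ("48V -> 12V", 0.95),
    ("12V -> 3.3V", 0.92),
    ("3.3V -> 0.9V", 0.88),
]

def chain_efficiency(stages):
    """Overall efficiency of stages in series: the product of each stage's efficiency."""
    return prod(eff for _, eff in stages)

def input_power_w(load_w, stages):
    """Power drawn at the first input in order to deliver load_w at the final rail."""
    return load_w / chain_efficiency(stages)

overall = chain_efficiency(STAGES)
drawn_w = input_power_w(100.0, STAGES)
print(f"overall efficiency: {overall:.3f}")
print(f"input power for a 100 W load: {drawn_w:.1f} W, lost as heat: {drawn_w - 100.0:.1f} W")
```

Even with each stage above 85 per cent in this made-up example, the chain as a whole loses close to a quarter of the input power, and that loss appears as heat on the board that the cooling system must also remove.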
ClusterVision’s approach to making its machine compact, within the context of total immersion cooling, has been to join forces with Green Revolution Cooling to design the first ever skinless supercomputer. Removing the chassis and unnecessary metal parts that would obstruct oil flow also keeps down the cost of the initial investment.
According to ClusterVision, the solution almost eliminates power consumption for air-cooling – cutting it to just five per cent of a conventional system’s consumption. In turn, this cuts total energy consumption by about half, and also reduces the initial capital outlay by removing the need for equipment such as chillers and HVAC units. Another area of saving is reduced current leakage at the processor level in the submerged solution, resulting in less wasted server power.
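How a 95 per cent cut in cooling power translates into total savings depends on how large the cooling share was to begin with. The following back-of-envelope sketch is not ClusterVision's own calculation: it assumes, purely for illustration, that a conventional air-cooled installation spends as much power on cooling as on the IT load, and it ignores all other overheads.

```python
def total_power_kw(it_kw: float, cooling_kw: float) -> float:
    """Total facility power as IT load plus cooling load (other overheads ignored)."""
    return it_kw + cooling_kw

IT_KW = 100.0                # hypothetical IT load
BASELINE_COOLING_KW = 100.0  # assumption: air cooling draws as much as the IT load
COOLING_FRACTION = 0.05      # cooling cut to five per cent of conventional, as quoted

baseline_kw = total_power_kw(IT_KW, BASELINE_COOLING_KW)
immersed_kw = total_power_kw(IT_KW, BASELINE_COOLING_KW * COOLING_FRACTION)
saving = 1.0 - immersed_kw / baseline_kw
print(f"baseline {baseline_kw:.0f} kW, immersed {immersed_kw:.0f} kW, saving {saving:.1%}")
```

With that starting assumption the saving comes out a little under half, in line with the quoted figure; an installation whose air cooling was already more efficient would see a smaller proportional saving.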
‘Power efficiency in high-performance computing is of growing concern, due to technological challenges in the ongoing race to Exascale and, far more importantly, growing concerns on climate change. With this reference, we set the stage for a new paradigm,’ said Alex Ninaber, technical director at ClusterVision.
The ClusterVision machine will be used by Austrian research organisations which are collaborating on the Vienna Scientific Cluster (VSC-3) project. The VSC-3 cluster is designed to balance compute power, memory bandwidth, and the ability to manage highly parallel workloads. It consists of 2020 nodes based on Supermicro’s motherboard, each fitted with two eight-core Intel Xeon E5-2650 v2 processors running at 2.6GHz. The smaller compute nodes have 64GB of main memory per node, whilst the larger nodes have 128GB or 256GB of main memory. The interconnect system is based on Intel’s Truescale QDR80 design. Software includes the BeeGFS parallel file system (formerly known as FhGFS) from the Fraunhofer Institute for Technological and Industrial Mathematics (ITWM). The VSC-3 cluster is managed using Bright Cluster Manager from Bright Computing.
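The scale implied by these figures can be worked out directly. In the sketch below, the node count, sockets, cores and clock speed come from the description above; the number of floating-point operations per cycle is an assumption (a typical double-precision AVX rate for this processor generation) and is not stated in the article, so the peak figure should be read as a rough theoretical estimate.

```python
NODES = 2020
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 8
CLOCK_GHZ = 2.6
FLOPS_PER_CYCLE = 8  # assumption: double-precision AVX rate per core, not from the article

# Core count follows directly from the quoted configuration.
total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET

# Theoretical peak: cores * clock (GHz) * flops per cycle gives GFLOPS; divide by 1000 for TFLOPS.
peak_tflops = total_cores * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0

# Lower bound on memory: every node has at least the 64GB of the smaller nodes.
min_memory_gb = NODES * 64

print(f"total cores: {total_cores}")
print(f"theoretical peak: {peak_tflops:.1f} TFLOPS (under the stated assumption)")
print(f"minimum aggregate memory: {min_memory_gb} GB")
```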
On the other side of the Atlantic, a different submerged cooling solution has emerged as a ‘proof of concept’ announced by 3M, in collaboration with Intel and SGI. This uses two-phase immersion cooling technology. SGI’s ICE X, the fifth generation of SGI’s distributed memory supercomputer, and the Intel Xeon processor E5-2600 hardware, were placed directly into 3M’s Novec engineered fluid.
According to 3M, the two-phase immersion cooling can reduce cooling energy costs by 95 per cent, and reduce water consumption by eliminating municipal water usage for evaporative cooling. Heat can also be harvested from the system and reused for heating and other process technologies such as desalination of sea water.
In common with the other cooling systems that use a heat-transfer fluid other than air, the 3M technique reduces the overall size of the data centre – the company estimates that the space required will be reduced tenfold compared to conventional air cooling. The partners also believe that their immersion cooling will allow for tighter component packaging – allowing for greater computing power in a smaller volume. In fact, they claim that the system can cope with up to 100 kilowatts of computing power per square metre.
‘Through this collaboration with Intel and 3M, we are able to demonstrate a proof-of-concept, to reduce energy use in data centres, while optimising performance,’ said Jorge Titinger, president and CEO of SGI. ‘Built entirely on industry-standard hardware and software components, the SGI ICE X solution enables significant decreases in energy requirements for customers, lowering total cost of ownership and impact on the environment.’
Beyond the plumbing, the next step will be energy-literate sophistication in the way the cluster runs its jobs. Eurotech believes that it can improve the plumbing by constant engineering improvement, so that it can reduce the cost of a water-cooled supercomputer to less than that of an air-cooled machine – even when cost savings in terms of equipment to reject heat to the outside environment are taken out of the equation. But using expertise from the other side of its business, in embedded systems, Eurotech is starting to provide high-frequency measurement of temperature and power at the nodes within the cluster. With real data on how much power different applications consume and how that power consumption is distributed, software writers will have the opportunity to build applications that can influence the power usage. ‘For management of our nodes, we tend to use low-power processors; and on the indirect tasks of an HPC – for example measuring the temperature sensors – we use very efficient compute modules that we “borrow” from our colleagues [on the embedded side],’ Arts said.
‘If you look at the cross-links we have in-house, then you can see HPC technologies ending up in embedded and the “nano-pc” technologies ending up in HPC’. High-performance computers, he continued, ‘have many sensors on different nodes, and you have to transport that data.’ Building on the cross-disciplinary expertise, Arts believes it will be possible ‘to read out data across the whole machine – to make the developers aware of the energy and temperature across the machine. This feedback is the first step to energy-efficient programming’. The idea of programming for energy rather than compute efficiency is being promoted by many people and, he continued: ‘What we as manufacturers want is to give people tools to work with’.
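One way such feedback could be turned into a number a developer can act on is to integrate the sampled node power over the duration of a job. The sketch below is not Eurotech's tooling: the `Sample` record, the node names and the readings are all hypothetical, and it simply applies the trapezoidal rule to timestamped power samples.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    t_s: float      # timestamp in seconds
    power_w: float  # instantaneous node power in watts

def energy_joules(samples):
    """Integrate power over time with the trapezoidal rule."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += 0.5 * (prev.power_w + cur.power_w) * (cur.t_s - prev.t_s)
    return total

def job_energy_joules(per_node_samples):
    """Sum the energy of every node that took part in a job."""
    return sum(energy_joules(s) for s in per_node_samples.values())

# Made-up readings for two hypothetical nodes, sampled at different intervals.
readings = {
    "node001": [Sample(0.0, 200.0), Sample(1.0, 300.0), Sample(2.0, 300.0)],
    "node002": [Sample(0.0, 150.0), Sample(2.0, 250.0)],
}

for name, samples in readings.items():
    print(f"{name}: {energy_joules(samples):.0f} J")
print(f"job total: {job_energy_joules(readings):.0f} J")
```

Comparing the job total across two versions of the same application would give a programmer a direct measure of whether a change saved energy, not just time.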
Arts concluded: ‘What I also want to say is that we have a big research community in Europe that is focused on energy-efficient high-performance computing. As an integrator of the hardware, I see myself as an enabler of the community, so it can push the limits. I think this is possible within Europe and we are very strong. With European projects – such as Prace – we are able to bring new technologies into this field. Europe has a very strong team working towards energy efficient solutions. I am very proud of that.’