Cooling technology options for HPC
In the face of mounting costs and increasing energy bills for those operating facilities with HPC systems, selecting the right cooling technology is as important as choices around hardware and software. Over-engineering, or using excessive cooling, can create additional unneeded costs but equally the infrastructure must be able to meet requirements both for today’s computing needs and future upgrades.
In addition, there are other factors that might sway the decision, such as the data centre set-up and total potential power consumption, which can force the hand of those setting up cooling solutions. If an exotic cooling solution cannot fit into the data centre, cannot operate within a given power budget, or cannot work with the existing infrastructure, then another solution may need to be chosen.
Luckily for the HPC community, there is a wide range of technologies, and many more implementations of each, from bespoke systems to cooling technology retrofitted around an existing rack and infrastructure. Options are available; the choice depends on the requirements of a particular HPC centre. Does it want to focus on performance, on obtaining a specific power usage effectiveness (PUE), or just on meeting the cooling demands of its system at the minimum possible cost?
Selecting the right technology
Cooling technology is imperative to HPC operations but choosing the right technology depends on a number of different factors, such as server density, cost, facility power consumption and data centre infrastructure.
Technologies available for servers can largely be split into four main categories: air cooling, rear door heat exchangers (RDHX), liquid cooling and immersion cooling, which typically submerges electronic parts in oil.
Air cooling is the simplest option and has the cheapest infrastructure costs; it relies on fans to carry heat away from components. It has all but been replaced in HPC, as users require increasingly dense computing solutions, which generate more heat per rack of servers.
RDHX is a mixture of air and water. Chilled water is fed to a coil or backplate inside the RDHX. The rack-mount devices eject the hot exhaust air through the RDHX, transferring the heat to the water and ejecting cool air out of the RDHX.
Liquid cooling is the most popular technology used in HPC today. While there are various flavours, the basic concept remains the same. Water is pumped through a closed system up to a back plate placed near the hot components. Water has better thermal conductivity than air, so this is potentially a higher-performance system, but it requires additional infrastructure. For example, many water-cooled data centres have a raised floor so that all the pipework can be run to the servers.
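To give a feel for why water can carry heat away so much more effectively than air, here is a minimal sketch based on the steady-state relation Q = m_dot x cp x dT. Thermal conductivity is only part of the picture; water's far higher density and specific heat also matter, and that is what this sketch captures. The 30 kW rack load and 10 K coolant temperature rise are assumed figures chosen purely for illustration, the fluid properties are rounded textbook values at room temperature, and the function name is hypothetical.

```python
# Approximate room-temperature properties (rounded textbook values).
FLUIDS = {
    "air": {"density_kg_m3": 1.2, "cp_j_kg_k": 1005.0},
    "water": {"density_kg_m3": 997.0, "cp_j_kg_k": 4180.0},
}


def volumetric_flow_m3_s(heat_load_w: float, delta_t_k: float, fluid: str) -> float:
    """Volume flow needed to absorb heat_load_w with a coolant temperature rise of delta_t_k."""
    if delta_t_k <= 0:
        raise ValueError("temperature rise must be positive")
    props = FLUIDS[fluid]
    # Q = m_dot * cp * dT, rearranged for the mass flow rate.
    mass_flow_kg_s = heat_load_w / (props["cp_j_kg_k"] * delta_t_k)
    return mass_flow_kg_s / props["density_kg_m3"]


# Assumed example: a 30 kW rack and a 10 K coolant temperature rise.
for name in FLUIDS:
    flow = volumetric_flow_m3_s(30_000.0, 10.0, name)
    print(f"{name}: {flow * 1000:.2f} litres per second")
```

Under these assumptions the air stream needs a volume flow a few thousand times larger than the water loop for the same heat load, which is one reason dense racks push operators towards water.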
Immersion cooling directly immerses hardware in a non-conductive liquid. Heat generated by the electronic components is directly transferred to the fluid, reducing the need for cooling components such as interface materials, heat sinks and fans that are common in air cooling. As the liquid heats up, it circulates and moves away from the hot components, creating a flow that keeps the component cool. While immersion cooling can theoretically deliver the highest performance and the best PUE, it is seen by some as too troublesome for their HPC installations, as it can make maintenance tasks such as changing components tricky.
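PUE is the ratio of the total energy used by a facility to the energy delivered to its computing equipment. It helps to inform managers how much is being spent on overheads such as cooling, compared with the energy used for computation, storage etc. The closer the figure is to 1.0, the less energy is going to anything other than the IT equipment itself.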
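PUE is conventionally calculated as total facility energy divided by the energy delivered to IT equipment. As a rough illustration, the sketch below computes that ratio and the share of facility energy that goes to overheads such as cooling. The function names and the sample figures are assumptions made for illustration, not measurements from any site mentioned in this article.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power usage effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    if total_facility_kwh < it_equipment_kwh:
        raise ValueError("total facility energy cannot be less than IT energy")
    return total_facility_kwh / it_equipment_kwh


def overhead_fraction(pue_value: float) -> float:
    """Share of total facility energy that does not reach IT equipment."""
    return 1.0 - 1.0 / pue_value


# Hypothetical figures: 1,000 kWh delivered to IT equipment, 1,250 kWh drawn by the facility.
example = pue(1250.0, 1000.0)
print(f"PUE = {example:.2f}, overhead share = {overhead_fraction(example):.0%}")
```

With these made-up numbers the PUE is 1.25, meaning a fifth of the energy drawn by the facility is spent on something other than the computing equipment.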
The market for high-performance cooling systems has grown significantly, as technology shifts from simple air cooling to technologies with water and, in some cases, immersion cooling that can support increasingly dense server configurations. Water is still the standard for most HPC users as it provides a good balance between performance and cost of set-up and infrastructure.
Innovations in water cooling
Choosing the right technology can be a complicated issue, especially when even a single technology has several variations. For example, in water cooling there are a number of competing technologies that take slightly different approaches. As mentioned above, water can feed heat exchangers in the rear door or be pumped directly to backplates mounted on computing components. To make the decision even more complicated, users can now opt to use hot or cold water in some installations. Hot water provides additional benefits, as the temperature does not need to be reduced as much through cooling or evaporation before the water is pumped back to the HPC system, which improves energy efficiency.
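One way to picture the hot-water benefit is to ask how often outside air alone could cool the loop back down to its supply temperature, with no chiller running. The sketch below is a simplified illustration under stated assumptions: the function name, the 5 K approach temperature, the ambient samples and both supply temperatures are hypothetical values, not figures from any vendor or installation named here.

```python
def chiller_free_fraction(ambient_temps_c, supply_temp_c, approach_k=5.0):
    """Fraction of samples where ambient air alone can cool water to supply_temp_c.

    approach_k is the assumed gap between the ambient air temperature and the
    coldest water a dry cooler can deliver; 5 K is an illustrative value only.
    """
    temps = list(ambient_temps_c)
    if not temps:
        raise ValueError("need at least one ambient temperature sample")
    ok = sum(1 for t in temps if t + approach_k <= supply_temp_c)
    return ok / len(temps)


# Hypothetical ambient samples in degrees Celsius, from a cool night to a hot afternoon.
ambient = [8.0, 12.0, 18.0, 24.0, 30.0, 38.0]

cold_loop = chiller_free_fraction(ambient, supply_temp_c=18.0)
hot_loop = chiller_free_fraction(ambient, supply_temp_c=45.0)
print(f"cold-water loop: {cold_loop:.0%} of samples chiller-free")
print(f"hot-water loop: {hot_loop:.0%} of samples chiller-free")
```

In this toy data set the hot-water loop never needs mechanical chilling, while the cold-water loop does for most of the samples. Real sizing would depend on local climate data and equipment specifications.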
CoolIT, one of the main providers of direct contact liquid cooling (DCLC) technology, recently announced a partnership with Intel to develop its technology for Xeon scalable processors. At the time of the announcement, the cooling products for Intel’s advanced performance processors (codename Cascade Lake) were expected to be released in the first half of 2019.
While there will be a number of other technologies that can be fitted to these processors within a short time of release, getting an early look at the technology will help CoolIT to design its proprietary coldplate technology specifically to fit the new processors. This time also allows the company to fine-tune the system specifically for this processor technology.
In contrast to the DCLC technology developed by CoolIT, Motivair has opted to put its experience and IP into the development of chilled door RDHX technology. Recently this was announced as the cooling solution for one of Nasa’s supercomputing platforms. Nasa’s Goddard Space Flight Center recently completed commissioning on its expansion of advanced computing and data analytics used to study the Earth, solar system and universe.
This fully-integrated HPC cluster solution is equipped with Motivair’s ChilledDoor Rack Cooling System to handle higher density heat-loads, while maximising the cooling efficiency – reducing PUE as much as possible.
This project was in partnership with Supermicro Computer and the Nasa Center for Climate Simulation (NCCS). The NCCS is part of the Computational and Information Sciences and Technology Office (CISTO) of Goddard Space Flight Center’s (GSFC) sciences and exploration directorate.
Cool technology development
Cooling companies have been fine-tuning their technology to better suit HPC-specific workloads. This kind of development could be used to deliver solutions that can effectively cool more dense supercomputing systems, products designed for specific HPC technology, or entirely new technologies that must be supported through the development pipeline.
In addition, some of the more niche technology providers, such as those delivering immersion cooling technologies, are now able to expand into additional markets as the technology gains adoption.
At the end of 2018 Green Revolution Cooling (GRC) announced the expansion of a channel partner programme, which it has been piloting in select regions for the last year. The programme aims to ensure the availability of local, well-trained and experienced data centre professionals, to assist GRC’s customer base in simplifying adoption of immersion cooling systems.
This can be crucial in delivering the right system to the right user community, as technical sales advisors can advise on the right technology for each installation.
The pilot programme has been focused on Europe and Asia-Pacific regions, and is now expanding into other regions, including the Middle East and South America. At the time of the announcement, Allard Prins, commercial director of Netherlands-based Submerged Cooling Solutions, said: ‘We are excited to be representing GRC in Europe. Our parent company currently supports approximately 2,000 conventional installations in Europe, and we are clearly seeing the demand for more immersion cooling expertise. This partnership programme enables us to responsively deliver that local expertise to customers struggling with data centre cooling challenges.’
GRC’s channel programme enables partners to successfully engage and support customers in applications sectors, including blockchain, academic and commercial high-performance computing, artificial intelligence, defence and high-frequency trading. Organisations with experience in designing, building and maintaining infrastructure for the data centre market are encouraged to apply for participation in GRC’s partner programme.
‘GRC now has customers in 13 countries, and interest in immersion cooling continues to expand globally,’ said Peter Poulin, CEO of GRC. ‘As our business scales, our network of partners is critical to delivering an exceptional customer experience anywhere in the world.’
To date, GRC has expanded its global footprint with 15 partners around the world. Specific emphasis has been placed on Europe, the Middle East and Asia-Pacific regions. GRC has signed agreements and trained partners in Western Europe, Saudi Arabia, Singapore, Japan and China.
At the end of 2018 the Texas Advanced Computing Center (TACC) at The University of Texas, Austin, was awarded a $60 million grant from the National Science Foundation for its new Frontera supercomputer, set to be the fastest at any US university and among the most powerful in the world.
GRC’s immersion cooling system will be used to keep Frontera’s GPU-based servers from overheating in Austin’s 100°F summers.
As the HPC industry approaches exascale, the importance of energy efficiency – and maximising the efficiency and performance of cooling technology – becomes paramount to ensuring that the cost of HPC resources does not become prohibitively expensive.
To meet the ambitious targets for exascale computing, many cooling companies are exploring optimisations and innovations that will define cooling architectures for the next generation of HPC systems.
The best cooling solution ultimately depends on the user’s needs. Some may be willing to accept higher computer room temperatures to reduce cooling costs and handle server faults in software. Others may use free cooling in cold climates.
Technology availability is another factor that influences choice: while some technologies or techniques, such as hot aisle and cold aisle containment, are well established, others, such as adsorption chillers, fuel cells or geothermal cooling, are definitely less mainstream. Ultimately, the chosen method for cooling HPC systems should minimise the use of energy, by consuming less in the first place and recovering what is used, while maximising the system density. It is not easy to find this balance, however.
In the long run, I think that liquid cooling will continue to be the dominant technology, and will continue to proliferate outside the HPC niche and into the general server market in the next few years.
CoolIT Systems Featured product
Rack-based Direct Liquid Cooling technology enables dramatic increases in rack densities, component performance and power efficiencies. CoolIT’s Passive Coldplate Loops are specifically designed for the latest high TDP processors from Intel, NVIDIA and AMD. CoolIT partners with several server OEM groups to provide liquid cooling solutions directly from their factories and backed by original certified warranty. This peace of mind helps to build confidence in Direct Liquid Cooling as the new standard for High Performance and energy efficiency-focused data centers. Ready solutions are available with Cray Shasta, Dell EMC PowerEdge C6420 and Intel Buchanan Pass OEM servers.