The power efficiency of HPC systems is improving, but at a slower pace than their performance is increasing, so a new generation of HPC systems usually draws more power than its predecessor even though it is more power-efficient. A major problem facing HPC data centres today is how to power and cool the next generation of systems – both at the high end, where exascale systems are on the horizon, and at the mid-range, where petascale departmental systems will become a reality. Without significant changes to the technologies used, exascale systems will draw hundreds of megawatts. According to the report Growth in data centre electricity use 2005 to 2010 by Jonathan Koomey, published by Analytics Press, data centres were responsible for around two per cent of all electricity used in the USA in 2010. The power required to run HPC data centres is already a problem, and one that is only going to get worse.
We are seeing a rise in the use of HPC in the enterprise, for example the use of HPC tools and techniques to handle big data and to process data at internet scale. New approaches are being applied to the design of HPC systems in order to improve their energy efficiency (such as the use of processors or accelerators that consume less power), but these add to system complexity, make the systems more difficult to program and offer only incremental improvements. The bottom line is that as HPC systems become ever more capable, they put increasing demands on the data centres in which they live.
Reducing the power consumed by HPC data centres has three components. The first is to ensure that the data centre itself is efficient, the second is to cool the HPC systems effectively, and the third is to design HPC systems that consume less power.
Efficient data centres
Ten years ago the electricity used to power a supercomputer was a relatively small line item in the facility costs. The cost of powering and cooling an HPC system during its lifetime is now of the same order of magnitude as the purchase costs of the system, so has to be budgeted for during the procurement. It is not uncommon for an HPC data centre to draw 10 times as much power today as it did 10 years ago.
HPC data centres house not just supercomputers but also much ancillary equipment that supports them. Once you have bought your power-efficient supercomputer, you need to ensure that the data centre as a whole is efficient. The Green Grid's power usage effectiveness (PUE) metric measures this: the total facility energy divided by the energy used by the IT equipment. In an ideal world the total facility energy would be close to the power used by the IT equipment, giving a PUE of close to unity.
Flaws in the system
While PUE can be a useful metric, it is flawed. If an HPC system is upgraded and the new system draws less power, then – provided nothing else in the data centre changes – the PUE actually gets worse, despite the fact that less power is being used overall. So care must be taken when considering PUE figures.
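As a minimal illustration of how the metric behaves, the sketch below computes PUE from assumed facility figures (they are chosen purely for the example, not taken from any real data centre) and shows why a more efficient supercomputer can make the number look worse:

```python
# Minimal sketch of the PUE calculation (illustrative figures only).
# PUE = total facility power / IT equipment power; the ideal value is 1.0.

def pue(it_power_kw, overhead_kw):
    """Total facility power divided by the power drawn by the IT equipment."""
    return (it_power_kw + overhead_kw) / it_power_kw

# Assumed numbers: a 1,000 kW system with 500 kW of cooling/UPS overhead.
print(pue(1000, 500))   # 1.5

# The flaw: upgrade to a system that draws 800 kW, leave the overhead alone,
# and the PUE gets worse even though the data centre uses less power overall.
print(pue(800, 500))    # 1.625
```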
The total facility energy covers things such as UPS, battery backup, and cooling. A PUE figure of around 2 is common, while some of the largest internet-scale companies and HPC data centres now claim PUE figures of better than 1.2. The TOP500 list (www.top500.org) ranks the 500 fastest HPC systems in the world (or at least those that have been made public), providing interesting information about the technology used to build each system, its size and performance, as well as the power it consumes.
An alternative supercomputer league table is the Green500 list (www.green500.org), which lists the most energy-efficient supercomputers in the world. Rather than focusing on peak performance, the Green500 is interested in performance per watt, ranking systems by the number of megaflop/s they deliver per watt. All of the top 10 systems on the current list use Nvidia K20 GPUs. The majority of the next 20 entries are IBM BlueGene/Q systems, followed by clusters using Intel's Xeon Phi accelerators. The top-ranked system delivers 4,503 megaflop/s per watt, and the top 48 systems all use exotic technology (IBM BlueGene/Q, Nvidia GPUs or Intel Xeon Phi).
The first commodity cluster on the list that does not use compute accelerators is at number 49 and delivers 1,248 megaflop/s per watt. While this looks poor compared with the specialist systems at the top of the list, many HPC systems in operation today deliver less than 10 per cent of that efficiency. One conclusion that can be drawn is that there is a very wide spectrum of power efficiency across today's supercomputers, and that going beyond the standard components of commodity servers can greatly improve efficiency – but at the cost of ease of programming.
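The gap between these figures is easier to appreciate as a ratio. Here is a small sketch using the megaflop/s-per-watt values quoted above; the 'typical production system' figure simply takes the rough 'less than 10 per cent' estimate at face value:

```python
# Performance per watt = sustained performance / power drawn.
# Figures quoted above, in megaflop/s per watt.
top_green500 = 4503          # number one system on the Green500 list
best_commodity = 1248        # first commodity cluster without accelerators (number 49)
typical_production = 0.10 * best_commodity   # "less than 10 per cent of that efficiency"

print(top_green500 / best_commodity)      # ~3.6x advantage for exotic technology
print(top_green500 / typical_production)  # ~36x gap to a typical production system
```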
Cost implications
Data centres do much more than provide a cost-effective, energy-efficient environment for the supercomputer. The quality of service required will vary depending on the type of supercomputer facility being run. For example, the resilience and security requirements of an HPC service supporting real-time risk analysis in investment banking will be very different from those of an academic facility. Resilience can be improved by adding redundant compute, storage, power, cooling and networks – but all at significant cost, so this is not always appropriate.
Providing UPS cover for all compute nodes can add 10 per cent to the power budget, but many HPC centres can live with occasional unscheduled downtime, so it may be better to provide UPS protection only for management nodes and storage. If a data centre is purpose-built to house a specific machine it can be very focused and efficient. However, many HPC facilities need to support more than one large system, as well as a range of smaller systems exploiting different technologies. A modern high-end HPC system is probably liquid cooled, but older or smaller systems may use a combination of liquid cooling, air cooling and hybrid options (such as air-cooled components with liquid-cooled rear doors).
In these mixed environments, it can be difficult to run a data centre at a PUE of better than 1.5.
When a parallel application executes across all of a large HPC cluster, the power draw can be dramatic. An HPC system moving from just ticking over running the operating system to being busy can double the power required, while hammering away on a very computationally intensive parallel application can take 50 per cent more power. This behaviour is very unlike the use pattern in a commercial data centre or cloud system, which tends to even out spikes because the workload is generated by many users running many applications rather than a single user running a single application. This variable power usage is an issue for two reasons.
The first issue is cooling. In a commercial data centre that allocates part of its capacity to HPC, this can cause physical hot spots, while for large HPC data centres the cooling capacity required for the whole facility can change significantly at short notice.
The days of running the cooling system all of the time at the rate required to cool the system when it is running at full tilt are at an end, as the cost is too high; more dynamic cooling systems that adapt to the changing workload are now required. The second issue is the power supply itself. Large HPC data centres must work hand in glove with their power providers to ensure they have the power that they need, when they need it, at a price that is affordable.
Designing a contract that allows such flexibility, and remaining within the confines of that contract in order to ensure that you have the required power at the required price, can be a difficult balancing act. High-end HPC data centres today have power supplies in excess of 10 MW, and that requirement is only going to increase – at least in the short to mid-term.
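To put rough numbers on the swing described above, here is a small sketch; the baseline figure is assumed purely for illustration, and '50 per cent more' is read here as relative to the busy state:

```python
# Rough illustration of the power swing described above (all figures assumed).
idle_mw = 1.0                    # system ticking over, running only the OS (assumed baseline)
busy_mw = 2.0 * idle_mw          # "being busy can double the power required"
intensive_mw = 1.5 * busy_mw     # a very intensive parallel job, read as 50 per cent above busy

print(intensive_mw - idle_mw)    # 2.0 MW swing the cooling plant and power contract must absorb
```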
Efficient cooling
Water is orders of magnitude more efficient than air at removing heat, and if warm water is used rather than cold water, there is no need to use chillers (at least for most of the year in most parts of the world). So the theoretically ideal way to cool the system is using warm water – but this can make the system infrastructure much more complex and therefore expensive. For air-cooled systems it is important to separate the warm and cool areas in a data centre. Allowing warm and cool air to mix makes it more difficult to maintain the desired temperature for the supercomputer. While many modern data centres handle this very well, others do not. For liquid-cooled systems, increasing the inlet water temperature can have a significant impact. In many countries, this allows the warmed water to be cooled by external air most of the time without the need for chillers, leading to a drop in PUE of more than 10 per cent.
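A minimal sketch of the free-cooling trade-off just described; the supply temperatures and the dry-cooler 'approach' margin are assumptions for the sake of illustration, not figures from any particular installation:

```python
# Sketch: chillers are only needed when the outside air cannot bring the
# return water down to the required supply temperature. All figures assumed.

def needs_chiller(outdoor_temp_c, supply_temp_c, approach_c=5.0):
    """A dry cooler can typically cool water to roughly the outdoor temperature
    plus an 'approach' margin; if that is still too warm, chillers must run."""
    return outdoor_temp_c + approach_c > supply_temp_c

# Cold-water cooling at 16C needs chillers even on a 14C day; warm-water cooling
# at 35C does not - which is why raising the inlet temperature cuts the PUE.
print(needs_chiller(14, supply_temp_c=16))   # True
print(needs_chiller(14, supply_temp_c=35))   # False
```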
Some HPC systems use air cooling (and whatever cooling technology is used, it is still a good idea to keep the data centre at a temperature that is comfortable for humans to operate in). Alternative options include direct liquid cooling, which takes the coolant to the source of the hot spots, and indirect cooling, where the components are air cooled but the air then passes through a liquid-cooled door. Depending on the ambient temperature, the resultant warm water can then be cooled by chillers or a cooling tower. If several systems are all water cooled, that may seem to suggest a standard approach to cooling, but different equipment from different manufacturers seldom comes with the same recommended water temperature. All of which suggests that HPC data centres of the future must have lots of capacity, and lots of flexibility, in order to support a wide range of requirements.
Efficient systems
It can cost as much to power and cool an HPC system as it costs to buy it in the first place, and (without significant efficiency improvements) the first exascale system will consume hundreds of MW of power, something that is not sustainable. So a number of things need to be considered. First, the power consumption of individual components should decrease (for example, by using low-power ARM cores developed for the mobile market). This implies that we must use many more of them, adding to the level of parallelism and therefore programming complexity. Second, the use of accelerators, such as Nvidia GPUs and Intel’s Many Integrated Core (MIC) Xeon Phi devices, can deliver higher compute performance in a lower power envelope. Third, alternative architectures such as IBM’s BlueGene can deliver excellent compute performance more efficiently than clusters of standard servers.
When considering HPC data centres it is easy to focus on the facility and the systems, but the systems software (e.g. energy-efficient job scheduling) and the efficiency of application software also need to be considered. If applications are not optimised for the target architecture, it doesn’t matter how efficient the data centre is – it will be wasting energy. The aim should be to minimise the energy required for the application to generate an answer.
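The point about minimising energy to solution can be made with a small worked example. The power draws and runtimes below are assumptions chosen for illustration, not measurements of any real system:

```python
# Energy to solution = average power draw x time to answer.
# Illustrative, assumed figures for the same application on two systems.

def energy_kwh(power_kw, hours):
    return power_kw * hours

plain_cluster = energy_kwh(power_kw=500, hours=10)   # 5,000 kWh
accelerated   = energy_kwh(power_kw=650, hours=5)    # 3,250 kWh

# The accelerated system draws more power, but because a well-optimised code
# finishes sooner it uses less energy per answer - the figure that matters.
print(plain_cluster, accelerated)
```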
The need for energy-efficient programming
Without a doubt, the biggest issue confronting HPC data centres today is power consumption. The use of alternative technologies such as accelerators and efficient cooling strategies using hot water and free cooling can reduce overall power consumption, and increased efficiency has been demonstrated by PUE figures that have improved from around 2 to much closer to 1 in recent years. While these steps are positive, they miss the point. The HPC industry has been talking for some years about building exascale systems by the end of this decade. Such a system built from today’s technology would require around 100 million cores and would draw more than 500 MW. The target maximum power consumption for an exascale system is 20 MW, so staggering improvements are required if the industry is to get close to its exascale target.
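It is worth spelling out the size of that gap using the figures already quoted in this article; the arithmetic below is purely illustrative:

```python
# How far current technology is from the exascale power target.
exaflop = 1e18                    # one exaflop/s
target_power_w = 20e6             # 20 MW target for an exascale system
todays_power_w = 500e6            # what an exascale system built today would draw

required_flops_per_watt = exaflop / target_power_w   # 50 gigaflop/s per watt
best_today_flops_per_watt = 4503e6                   # Green500 number one: 4,503 megaflop/s per watt

print(required_flops_per_watt / best_today_flops_per_watt)  # ~11x improvement needed
print(todays_power_w / target_power_w)                      # a 25x reduction in power
```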
As stated at the beginning of this article, reducing the power consumed by HPC data centres has three components – building efficient data centres that house efficiently cooled HPC systems that have themselves been designed to consume less power. The first two points are already being well addressed by the industry, and there is little room for additional savings once a data centre has a PUE of close to one and deploys warm-water cooling, especially if some of the heat generated is captured. But the elephant in the room is the power consumed by the HPC systems themselves. Unless that power consumption can be reduced significantly, data centres will be able neither to cool nor to afford to power future HPC systems.
What is all of this electrical power used for? Ironically, very little of it is used to drive computation; most of it is used to move data from one place to another. So instead of building better data centres, perhaps the industry should focus on building a very different style of HPC machine. Adding power-efficient accelerators or using liquid cooling rather than air cooling are refinements of existing technologies, not the game changers required to make exascale systems and their data centres a reality.
There is a need to reduce the power used in moving so much data. How can that be achieved? Step one is to build more power-efficient components for handling data, but that will bring only small wins. Step two is to design systems with a much higher degree of integration, so that data movement can be minimised. But to bridge the gap between 500 MW and 20 MW we need to do things very differently. The current algorithms, programming models and technology roadmaps won't get close to where we need to be, so we must aggressively explore different approaches: different algorithms, different programming models and different technologies.
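As an order-of-magnitude illustration of why data movement dominates, the sketch below uses broadly assumed energy figures of the kind quoted in exascale studies; none of them are measurements, and the exact values matter less than the ratios:

```python
# Order-of-magnitude sketch of where the energy goes (all figures assumed).
PJ = 1e-12  # picojoule

flop_energy          = 20 * PJ     # a double-precision arithmetic operation (assumed)
dram_access_energy   = 2000 * PJ   # fetching an operand from off-chip memory (assumed)
network_move_energy  = 10000 * PJ  # moving an operand to another node (assumed)

# Even with these rough numbers, shifting one operand off-chip costs around two
# orders of magnitude more energy than computing with it - hence the case for
# far greater integration and for algorithms that move less data.
print(dram_access_energy / flop_energy)    # ~100x
print(network_move_energy / flop_energy)   # ~500x
```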
It feels as if the HPC industry is sleep-walking its way towards the failure to deliver affordable, usable exascale systems within the target timescale of the end of the decade. Addressing issues relating to data centres in support of the next generation of HPC systems without radically changing the way we build and program them may be no more valuable than rearranging the lifeboats on the Titanic.
With more than 30 years’ experience in the IT industry, initially writing compilers and development tools for HPC platforms, John Barr is an independent HPC industry analyst specialising in technology transitions.