An energy crisis in HPC
The Dell-powered Hyperion cluster at Lawrence Livermore National Laboratory aims to be the first non-proprietary petascale supercomputer.
Green computing is becoming important for HPC, not only for reasons of energy conservation and cost reduction, but also because data centres are reaching the limits of power available to them. Paul Schreier examines steps being taken to cut HPC power requirements
In our report on the grid computing facilities at CERN, we noted that computational tasks are being farmed out to scientists around the world, not only because this gives these experts immediate access to never-before-seen data, but also because the CERN computers have reached the point where a new power-generating facility would soon have to be built to support any further server expansion. A major limiting factor has become cooling capability. HPC suppliers universally agree that this situation is not unique; many data centres are running into the same problem. Purchasing ‘green’, energy-efficient computers isn’t just a matter of protecting the environment or cutting energy bills; in some cases, it’s a necessity just to get the amount of computing power you need for your scientific research projects within the available space and power budget.
HPC suppliers are addressing this problem on a number of fronts: at the chip level, at the blade/box level, in computing infrastructure, and with software. HPC users must consider the entire ecosystem ‘from chips to chillers’ when discussing data centre optimisation with their IT managers. Power has only recently become such an issue. In the ‘old’ days of HPC, just a few years ago, the first two questions vendors heard related to performance and cost, relates Ed Turkel, product marketing manager for scalable computing and infrastructure at HP. Now the dialogue is different: for users, power, size and cooling are just as important. Over a typical three-year hardware lifetime, power costs can equal hardware costs.
Note that the largest supercomputers can consume megawatts. Japan’s 5120-processor Earth Simulator requires 11.9 MW, enough for a city of 40,000. Of all this power, only a fraction is used for actual computations; according to Lawrence Livermore National Laboratory, for every watt of power its IBM BlueGene/L consumes, 0.7W of power is required to cool it.
Removing this extra heat is crucial, and the penalty for not doing so is greatly reduced reliability. Wu-Chun Feng of Virginia Tech, one of the founders of the Green500 List and an early proponent of and experimenter with energy-efficient systems, reports that empirical data supports his theory that every 10°C increase in temperature results in a doubling of the system failure rate.
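That rule of thumb is easy to quantify. A minimal sketch (the function and the example figures are illustrative, not drawn from Feng’s data):

```python
def relative_failure_rate(delta_t_celsius: float) -> float:
    """Rule of thumb: system failure rate doubles for every
    10 deg C rise in operating temperature."""
    return 2.0 ** (delta_t_celsius / 10.0)

# A machine room running 20 deg C hotter than its baseline would
# see roughly four times the failure rate under this model.
print(relative_failure_rate(20.0))  # 4.0
```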
Processors are major consumers
For a good place to look to reduce power, consider studies from IBM, which indicate that power consumed by processors makes up roughly 20 to 30 per cent of the total for a mainframe while, for a blade, it is more than 50 per cent. Given this potential, chip suppliers are making progress. IBM’s BlueGene/P supercomputer – which doubles the performance of its predecessor BlueGene/L while consuming only slightly more power – uses the PowerPC 450 chip. A specialised device within the PowerPC family, it is designed for low power consumption, running at 850MHz (compare that to a Power6 in a high-end HPC box at 5GHz). In fact, says Herb Schultz, IBM deep computing marketing manager, the world’s fastest computers take the approach of using lower clock speeds, meaning somewhat lower performance per chip, but putting many of these chips in a supercomputer – what some call the ‘army of snails’ approach. This leads to a better-than-linear performance-to-power improvement and is why you see so many processors in BlueGene/P computers, which are scalable in increments of 1,024 up to at least 65,536 nodes, each with dual processors.
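The ‘army of snails’ arithmetic can be sketched with a simplified power model (illustrative only; real processors also pay static leakage power, and memory-bound workloads don’t scale linearly with clock):

```python
# With voltage scaling, dynamic power grows roughly with the cube of
# clock speed (P ~ C * V^2 * f, with V scaling roughly with f), while
# performance grows only linearly with clock speed.
def perf_per_watt(freq_ghz: float) -> float:
    perf = freq_ghz        # work done ~ linear in clock
    power = freq_ghz ** 3  # dynamic power ~ cubic in clock
    return perf / power

# Halving the clock quadruples efficiency under this model, so many
# slow chips beat few fast ones on performance per watt.
print(perf_per_watt(1.0) / perf_per_watt(2.0))  # 4.0
```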
For very specialised applications, IBM uses its PowerXCell, a special member of the Power family designed for specific workloads. For instance, this chip – which was originally designed for gaming and animation – has vector capability and eight special on-chip units to execute specialised instructions. For certain applications this chip has a big payback: a system based on it leads the major lists of energy-efficient computers, such as the Green500 (www.green500.org), by a wide margin.
Schultz adds that there is a strong trend towards hybrid computing, where one machine has several types of processors. Some address general-purpose applications while others are chosen to solve specific problems. This approach, however, requires special software – and not all codes are amenable to this technique. Eventually he sees that other hardware vendors, as well as independent software vendors (ISVs) and software-tool vendors, will recognise this market potential and develop corresponding products.
The most energy-efficient HPC system in the latest Top500 list and Green500 list is at the Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, and is based on an IBM BladeCenter QS22 cluster.
Tim Carroll, senior manager for HPC at Dell, isn’t so convinced: ‘In HPC, the first goal is getting results to scientists. Not all their applications are written to run on specialised processors, and not all of them are interested in rewriting their code. People recognise the value of semiconductor research, but more and more they want central IT departments to manage their resources, not researchers.’
To save power, some commercial chips use dynamic voltage and frequency scaling (DVFS), examples being Intel with its SpeedStep and AMD with its PowerNow! and Cool’n’Quiet. Here, the voltage and frequency of a processor, or some of its subsystems, are scaled down depending on the tasks it must perform. For instance, when a processor is accessing external memory there’s no need to keep the CPU running at full speed. But, because transitioning between power states takes time, applying DVFS can have a detrimental impact on system performance, which is crucial in HPC systems.
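The saving DVFS targets follows from the classic CMOS dynamic-power relation P ≈ C·V²·f. A small sketch with made-up operating points (the capacitance, voltages and frequencies below are purely illustrative):

```python
def dynamic_power(capacitance: float, voltage: float, freq_hz: float) -> float:
    """Classic CMOS dynamic-power model: P = C * V^2 * f (watts)."""
    return capacitance * voltage ** 2 * freq_hz

# Hypothetical operating points for one core: scaling voltage down
# alongside frequency is what makes DVFS pay off, because power
# depends on the *square* of the voltage.
full = dynamic_power(1e-9, 1.2, 3.0e9)    # full speed
scaled = dynamic_power(1e-9, 1.0, 2.0e9)  # lower voltage AND frequency
print(f"power saved: {1 - scaled / full:.0%}")  # power saved: 54%
```

Note that a one-third cut in clock speed yields more than a one-half cut in power, which is why the technique is attractive despite the transition latency mentioned above.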
Many HPC systems stay with the standard version of processors that consume 80 to 95W each, but are not run at the absolute highest clock speeds. These chips cost less than the ‘bin 1’ level chips with the best performance, but they represent a good balance of price/performance. Power is also driving processor architecture, adds HP’s Turkel. Rather than place all the transistors in one core, the trend is to split them among two or four slightly lower-power cores, a scheme that consumes far less power.
Clock speed is just one element of power savings, notes IBM’s Schultz. First, software can monitor a system at a high level, for example stopping spinning disks or sharing a workload. Certain programs can also exercise different, less power-hungry pathways through a machine’s electronics. Companies are also writing algorithms to be more power-efficient by tuning them for specific hardware; IBM, for instance, has an engineering maths library tuned for its Power platform.
Blades on the move
Looking beyond semiconductor devices, the power supply and fans in a server can draw just as much power as the processors. And, while computer suppliers are moving to more efficient power supplies, blades offer better power efficiency than the rack-mounted servers that were, until recently, the ‘bread and butter’ of the market – but that is changing. Not only can blades share resources such as power and cooling, they can eliminate unneeded components such as graphics chips. Further, blades reside in managed enclosures where microprocessors can set the conversion point of power supplies and control fan speed. In fact, even though 1U servers are becoming more efficient, HP’s Turkel reports that some customers have seen a 22 per cent reduction in power consumption when moving from 1U rack-based servers to blades. Meanwhile, more than half of HP’s cluster sales are in blades, and of the 209 HP entries in the Top500, 201 are blade systems.
Simply going to blade servers, however, doesn’t always achieve the desired computing densities. In some data centres, for instance, only the bottom row of slots in a four-tier chassis is populated, because otherwise there’s insufficient cooling to handle the extra heat in the chassis and in the data centre.
Data centres in a container
Removing heat from a computing system with an air conditioner is not necessarily the most efficient method. In addition, unless the data centre is laid out properly, air cooling can lead to hot spots among and between servers. Further, air-conditioning systems can take a large amount of space, even exceeding the computer footprint. These are among the reasons why manufacturers of blade enclosures are looking at ways to take heat directly from the enclosed chassis before it enters the room. Indeed, water cooling is making a comeback, and IBM has retrofitted some of its products with water cooling.
In its latest cabinet, Cray applies its ECOphlex phase-change liquid exchange technology. It uses only a small amount of chilled water and instead relies on an inert coolant that gets converted from a liquid to a gas. The phase-change coil is more than 10 times as efficient as a water coil of a similar size. And with the ability to remove more heat, adds Steve Scott, Cray’s CTO, there’s less incentive to improve energy efficiency by using low-power processors or running processors at reduced clock rates.
Realising how great a saving is possible in data centres, HP just over a year ago purchased EYP Mission Critical Facilities, a company dedicated to optimising the design of new data centres and improving efficiency in existing ones. Another aspect is HP’s Performance Optimised Data Center, a 40-foot portable container filled with racks and cooling that is easily deployable indoors or outdoors wherever extra highly efficient compute resources are needed. In the PUE (power usage effectiveness) metric, where the perfect score is unity, older data centres might rate two or three, well-run data centres 1.6 or 1.7, and very well-run centres 1.2 or 1.3. The ‘pod’, using water-cooled heat exchangers in the ceiling along with heat exchangers on the sides of sealed racks, achieves a rating of 1.2.
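PUE itself is just the ratio of total facility power to the power delivered to the IT equipment. A one-line sketch, with invented load figures:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power divided by
    IT equipment power. 1.0 is the (unreachable) ideal; everything
    above it is overhead such as cooling and power conversion."""
    return total_facility_kw / it_equipment_kw

# Hypothetical pod drawing 600 kW overall to feed 500 kW of IT load:
print(round(pue(600.0, 500.0), 2))  # 1.2
```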
A similar ‘container’ approach is taken at Dell’s Data Center Solutions group, where some customers are surprised how containers stacked two high might mean they don’t have to erect a new computer facility. This is especially interesting for companies such as those in the Web 2.0 environment, where hyperscale data centres are deploying a thousand nodes per month.
Software shares the load
Many servers run at utilisation rates that are far too low, sometimes even 10 or 20 per cent, but they still need 60 per cent of full power to keep them available. Rather than install a new server for each application, known as ‘server sprawl’, why not set up a virtual server? This approach uses the untapped processing of multiple physical servers to create a logical server. The utilisation factor for each physical server increases dramatically, often leading to the greenest of all computers – the one that doesn’t need to be installed. It’s also possible to virtualise storage to make the most efficient use of disk capacity. With the virtualisation approach, IBM notes that some of its customers have seen a 30 per cent reduction in power consumption.
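The consolidation arithmetic can be sketched with a simple linear power model. The 60 per cent idle floor comes from the figure above; the server counts, utilisations and per-server wattage are hypothetical:

```python
def fleet_power(n_servers: int, utilisation: float, p_max_watts: float,
                idle_frac: float = 0.6) -> float:
    """Total power for a fleet, assuming each server draws an idle
    floor (idle_frac of full power) plus a part proportional to
    utilisation. A deliberately crude, illustrative model."""
    per_server = p_max_watts * (idle_frac + (1 - idle_frac) * utilisation)
    return n_servers * per_server

# Same total work (1.5 server-equivalents of load) either way:
before = fleet_power(10, 0.15, 300.0)  # ten hosts at 15% utilisation
after = fleet_power(2, 0.75, 300.0)    # two virtualised hosts at 75%
print(f"saving: {1 - after / before:.0%}")  # saving: 73%
```

The saving comes almost entirely from retiring the idle floors of the eight decommissioned hosts, which is the intuition behind ‘the greenest computer is the one that doesn’t need to be installed’.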
HPC systems delivered in a container with all the latest energy-saving features can be an alternative to erecting a new building.
Another approach is to use real-time temperature and wattage data available from modern servers that conform to the Intelligent Platform Management Interface (IPMI) standard. After collecting such resource data, Moab software from Cluster Resources examines historical and scheduled workloads and then assigns jobs to the resources with the highest performance-per-watt capabilities. It further reduces energy use by automatically activating power-saving modes (standby or sleep) and power on/off, and it virtualises server environments and consolidates workload from underutilised servers. In addition, it sends CPU-intensive workload to cool nodes so that hot nodes can cool off, reducing cooling demand and hardware failures; assigns to hotter nodes those jobs that produce less CPU heat (for example, jobs that are memory-intensive rather than CPU-intensive); reduces cooling costs by lowering the temperature of computing resources; schedules CPU-intensive low-priority jobs for off-peak hours to take advantage of lower energy costs; and sends workload to IT locations where energy costs are lower. While this software generally represents one to three per cent of the hardware costs, it can result in energy savings of 8.5 to 25 per cent, says company president Michael Jackson. Further, these techniques can improve workload productivity by 10 to 30 per cent, all with the same hardware resources.
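The core placement idea – stripped of Moab’s thermal awareness, consolidation and scheduling logic – amounts to greedily preferring the most efficient free node. This toy sketch uses hypothetical node ratings and is in no way Cluster Resources’ actual algorithm:

```python
# Hypothetical efficiency ratings (Mflops per watt) for three free nodes.
nodes = {"node-a": 350.0, "node-b": 210.0, "node-c": 480.0}
free = set(nodes)

def pick_node() -> str:
    """Greedy placement: take the free node with the best
    performance-per-watt rating, then mark it busy."""
    best = max(free, key=lambda n: nodes[n])
    free.remove(best)
    return best

print(pick_node())  # node-c  (highest Mflops/W goes first)
print(pick_node())  # node-a
```

A production scheduler would combine this with the temperature data, job priorities and energy tariffs described above; the sketch only shows why IPMI-style per-node power data is the prerequisite for any of it.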
How far we’ve come
To consider the effects of all these efforts, one place to start is with the Green500 list of the most energy-efficient supercomputers. Of course, it is difficult to find the perfect objective benchmark that encompasses all issues, so the two researchers who put together the list – Dr Wu-Chun Feng and Dr Kirk W Cameron, both from Virginia Tech – selected Mflops/watt. The #1 machine, with an energy efficiency of 536.24 Mflops/watt, is located at the Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, and is based on an IBM BladeCenter QS22 cluster using 4-GHz PowerXCell 8i processors. It is among the first four systems on the list to surpass the 500 Mflops/watt barrier. It’s also instructive to note that peak computational power need not come at the expense of energy efficiency. In fact, the #1 entry on the Top500 list of the most powerful supercomputers – also a BladeCenter QS22 cluster – comes in at #7 on the Green500 list.
The Green500 list is based on the same systems as the Top500, but it combines measured power numbers with what some people consider meaningless power numbers obtained from data sheets or elsewhere, and then ranks all systems by calculating Rmax/Power without taking different architectures and system sizes into account. Says Dr Frank Baetke, program manager of HP’s Global HPC-Technology Group: ‘Since the Top500 committee started to add measured power numbers in the June 2008 edition of the list, we feel that it is more appropriate to supply measured power numbers to the Top500 only. If needed, it is easy enough to calculate the efficiency quotient Rmax/Power [Mflops/W]. It is obvious that systems based on special-purpose processors have different power characteristics than systems based on a general-purpose x86-compatible processor. We feel that only systems based on similar processors, architectures and comparable programming paradigms should be compared in regard to energy efficiency, a position also taken by the Top500 committee during the announcement and at the “Birds of a Feather” session at the SC08 conference. HP is not at all displeased with the 227 Mflops/W efficiency of the BL2x220, as that number places the system among the leading x86-based architectures.’
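The efficiency quotient Baetke mentions is indeed trivial to compute from Top500 data, once the units are reconciled. A quick sketch (the system figures are invented for illustration):

```python
def mflops_per_watt(rmax_tflops: float, power_kw: float) -> float:
    """Green500-style efficiency quotient: Rmax divided by measured
    power, with Tflops converted to Mflops and kW to W."""
    return (rmax_tflops * 1e6) / (power_kw * 1e3)

# A hypothetical 180-Tflops system drawing a measured 800 kW:
print(mflops_per_watt(180.0, 800.0))  # 225.0
```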
For its part, Dell doesn’t make putting a lot of systems on such lists a strategic goal, ‘and the Green500 is not one of our metrics,’ says Carroll. ‘Making sure we optimise a customer’s data centre – if that is a metric of success, we’re doing well above the norm. The Green500 is a recent phenomenon, and it takes some years to look at system ratings and learn how to score. “Green” is not just performance per watt; it includes other factors such as CO2 emissions, disposables such as excess packaging, and toxic components. Benchmarks are like tax law: people can always find loopholes.’ Even so, Dell took 19 spots in the Green500, whereas Cray took 22 and SGI 17, and the company is satisfied that it is easily holding its own in that respect.
Summarising the concepts in this article nicely, Carroll states that customers must be sure to anticipate power needs before specifying computers. The lead time for facility changes, such as getting electricians, can be weeks longer than the lead time for the computers themselves, and that can eat into research time. Customers must also budget for air conditioning, hot-air removal and properly distributed power. ‘It would be a shame if, when a shipment of systems comes in, you can only power up half of them.’ Such a scenario is not so preposterous in today’s world.