Managing energy efficiency at NASA
Every modern supercomputer must be optimised for energy efficiency because of the huge power consumption of large systems. The Top500 list is a prime example: each of the top 10 systems consumes megawatts of power, with the very largest consuming in excess of 15 megawatts.
These are the largest machines in the world and are not typical of what the average user runs, but they do demonstrate the limits of today’s HPC technology. Without sufficient action to reduce power consumption, the next generation of systems will continue to drive up power requirements.
William Thigpen, advanced computing branch chief for the NAS, commented: ‘If you look at any of the top systems they are all drawing multiple megawatts of power. They are taking somewhere in the region of 33 to 50 per cent of that power, just for cooling.’
‘These systems draw a large amount of power, but if we can save something on the order of a third of the power by doing things in a smarter way I think it is our duty to do that,’ commented Thigpen. ‘We should be good stewards of the earth.’
One approach to solving this problem is being developed by the NASA advanced supercomputing division (NAS) as part of its high-end computing capability (HECC) project. NASA hopes to solve some of its future power issues by developing a modular HPC cluster using energy efficient technology and making use of evaporative cooling and the local climate to remove the need for much of the water and power used to cool the system.
NAS operates NASA’s High-End Computing Capability (HECC) Project, which is funded through the agency’s High-End Computing Program and its Strategic Capabilities Assets Program. HECC supports more than 1,200 users from around the US, with more than 500 projects running at any one time.
The NAS Division is part of the Exploration Technology Directorate at Ames Research Center. The Directorate’s mission is to create innovative and reliable technologies for NASA missions. The NAS facility’s latest supercomputer, Electra, is a 16-rack, 1,152-node cluster from SGI that delivers a peak performance of 1.2 petaflops (1.09 Pflop/s LINPACK rating, number 96 on the November 2016 Top500 list).
This new system is the first iteration of the new energy-efficient design and has been found to operate within a power usage effectiveness (PUE) range of 1.03 to 1.05. This is a major improvement on Pleiades, NASA’s flagship supercomputer, which operates with a PUE of approximately 1.3. Pleiades currently sits at number 13 on the Top500.
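To put those PUE figures in context: PUE is the ratio of total facility power to the power delivered to the IT equipment, so a PUE of 1.3 means roughly 30 per cent extra power on top of the computing load goes to cooling and other overheads. The short Python sketch below works through that arithmetic. The 1,000 kW IT load is a hypothetical round number chosen for illustration and is not a figure from NAS; the function names are likewise illustrative.

```python
def facility_power_kw(it_load_kw: float, pue: float) -> float:
    """Total facility power implied by a PUE value for a given IT load."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_load_kw * pue


def overhead_kw(it_load_kw: float, pue: float) -> float:
    """Power spent on cooling and other non-IT overheads."""
    return facility_power_kw(it_load_kw, pue) - it_load_kw


def relative_saving(pue_old: float, pue_new: float) -> float:
    """Fractional cut in total facility power at the same IT load."""
    return (pue_old - pue_new) / pue_old


# ASSUMPTION: hypothetical IT load, used only to make the numbers concrete.
IT_LOAD_KW = 1000.0

# PUE values quoted in the article.
for label, pue in [("Pleiades (approx.)", 1.3), ("Electra (best case)", 1.03)]:
    total = facility_power_kw(IT_LOAD_KW, pue)
    extra = overhead_kw(IT_LOAD_KW, pue)
    print(f"{label}: total {total:.0f} kW, overhead {extra:.0f} kW")

print(f"Saving at equal IT load: {relative_saving(1.3, 1.03):.1%}")
```

Under these assumptions, moving from a PUE of 1.3 to 1.03 cuts total facility power by about 21 per cent for the same amount of computing, and cuts the overhead itself by a factor of ten.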
Electra is the first generation of this modular approach to supercomputing at NASA, but the success of the project has led Thigpen and his colleagues at NAS to seriously consider employing this design when they come to replace Pleiades.
‘We are looking to start building a large modular system that will still be in this PUE range in the 2018 timeframe,’ said Thigpen. ‘Even this year we are planning an expansion to Electra that, when fully populated, would be an eight-petaflop computer.’
Although the power savings are an important driver behind this, Thigpen stressed that the modular design affords a large degree of flexibility when upgrading the system. For example, NASA’s last supercomputer, Pleiades, has been upgraded with seven generations of Intel processor technology and it is expected that the follow-up to Pleiades will also go through several years of upgrades during its lifetime. A modular design helps systems designers to upgrade, as new modules can easily be attached to the existing system.
Thigpen said: ‘I think the modular solution really affords a way to easily expand and you really don’t even have to know what your end system is because you can add the components that you need as you grow into them. We do not need to know for instance that we might be limited to 15 megawatts. Maybe we will be at six megawatts – but, in either case, using this approach we can add what we need, when we need it.’
Electra is a self-contained, modular HPC design with energy-efficient HPC as its primary focus. The new system has been dubbed a Data Center on Demand (DCoD) because of its modular design. However, this modular design does not just provide large energy savings: the new system also reduces water consumption by as much as 99 per cent.
As a direct comparison, cooling the 2,300 processors that the container is designed to hold would require approximately 6,000 gallons of water per year, compared with the almost two million gallons it would take to cool those same processors in the current NAS facility.
This is largely because the NAS facility ‘wastes’ water: the water used to cool the HPC system subsequently evaporates. The current NAS facility uses approximately 50,000 gallons of water each day, which evaporates into the air through a cooling tower located next to the main NAS facility.
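The water figures quoted above can be checked with some quick arithmetic. The Python sketch below is only an illustration: it rounds ‘almost two million gallons’ up to exactly two million and assumes the 50,000 gallons per day is used every day of a 365-day year, neither of which is stated in those exact terms by NAS.

```python
# Figures quoted in the article.
CONTAINER_GALLONS_PER_YEAR = 6_000
FACILITY_GALLONS_PER_DAY = 50_000

# ASSUMPTION: "almost two million gallons" rounded to exactly two million.
FACILITY_EQUIVALENT_GALLONS_PER_YEAR = 2_000_000

# ASSUMPTION: the cooling tower runs every day of a 365-day year.
DAYS_PER_YEAR = 365


def fractional_saving(baseline: float, new: float) -> float:
    """Fraction of the baseline consumption that is avoided."""
    return 1 - new / baseline


water_saving = fractional_saving(
    FACILITY_EQUIVALENT_GALLONS_PER_YEAR, CONTAINER_GALLONS_PER_YEAR
)
facility_gallons_per_year = FACILITY_GALLONS_PER_DAY * DAYS_PER_YEAR

print(f"Water saving for the same processors: {water_saving:.1%}")
print(f"Whole-facility evaporation per year: {facility_gallons_per_year:,} gallons")
```

On these assumptions the saving for the same set of processors is a little over 99 per cent, which is consistent with the figure Thigpen quotes, and the existing facility as a whole evaporates roughly 18 million gallons a year.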
Electra obtains the impressive PUE figure through a combination of outdoor air and evaporative (adiabatic) cooling to get rid of the heat generated by the system. The initial system will contain four racks of SGI servers dedicated to scientific computation, although an expansion has already been announced for later this year.
‘I can’t stress enough the advantages over traditional computing centres on energy efficiency,’ said Thigpen. ‘It’s not just the electricity. Our current system is between a PUE of 1.26 and 1.3 – in that range. Looking at going to 1.03 is really great from an electricity standpoint, but from a water point of view we are saving 99 per cent of the water.’
This saving allows the facility to spend more money on computing resources, increasing the amount of science and engineering generated by the facility.
However, Thigpen stressed that although the potential increase in total performance is welcome, it is important to focus on energy efficiency, purely to reduce the amount of energy we waste powering today’s supercomputers: ‘We [NAS] do not only look at engineering models that are simulating the next launch vehicles, how the universe was formed, or what happens when black holes collide – we also look at the impact of man on the environment.’
He explained that for this project the NAS facility paid for its own utilities in addition to all of the other associated costs that come with setting up a new supercomputer, and that it is this holistic approach to resource management that separates NAS from other centres.
‘We are responsible for paying for everything – there are a lot of computing centres that do not ever get a power bill or a water bill but someone in that organisation is paying for that. To be able to save that money goes directly into being able to do more science and engineering.’
The challenge of reducing energy consumption while increasing performance is a constant struggle for any HPC centre. The message here is that efficiency savings can be made when planning a new data centre or supercomputer. To make the biggest savings, data centres must be designed from the outset with energy efficiency in mind.
‘I would say that anybody that is running a supercomputer, whether they are in a federal agency in some country or whether they are in a private company – they do not want to waste money, they want results. This is a way of getting more bang for your high-end computing buck,’ concluded Thigpen.