The king is dead, long live the king

Robert Roe finds that upgrading legacy HPC systems is a complicated business, and that some obvious solutions may not be the best options

Upgrading legacy HPC systems relies as much on the requirements of the user base as it does on the budget of the institution buying the system. There is a gamut of technology and deployment methods to choose from, and the picture is further complicated by infrastructure such as cooling equipment, storage, networking – all of which must fit into the available space.

However, in most cases it is the requirements of the codes and applications being run on the system that ultimately define the choice of architecture when upgrading a legacy system. In the most extreme cases, these requirements can restrict the available technology. They can effectively lock an HPC centre into a single architecture, or deter the adoption of new architectures because of the added complexity of modernising codes or porting them to new technology platforms.

Barry Bolding, Cray’s senior vice president and chief strategy officer, said: ‘Cray cares a lot about architecture and, over the years we have seen a lot of different approaches to building a scalable supercomputer.’

Bolding explained that at one end of the spectrum are very tightly integrated systems like IBM’s Blue Gene. Bolding continued: ‘They designed each generation to be an improvement over the previous one, but in essence it would be a complete swap to get to the next generation. Obviously, there are advantages and disadvantages to doing this. It means that you can control everything in the field – you are only going to have one type of system.’

At the other end of that spectrum is the much less tightly controlled cloud market. Although some cloud providers do target HPC specifically, the cloud is not fully deployed in mainstream HPC today. Bolding stressed that the current cloud market is made up of a myriad of different servers, ‘so you never really know what type of server you are going to get.’ This makes creating ‘very large partitions of a particular architecture’ difficult, depending on what type of hardware the cloud vendor has sitting on the floor. ‘There is very little control over the infrastructure, but there is a lot of choice and flexibility,’ Bolding concluded.

Adaptable supercomputing

Cray designs its supercomputers around an adaptable supercomputing framework that can be upgraded over a generation lasting up to a decade – without having to swap out the computing infrastructure entirely. ‘What we wanted to do at Cray was to build those very large, tightly integrated systems capable of scaling to very large complex problems for our customers; but we also want to provide them with choice and upgradability,’ said Bolding.

He continued: ‘What we have designed in our systems is the ability to have a single generation of systems that can last for up to 10 years. Within that generation of systems, as customers get new more demanding requirements, we can swap out small subsets of the system to bring it to the next generation. We build and design our supercomputers to have a high level of upgradability and flexibility; much higher than for instance, the IBM Blue Gene series.’

Keeping the plates spinning

One critical point for organisations that provide continuous services from their high-performance computing systems is that these services must continue during upgrades. The UK’s Met Office, which provides weather forecasting for government, industry and the general public, is a case in point. Its HPC centre runs time-sensitive simulations that produce not only weather forecasts but also flood warnings and predictions for potentially catastrophic events such as hurricanes or storm surges. As such, its system absolutely cannot go out of production, and any upgrade must be carried out seamlessly, without disrupting the existing service.

System administrators at Nasa’s High-End Computing Capability (HECC) Project face the same requirement. The HECC has to run many time-sensitive simulations alongside its usual workloads in geoscience, chemistry, aerospace and other application areas. For example, if a space launch does not go exactly according to plan, simulations will be needed urgently to assess whether there was any significant damage, and how the spacecraft should be managed for safe re-entry and recovery. Beyond ensuring that a spacecraft does not re-enter the atmosphere in an uncontrolled fashion and risk crashing into a populated area, on a manned mission the safety of the crew would depend on fast and accurate simulations.

William Thigpen, advanced computing branch chief at Nasa, explained that this wide-ranging set of requirements, combined with a need to provide more capacity and capability to its users, puts the centre in a fairly precarious position when it comes to upgrading legacy systems. The upgrade process must be managed carefully to ensure there is no disruption in service to its varied user base.

Thigpen said: ‘A technology refresh is very important but we can’t shut down our facility to install a new system.’ He went on to explain that Nasa is not a pathfinding HPC centre like those found in the US Department of Energy’s (DOE) National Laboratories – the HECC is focused primarily on scientific discovery, rather than testing new technology.

Thigpen explained that when it is time to evaluate new systems, Nasa brings in small test systems, usually in the region of 2,000-4,000 cores (around 128-256 nodes), so that they can be evaluated against the wide range of applications used at the centre. Thigpen said: ‘The focus at Nasa is about how much work you can get done.’ In other words, it is the science and engineering goals, coupled with the need to keep the current system operational, that lead Nasa to focus on scientific progress and discovery rather than on ROI, FLOPS or, as the DOE does, developing a specific technology.

Because Nasa’s codes are so varied, any new system must be as accommodating as possible, so that the centre can derive the most performance from the largest number of its applications. In some ways this actually locks the HECC facility into CPU-based systems, because any switch in the underlying architecture, including a move to GPUs, would demand a huge amount of code modernisation effort to port applications from the current CPU-based supercomputer.

Assessing requirements

So the requirements for upgrading HPC systems can range from the obvious, such as increasing performance or decreasing energy consumption, to the more specialised, such as pathfinding for future system development at the DOE’s National Laboratories, or the pursuit of science and engineering goals at HPC centres such as Nasa’s HECC.

Against this background, it is necessary to understand how a system is utilised currently, so that any upgrades can increase productivity rather than hinder it.

One aspect that Bolding was keen to stress is that Cray can offer hybrid systems that run both CPU and GPU workloads. ‘If you have workloads that are more suited to the GPU architecture, you can move those workloads over quickly and get them in production,’ he said.

An example of this architecture is the Cray Blue Waters system, housed at the University of Illinois for the US National Center for Supercomputing Applications (NCSA). This petascale system, capable of somewhere in the region of 13 quadrillion calculations per second (13 petaflops), combines Cray XE and Cray XK components, and contains 1.5 petabytes of memory, 25 petabytes of disk storage and 500 petabytes of tape storage.

‘It is basically combining those two systems into a single integrated framework, with a single management system, single access disk and its subsystems to the network. You just have one pool of resources that are GPU based and a pool of resources that are CPU based,’ explained Bolding.

Cray has tried to reduce the burden of upgrading legacy systems by designing daughter cards, which remove the need to replace the motherboards: the CPU or GPU sits on the daughter card, which in turn plugs into the motherboard. This allows newer processor technology to fit into an older motherboard without replacing the whole system.

Bolding said: ‘Cray has designed small cards to make that a more cost effective change for our customers. We designed small cards on which the processors and the memory are socketed, so when a new generation comes out, we are actually only replacing a very small component.’

However, the particular requirements of Nasa’s users mean that switching to GPUs is viewed cautiously. Thigpen went so far as to say that if Nasa upgraded to a GPU-based system, a performance increase of up to 25 per cent would still not justify the extra effort of porting applications and then optimising the codes for GPUs. ‘We need to support the user base,’ said Thigpen.

Pleiades, the HECC machine, is an SGI supercomputer made up of 163 racks (11,312 nodes) containing 211,872 CPU cores and 724 TB of memory. The system also has four racks enhanced with Nvidia GPUs but, as these are a small fraction of the total system, they are used for only a few applications that lend themselves to the massively parallel nature of GPUs.

Continuing contracts

It is common practice in HPC to award multi-stage, multi-year contracts rather than one-off procurements. The frequency of such contracts is not increasing, which suggests that they are used most often by a specific group of HPC users with precise requirements, because they guarantee built-in upgrades. Bolding said: ‘An example of that is the UK’s Met Office, where they have a multi-stage installation over time. We have seen it for many years, which is why we made it a market requirement for our products.’

Another example is the contract that the energy company Total awarded to SGI in March 2015 to upgrade its supercomputer ‘Pangea’, located at Total’s Jean Feger Scientific and Technical Centre in Pau, France. SGI boosted Pangea’s then-current SGI ICE X system with an additional 4.4 petaflops of compute power, supported by M-Cell technology, storage and the Intel Xeon Processor E5-2600 v3 product family.

At the time of the announcement, Jorge Titinger, president and CEO of SGI, highlighted that Total had been an SGI customer for more than 15 years, and that SGI’s goal has been to continually provide the capacity and power the company needs to pursue its oil and gas research throughout the world. Upgrading and supporting systems over a sustained period is crucial, as companies, even those in the oil and gas sector, do not want to buy a completely new HPC system every few years.

The UK Met Office announced earlier this year that it was purchasing a new £97 million Cray system. Part of a multi-year project, the system, based on the Cray XC40, will deliver somewhere in the region of 16 petaflops of peak performance with 20 petabytes of storage once the contract is completed in 2017.

Bolding said: ‘Some of the first sites where we saw these large multi-stage procurements were some of the national weather centres. They have always bought multi-stage because they don’t like to swap out their infrastructure very often, as it risks taking them out of production.’

One benefit of multi-year contracts, especially for storage, is that customers can exploit the inexorable march of technology: every year, hard disks tend to increase in capacity, if not drop in price. By buying only as much storage as is needed immediately, and adding capacity as required, customers will often get a better price for the same total capacity.
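The arithmetic behind staged storage purchasing can be sketched in a few lines. This is a minimal illustration, not a procurement model: it assumes the price per terabyte falls by a fixed fraction each year, and every figure used (yearly capacity needs, starting price, 15 per cent annual decline) is hypothetical rather than taken from any contract mentioned here. The helper names `upfront_cost` and `staged_cost` are likewise invented for illustration.

```python
# Hypothetical illustration only: prices, capacities and the annual decline
# rate are made-up assumptions, not figures from any vendor or contract.


def upfront_cost(total_tb: float, price_per_tb: float) -> float:
    """Buy all the capacity in year 0 at today's price per TB."""
    return total_tb * price_per_tb


def staged_cost(yearly_tb: list[float], price_per_tb: float, annual_decline: float) -> float:
    """Buy each year's capacity only when it is needed, assuming the
    price per TB falls by a fixed fraction (annual_decline) every year."""
    total = 0.0
    for year, tb in enumerate(yearly_tb):
        # Price in a given year = starting price * (1 - decline) ** year
        total += tb * price_per_tb * (1 - annual_decline) ** year
    return total


if __name__ == "__main__":
    need = [5_000, 5_000, 5_000, 5_000]  # hypothetical TB added in each year
    price = 30.0                         # hypothetical price per TB in year 0
    decline = 0.15                       # hypothetical 15% annual price fall
    print(f"Upfront: {upfront_cost(sum(need), price):,.0f}")
    print(f"Staged:  {staged_cost(need, price, decline):,.0f}")
```

Under these assumptions the staged purchase costs roughly 20 per cent less than buying everything at once for the same final capacity. A real comparison would also weigh factors this sketch ignores, such as installation effort for each stage and the risk of disrupting a production system.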

There will continue to be a number of options for upgrading legacy HPC systems and, while some may seem more appealing than others, ultimately it takes a careful understanding of the system’s user base to decide the best path, and the rate at which to upgrade.