The next generation
Dr Oz Parchment, IT infrastructure services manager, University of Southampton, describes his role and discusses the university's latest supercomputer, Iridis 3
The University of Southampton has an excellent pedigree in HPC and has been operating supercomputers for the past three decades. Its latest machine, Iridis 3, the third installation of the Iridis cluster, was officially launched in February 2010 and provides a substantial improvement in performance, giving researchers 20 times more computational power to play with than its predecessor.
Iridis 3 is the result of almost two years of planning and development in cooperation with OCF, which designed and built the system, and IBM. The machine uses IBM iDataPlex server technology and contains more than 8,000 processors, making it one of the largest university-owned supercomputers in the UK (ranked 74th in the Top500). It is also one of the greenest: it is ranked 25th in the Green500 list and is the second most energy-efficient supercomputer in the UK, operating at 299.52 megaflops per watt.
The efficiency of Iridis 3 is down to IBM’s iDataPlex technology, which reduces the number of fans and power supplies needed to cool and power the components. It also uses an inbuilt water cooling system incorporating a heat exchanger to cool the expelled air before it enters the data centre, providing up to 400kW of cooling capacity. The efficiency of the machine was something we and OCF had to consider very carefully; Iridis 2 was air cooled, but we stipulated that whatever IBM provided us with had to be water cooled, simply so that we could run it in the university’s data centre. The data centre, a building dating back to 1975, isn’t equipped to handle the heat generated by the supercomputer’s 1,000 nodes without water cooling, and it’s only through the innovative design provided by OCF and IBM’s iDataPlex server technology that we were able to run such a powerful machine.
Iridis 3 was delivered at the end of September 2009 and went live for the university’s user base in November 2009. It is now operating at up to 99 per cent capacity during the day, thanks largely to Adaptive Computing’s job scheduler, which optimises system utilisation. Scheduling is a complicated process: the scheduler has to make complex decisions to place the various jobs, each with its own requirements, onto the available nodes efficiently.
One of our priorities is to grow new areas of activity (even though the machine is at 99 per cent usage), so the scheduling method employed had to be fair to everyone at the university. Scheduling is partly dependent on prior usage, in that researchers who haven’t used the facility for a while move to the front of the queue, while heavy users have to wait their turn. We’re also putting together a portal for non-traditional users to try to encourage other staff to utilise the facility.
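As an illustration only, the usage-based ordering described above can be sketched in a few lines of Python. This is not Adaptive Computing's algorithm, whose details are not given here. The names `Job`, `decayed_usage` and `order_queue`, the decay factor and the usage figures are all hypothetical, and a real scheduler weighs many more factors, such as job size and node availability.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    submitted: int  # submission order; lower means submitted earlier


def decayed_usage(daily_core_hours, decay=0.5):
    """Sum a user's recent usage, weighting older days less.

    daily_core_hours[0] is the most recent day. The decay factor is an
    assumed value for illustration, not a setting from any real scheduler.
    """
    return sum(hours * decay ** age for age, hours in enumerate(daily_core_hours))


def order_queue(jobs, usage_history, decay=0.5):
    """Order jobs so that users with the least recent usage run first.

    Ties are broken by submission order. Users with no recorded history
    are treated as having zero usage.
    """
    def key(job):
        usage = decayed_usage(usage_history.get(job.user, []), decay)
        return (usage, job.submitted)

    return sorted(jobs, key=key)


# Hypothetical example: three users submit one job each.
jobs = [Job("heavy", 0), Job("light", 1), Job("idle", 2)]
history = {"heavy": [100.0, 100.0], "light": [0.0, 10.0]}
print([job.user for job in order_queue(jobs, history)])
# prints ['idle', 'light', 'heavy']
```

The user with no recent history goes to the front even though that job was submitted last, which is the behaviour described above: prior usage, rather than arrival order alone, determines who waits.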
Currently, there are around 500 users of the system from a range of disciplines. One of the chemistry projects is investigating multiscale modelling of protein interactions at cell membranes. Standard simulation techniques are computationally expensive and the researchers hope to develop a multiscale approach, whereby part of the environment is modelled at a so-called ‘coarse-grain’ level to speed up simulation, while accuracy is maintained using the standard, atomic-level description in other parts. Computer modelling acts as a guide to focus drug discovery research down certain avenues of investigation. Therefore, speeding up these simulations could in turn lead to faster drug development.
Other groups are simulating the movement of fluids or ‘sloshing’ in liquefied natural gas (LNG) containers on ships. Sloshing of LNG is potentially explosive and the group is modelling container design to minimise this effect. Elsewhere, archaeologists at the university are using Iridis 3 to recreate physically accurate models of tools and other artefacts, as well as entire rooms in ancient buildings. A lot of data is involved in generating these simulations due to the precision required in the models.
It’s still early days for Iridis 3 and currently we don’t have enough data to know where the performance bottlenecks are in the machine. Expanding the system for Iridis 4 won’t be considered until the end of 2010. We don’t have a lot of room to expand in the data centre, but there are other options available. Moving to six cores per socket rather than four, for instance, could achieve a 50 per cent increase in core count without changing the data centre footprint. However, the memory bandwidth has to be scaled in line with the number of cores to avoid the system slowing down. One of the key metrics for us is ‘how long does it take for a user job to complete?’ That’s down to the number of cores and the performance of each of the cores. For many of our key users in engineering (big CFD simulations, for instance) memory bandwidth is the limiting factor, so if we reduce the memory bandwidth per core, we’re slowing them down.
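The arithmetic behind that trade-off can be worked through in a short Python sketch. The four-to-six core figures come from the text; the 24 GB/s socket bandwidth is an assumed round number chosen purely for illustration and is not a measurement from Iridis 3. The function names are hypothetical.

```python
def scale_cores(cores_before, cores_after, socket_bandwidth_gbs):
    """Return the fractional core increase and the memory bandwidth per
    core before and after, assuming socket bandwidth stays the same."""
    core_increase = cores_after / cores_before - 1
    per_core_before = socket_bandwidth_gbs / cores_before
    per_core_after = socket_bandwidth_gbs / cores_after
    return core_increase, per_core_before, per_core_after


def bandwidth_to_hold_per_core(cores_before, cores_after, socket_bandwidth_gbs):
    """Socket bandwidth needed so that bandwidth per core does not fall."""
    return socket_bandwidth_gbs * cores_after / cores_before


# Four cores per socket to six, with an assumed 24 GB/s per socket.
increase, before, after = scale_cores(4, 6, 24.0)
print(increase, before, after)
# prints 0.5 6.0 4.0
print(bandwidth_to_hold_per_core(4, 6, 24.0))
# prints 36.0
```

Under these assumptions, the 50 per cent gain in cores comes with a one-third drop in bandwidth per core unless socket bandwidth also rises by 50 per cent, which is why a bandwidth-bound job such as a large CFD simulation might see little benefit from the extra cores.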
Iridis 4 is still a little way off though and we’ll give researchers time to get to grips with the current installation before we start to think about upgrading.