More cores means slower supercomputing

The worldwide attempt to increase the speed of supercomputers merely by increasing the number of processor cores on individual chips unexpectedly worsens performance for many complex applications, simulations have found.

A team of researchers simulated key algorithms for deriving knowledge from large data sets. The simulations show a significant speedup going from two to four cores, but only a marginal gain from four to eight. Beyond eight cores, performance begins to fall: sixteen cores perform barely as well as two, and adding cores after that produces a steep decline.

The problem is a lack of memory bandwidth, combined with contention between processors over the memory bus available to each of them. (The memory bus is the set of wires that carries memory addresses and data between the processors and the system RAM.)
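The shape of the reported scaling curve can be captured with a simple analytical sketch. The model and parameters below are illustrative assumptions, not Sandia's simulation: per-core runtime is split into a compute part that parallelises perfectly, a memory-bus part that is serialised across cores, and a contention overhead that grows with core count.

```python
# Toy model (illustrative parameters, not Sandia's simulation).
# compute:    work that parallelises perfectly across n cores
# memory:     time on the shared memory bus, which does not shrink with n
# contention: per-core arbitration/coherence overhead that grows with n
def speedup(n, compute=1.0, memory=0.1, contention=0.02):
    """Speedup over a single core for n cores sharing one memory bus."""
    t1 = compute + memory + contention
    tn = compute / n + memory + contention * n
    return t1 / tn

for n in (2, 4, 8, 16, 32):
    print(f"{n:2d} cores: {speedup(n):.2f}x")
```

With these parameters the curve rises sharply from two to four cores, flattens towards eight, and then falls, echoing the trend the simulations found.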

Richard Murphy, a senior member of the technical staff at Sandia National Laboratories, where the research took place, said it is 'difficult' to pinpoint how prevalent the problem is: 'The data presented in the Spectrum article is for a class of large-data size applications, specifically in informatics. We've seen very similar trends for many large-data HPC applications. There are smaller data HPC applications that won't suffer from this problem, and, speaking very generally, common desktop users are unlikely to see this problem because their data sets are small or streaming, rather than large and random access.'

He adds: 'I also think we will see this problem for other large data problems outside of traditional HPC or informatics. Some preliminary and informal discussions in other communities (finance, databases, etc.) tend to confirm this view.'

But the team at Sandia is working hard to resolve this problem, as Murphy explains: 'We're working on a number of strategies to address the problem. Basically, the problem of "data movement" (from CPU to memory, CPU to CPU, and over networks) tends to be the biggest challenge in running these applications, as opposed to the problem of "computing", which is doing the actual mathematics. We've looked at both memory and networking as performance bottlenecks and methods of alleviating the problem.'

So is it simply a case of rewriting the code to work across multiple processors, as is often required in a move to multicore? Unfortunately not, says Murphy: 'There are certainly some code refactoring methods that can help to solve the problem for different applications, including new data decompositions, better data sharing mechanisms, etc.

'However, there are some important applications that are very unlikely to gain performance via these methods. And, in other cases working on hardware to alleviate the problem may be less expensive (since software costs are very high). We're working on numerous hardware mechanisms that could help, including better synchronisation, improved data movement support (such as gather/scatter), and more capable networks.'
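The gather/scatter support Murphy mentions refers to hardware that collects values from non-contiguous memory locations (gather) and writes them back out to non-contiguous locations (scatter), the dominant access pattern in large random-access workloads. A minimal sketch of the pattern itself, here expressed in software with NumPy fancy indexing (the array and indices are invented for illustration):

```python
import numpy as np

data = np.arange(10.0)          # 0.0, 1.0, ..., 9.0
idx = np.array([7, 2, 5])       # non-contiguous locations

gathered = data[idx]            # gather: collect scattered elements
out = np.zeros_like(data)
out[idx] = gathered * 2         # scatter: write results back to scattered locations
```

In software each such access risks a separate trip over the memory bus; dedicated hardware support aims to make these irregular patterns far cheaper than issuing one request per element.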
