ANALYSIS & OPINION
More cores means slower supercomputing
20 January 2009
The worldwide attempt to increase the speed of supercomputers merely by increasing the number of processor cores on individual chips unexpectedly worsens performance for many complex applications, simulations have found.
A team of researchers simulated key algorithms for deriving knowledge from large data sets. The simulations show a significant increase in speed going from two to four cores, but an insignificant increase from four to eight. Beyond eight cores, speed decreases: sixteen cores perform barely as well as two, and the decline steepens as more cores are added.
The problem is the lack of memory bandwidth as well as contention between processors over the memory bus available to each processor. (The memory bus is the set of wires used to carry memory addresses and data to and from the system RAM.)
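The bottleneck described above can be illustrated with a toy model. This is only a sketch under loud assumptions, not the Sandia simulation: the function name `run_time`, the workload sizes, the bus bandwidth and the contention factor are all hypothetical values chosen to show the general shape of the effect, in which compute time shrinks as cores are added while the time spent waiting on a shared memory bus grows.

```python
def run_time(cores, compute_work=100.0, data_volume=40.0,
             bus_bandwidth=4.0, contention=0.15):
    """Toy run time for a data-heavy job on a chip with `cores` cores.

    All parameters are illustrative assumptions, not measured values.
    """
    # The arithmetic is assumed to divide perfectly across the cores.
    compute_time = compute_work / cores
    # Hypothetical contention penalty: every extra core sharing the bus
    # reduces the bandwidth that is effectively usable.
    effective_bandwidth = bus_bandwidth / (1.0 + contention * (cores - 1))
    # The same amount of data must cross the bus however many cores exist.
    memory_time = data_volume / effective_bandwidth
    # Assume computation and memory traffic do not overlap.
    return compute_time + memory_time


if __name__ == "__main__":
    baseline = run_time(1)
    for cores in (1, 2, 4, 8, 16, 32, 64):
        speedup = baseline / run_time(cores)
        print(f"{cores:>2} cores: speed-up {speedup:.2f}x over one core")
```

With these made-up parameters the speed-up rises at first, flattens, and then falls as the memory term dominates. The exact turning point depends entirely on the assumed values and should not be read as a reproduction of the reported results.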
Richard Murphy, a senior member of the technical staff at Sandia National Laboratories, where the research took place, told scientific-computing.com that it is 'difficult' to pinpoint how prevalent the problem is: 'The data presented in the spectrum article is for a class of large-data size applications, specifically in informatics. We've seen very similar trends for many large-data HPC applications. There are smaller data HPC applications that won't suffer from this problem, and, speaking very generally, common desktop users are unlikely to see this problem because their data sets are small or streaming, rather than large and random access.'
He adds: 'I also think we will see this problem for other large data problems outside of traditional HPC or informatics. Some preliminary and informal discussions in other communities (finance, databases, etc.) tend to confirm this view.'
But the team at Sandia is working to resolve the problem, as Murphy explains: 'We're working on a number of strategies to address the problem. Basically, the problem of "data movement" (from CPU to memory, CPU to CPU, and over networks) tends to be the biggest challenge in running these applications, as opposed to the problem of "computing", which is doing the actual mathematics. We've looked at both memory and networking as performance bottlenecks and methods of alleviating the problem.'
So is it simply a matter of rewriting the code to work over multiple processors, as is often necessary with a move to multicore? Unfortunately not, as Murphy says: 'There are certainly some code refactoring methods that can help to solve the problem for different applications, including new data decompositions, better data sharing mechanisms, etc.
'However, there are some important applications that are very unlikely to gain performance via these methods. And, in other cases working on hardware to alleviate the problem may be less expensive (since software costs are very high). We're working on numerous hardware mechanisms that could help, including better synchronisation, improved data movement support (such as gather/scatter), and more capable networks.'
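For readers unfamiliar with the gather/scatter operations Murphy mentions, the sketch below shows what they mean in software terms. It is a plain Python illustration with hypothetical function names (`gather`, `scatter`); it says nothing about how the hardware mechanisms under study at Sandia are implemented.

```python
def gather(source, indices):
    """Collect scattered elements of `source` into one contiguous list."""
    return [source[i] for i in indices]


def scatter(target, indices, values):
    """Write contiguous `values` back to scattered positions in `target`."""
    for i, value in zip(indices, values):
        target[i] = value
    return target


# Illustrative data only: a small "memory" and some non-adjacent locations.
memory = [10, 20, 30, 40, 50, 60]
wanted = [4, 0, 3]

packed = gather(memory, wanted)                # elements 4, 0 and 3, in order
incremented = [value + 1 for value in packed]  # work on the packed copy
updated = scatter(list(memory), wanted, incremented)
```

Hardware support for these operations matters for large, random-access data sets because it allows many non-adjacent memory locations to be fetched or written as a single request rather than one at a time.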