Scalable debugging demonstrated on Jaguar supercomputer
Under the Debugger Software Enhancement programme for Petascale production grade tools, Allinea Software's Distributed Debugging Tool (DDT) has demonstrated scalable debugging to 220,000 cores on Oak Ridge National Laboratory's (ORNL) Jaguar supercomputer, a Cray XT5.
In Q2 2009, Allinea began a collaborative project with ORNL to extend the scalability of its DDT product. The project's goal is to enable ORNL's users to debug MPI applications that span many hundreds of thousands of processors, while delivering novel capabilities that can radically simplify this task.
'ORNL and Allinea are partnering to enhance the scalability of the DDT debugger with the goal to support the complete Jaguar System. The work has progressed in a timely manner and has demonstrated the ability to debug a 220,000 process job,' commented Richard Graham, applications performance tools group leader at ORNL. 'We are very pleased with our partnership and the success we are achieving with Allinea Software.'
'Given the inherent complexities of developing applications at Petascale, it is very important that our users are not frustrated by the very tools that are intended to help solve their problems,' said Dr David Lecomber, CTO of Allinea. 'Our initial work has therefore focused on making the basic Petascale debugging experience much the same as it would be on very modest numbers of processes. Current benchmarks also show that we can perform key actions – like stepping 220,000 MPI processes, setting breakpoints, or comparing variables across this number of processors – in a couple of hundred milliseconds or less.'