In a bid to address large-scale debugging with MPC and CUDA, Allinea Software and the CEA, the French government-funded technological research organisation, have signed a further collaboration agreement.
To date, the collaboration between the supplier of software development tools for high performance computing (HPC) and the French organisation has aimed to develop enhancements to Allinea DDT (Distributed Debugging Tool) for next-generation hybrid and ‘many-core’ computer systems. Building on the work carried out in phase one, both parties are now seeking to address a major challenge and offer a solution for large-scale systems.
Dr David Lecomber, CTO at Allinea Software, explains: ‘Research organisations require ever more powerful computational resources, and so their software needs to run on ever larger numbers of processors. This increases the complexity and risks of errors in software development; Allinea is one of the only suppliers of tools that can handle this complexity and identify errors before they hold up important research.
‘Based on all of the knowledge we acquired from the first project with CEA, this new collaboration intends to address the concerns and support the aims of all HPC developers worldwide. A key aim is clearly to provide a solution, now and in the future, for large-scale debugging with the MPC unified parallel runtime.’
Phase two will focus on making debugging tools both portable and easy to use for large-scale debugging on well over 100,000 cores. Allinea DDT's interface is intuitive at every scale of parallelism, and its architecture has already been shown to scale well to existing large systems. Another important aim is to improve on the previous CUDA debugging work by adding more features that address the CUDA architecture and fully exploit the enhancements which Nvidia now offers.
‘We’re very pleased with the results of our collaboration with Allinea Software over the past year and we want this to continue,’ comments Pierre Leca, head of the Simulation and Information Sciences Department at CEA. ‘They are clearly an active technology leader, with the right product already available and unrivalled expertise in parallel performance. Working together, we can address the growing numbers of cores in clusters and within individual nodes to take advantage of optimisations that are feasible for these systems. This will ensure that developers can use a tool that scales well on large many-core clusters both in terms of performance and ease of use.’