At the 2016 Neural Information Processing Systems (NIPS) Conference in Barcelona, Spain, supercomputing provider Cray announced the results of a deep learning collaboration between Cray, Microsoft, and the Swiss National Supercomputing Centre (CSCS) to increase the performance of large-scale deep learning algorithms on Cray supercomputers.
Cray worked with Microsoft and CSCS, a world-class scientific computing centre, to leverage their high-performance computing expertise to scale the Microsoft Cognitive Toolkit (formerly CNTK) on a Cray XC50 supercomputer at CSCS nicknamed ‘Piz Daint’.
Running larger deep learning models is opening the path to new scientific possibilities, but conventional systems and architectures limit the problems that can be addressed, as models take a long time to train. Using supercomputers can drastically reduce this time if the supercomputing architecture can make efficient use of learning tools to accelerate the training of deep neural networks (DNNs).
‘Cray’s proficiency in performance analysis and profiling, combined with the unique architecture of the XC systems, allowed us to bring deep learning problems to our Piz Daint system and scale them in a way that nobody else has,’ said Professor Thomas Schulthess, director of the Swiss National Supercomputing Centre (CSCS). ‘What is most exciting is that our researchers and scientists will now be able to use our existing Cray XC supercomputer to take on a new class of deep learning problems that were previously infeasible.’
Deep learning problems share algorithmic similarities with applications traditionally run on a massively parallel supercomputer. By optimising inter-node communication using the Cray XC Aries network and a high-performance MPI library, each training job can leverage significantly more compute resources – reducing the time required to train an individual model.
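The idea of spreading one training job across many nodes can be illustrated with a minimal, single-process sketch of data-parallel training. It is not the Cognitive Toolkit's or Cray's implementation: the function names (`local_gradient`, `allreduce_mean`, `train`), the toy model y = w * x, and the learning rate are all assumptions chosen for illustration. Each simulated worker computes a gradient on its own shard of the data, and a stand-in for an MPI allreduce averages those gradients so that every model replica applies the same update.

```python
# Illustrative only: simulates data-parallel training inside one process.
# The workers, the toy model and all names here are hypothetical.

def local_gradient(w, shard):
    """Gradient of the mean squared error for y = w * x on one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def allreduce_mean(values):
    """Stand-in for an MPI allreduce: every worker receives the average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def train(shards, steps=200, lr=0.01):
    """Each simulated worker holds one model replica and one data shard."""
    weights = [0.0] * len(shards)
    for _ in range(steps):
        # Compute step: independent on every worker.
        grads = [local_gradient(w, s) for w, s in zip(weights, shards)]
        # Communication step: the part that depends on the interconnect.
        averaged = allreduce_mean(grads)
        weights = [w - lr * g for w, g in zip(weights, averaged)]
    return weights

# Toy data generated from y = 3x, split across four simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
weights = train(shards)
```

In a real multi-node run, the averaging step would be a collective operation such as `MPI_Allreduce` across nodes. Because it executes on every training step, its cost on the network largely determines how far a job of this kind can scale.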
‘Applying a supercomputing approach to optimise deep learning workloads represents a powerful breakthrough for training and evaluating deep learning algorithms at scale,’ said Dr. Xuedong Huang, distinguished engineer, Microsoft AI and Research. ‘Our collaboration with Cray and CSCS has demonstrated how the Microsoft Cognitive Toolkit can be used to push the boundaries of deep learning.’
A team of experts from Cray, Microsoft, and CSCS have scaled the Microsoft Cognitive Toolkit to more than 1,000 NVIDIA Tesla P100 GPU accelerators on the Cray XC50 supercomputer at CSCS. The result of this deep learning collaboration opens the door for researchers to run larger, more complex, and multi-layered deep learning workloads at scale, harnessing the performance of a Cray supercomputer.
Dr Mark Staveley, Cray’s director of deep learning and machine learning, said: ‘We are working to unlock possibilities around new approaches and model sizes, turning the dreams and theories of scientists into something real that they can explore. Our collaboration with Microsoft and CSCS is a game changer for what can be accomplished using deep learning.’