NERSC finalises supercomputer contract
The National Energy Research Scientific Computing Center (NERSC), the mission high-performance computing facility for the US Department of Energy’s Office of Science, has moved another step closer to making Perlmutter — its next-generation GPU-accelerated supercomputer — available to the science community in 2020.
In mid-April, NERSC finalised its contract with Cray — which was acquired by Hewlett Packard Enterprise (HPE) in September 2019 — for the new system, a Cray Shasta supercomputer that will feature 24 cabinets and provide 3-4 times the capability of NERSC’s current supercomputer, Cori. Perlmutter will be deployed at NERSC in two phases: the first set of 12 cabinets, featuring GPU-accelerated nodes, will arrive in late 2020; the second set, featuring CPU-only nodes, will arrive in mid-2021. A 35-petabyte all-flash Lustre-based file system using HPE’s ClusterStor E1000 hardware will also be deployed in late 2020.
Since announcing Perlmutter in October 2018, NERSC has been working to fine-tune science applications for GPU technologies and prepare users for the more than 6,000 next-generation NVIDIA GPU processors that will power Perlmutter alongside the heterogeneous system’s AMD CPUs. Nearly half of the workload currently running at NERSC is poised to take advantage of GPU acceleration, and NERSC has played a key role in helping the broader scientific community leverage GPU capabilities for their simulation, data processing, and machine learning workloads.
At the core of these efforts is the NERSC Exascale Science Applications Program (NESAP). NESAP partnerships allow projects to collaborate with NERSC and HPC vendors by providing access to early hardware, prototype software tools for performance analysis and optimisation, and specialised training. Over the last 18 months, NESAP teams have been working with NERSC staff and NVIDIA and Cray engineers to accelerate as many codes as possible and ensure that the scientific community can hit the ground running when Perlmutter comes online.
For example, using the NVIDIA Volta GPU processors currently available in Cori, NERSC has been helping users add GPU acceleration to a number of applications and optimise GPU-accelerated code where it already exists, noted Jack Deslippe, who leads NERSC’s Application Performance Group.
‘We are excited about the progress our applications teams are making optimising their codes for current and upcoming GPUs,’ Deslippe said. ‘Across all of our science areas we are seeing applications where a V100 GPU on Cori is outperforming a CPU Cori node by 5x or greater. These performance gains are the result of work being done by tightly coupled teams of engineers from the applications, NERSC, Cray, and NVIDIA. The enthusiasm for GPUs we are seeing from these teams is encouraging and contagious.’
As part of NESAP, in February 2019 NERSC and Cray also began hosting a series of GPU hackathons to help these teams gain knowledge and expertise about GPU programming and apply that knowledge as they port their scientific applications to GPUs. The fifth of 12 scheduled GPU hackathons was held in March at Berkeley Lab.
‘These hands-on events are helping ensure that NESAP codes and the broader NERSC workload will be ready to take advantage of the GPUs when Perlmutter arrives,’ said Brian Friesen, an Application Performance Specialist at NERSC who leads the hackathons. ‘In some cases, NESAP teams have achieved significant speedups to their applications or key kernels by participating in a hackathon. In other cases, teams have developed proof-of-concept GPU programming methods that will enable them to port their full applications to GPUs.’
Meanwhile, NERSC and NVIDIA are collaborating on innovative software tools for Perlmutter’s GPU processors, with early versions being tested on the Volta GPUs in Cori:
Roofline Analysis: The Roofline Model, developed by Berkeley Lab researchers, helps supercomputer users assess the performance of their applications by combining data locality, bandwidth and parallelisation paradigms into a single figure that exposes performance bottlenecks and potential optimisation opportunities. NERSC has been working with NVIDIA to create a methodology for Roofline data collection on NVIDIA GPUs, and a set of performance metrics and hardware counters from the profiling tools nvprof and Nsight Compute has been identified to construct a hierarchical Roofline. This gives users a holistic view of their application and helps them identify the most immediate and profitable optimisations. The methodology has been validated with applications from various domains, including materials science, mesh- and particle-based codes, image classification and segmentation, and natural language processing.
OpenMP Offload PGI compiler: Since early 2019, NERSC staff have been collaborating with NVIDIA engineers to enhance NVIDIA’s PGI C, C++, and Fortran compilers to enable OpenMP applications to run on NVIDIA GPUs and help users efficiently port suitable applications to target GPU hardware in Perlmutter.
Python-based data analytics: NERSC and NVIDIA are developing a set of GPU-based high performance data analytic tools using Python, the primary language for data analytics at NERSC and a robust platform for machine learning and deep learning libraries.
RAPIDS: Using RAPIDS, a suite of open-source software libraries for running data science and analytics pipelines on GPUs, NERSC and NVIDIA engineers are working to understand the kinds of issues NERSC data users encounter with GPUs in Python, optimise key elements of the data analytics stack, and teach NERSC users how to use RAPIDS to optimise their workflows on Perlmutter.
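To make the Roofline idea from the Roofline Analysis item concrete, here is a minimal Python sketch, assuming the standard Roofline bound: attainable performance equals the lower of the peak compute rate and arithmetic intensity (FLOPs per byte moved) multiplied by memory bandwidth. A hierarchical view simply evaluates that bound at each memory level. In practice the byte counts would come from profiler counters such as those nvprof and Nsight Compute expose; here every input is illustrative. The peak FP64 rate and HBM2 bandwidth are roughly the published V100 figures (about 7.8 TFLOP/s and 900 GB/s), while the L1 and L2 bandwidths and the kernel's FLOP and byte counts are hypothetical placeholders. This is not NERSC's or NVIDIA's actual data-collection methodology.

```python
from dataclasses import dataclass


@dataclass
class MemoryLevel:
    name: str
    bandwidth_gbs: float  # sustained bandwidth in GB/s


def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Classic Roofline bound: the lower of the compute and memory ceilings.

    intensity is arithmetic intensity in FLOPs per byte moved at one memory level,
    so intensity * bandwidth_gbs is in GFLOP/s.
    """
    return min(peak_gflops, intensity * bandwidth_gbs)


def hierarchical_roofline(peak_gflops, levels, flops, bytes_per_level):
    """Apply the Roofline bound at every memory level.

    bytes_per_level maps a level name to the bytes that level moved (in practice
    read from profiler counters). Returns {level: (intensity, bound)} plus the
    name of the level giving the tightest (lowest) bound.
    """
    results = {}
    for level in levels:
        intensity = flops / bytes_per_level[level.name]
        bound = attainable_gflops(peak_gflops, level.bandwidth_gbs, intensity)
        results[level.name] = (intensity, bound)
    limiting = min(results, key=lambda name: results[name][1])
    return results, limiting


if __name__ == "__main__":
    # Roughly V100-class FP64 peak and HBM2 bandwidth; L1/L2 values are hypothetical.
    PEAK_GFLOPS = 7800.0
    levels = [
        MemoryLevel("L1", 12000.0),
        MemoryLevel("L2", 3000.0),
        MemoryLevel("HBM", 900.0),
    ]
    # Hypothetical kernel: 1 GFLOP of work and the bytes it moved at each level.
    flops = 1e9
    bytes_per_level = {"L1": 4e9, "L2": 2e9, "HBM": 1e9}
    results, limiting = hierarchical_roofline(PEAK_GFLOPS, levels, flops, bytes_per_level)
    for name, (ai, bound) in results.items():
        print(f"{name}: AI = {ai:.2f} FLOP/B, bound = {bound:.0f} GFLOP/s")
    print("limiting level:", limiting)  # → limiting level: HBM
```

The level with the lowest bound is the one to attack first. In this made-up case the kernel is limited by HBM traffic (900 GFLOP/s, well under the 7,800 GFLOP/s compute ceiling), so improving data reuse to raise its HBM arithmetic intensity would move it towards the compute roof.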
‘Giving our users access to the very latest in GPU-accelerated technology this year is an important step towards ensuring that our users remain productive and are able to utilise the systems to prepare for the Exascale era. Our efforts in getting our diverse user base familiar with the new technology have been very encouraging and we look forward to Perlmutter delivering a highly capable user resource for workloads in simulation, learning and data analysis,’ said Sudip Dosanjh, NERSC Director.
Located at Lawrence Berkeley National Laboratory, NERSC is a DOE Office of Science user facility.