Next Oak Ridge supercomputer announced by Cray and AMD


The US Department of Energy's (DOE) Oak Ridge National Laboratory (ORNL) has announced details of its latest supercomputer alongside Cray and AMD. Scheduled to be delivered to ORNL in 2021, the system, known as 'Frontier', is expected to deliver more than 1.5 exaflops of processing performance. The announcement that Cray and AMD will deliver this new supercomputer highlights the diversity available to HPC users. Continued competition in processor development will help to drive innovation, and these large systems help to drive research and development for future generations of HPC users. At the very least, the announcement demonstrates that there are options available to HPC users beyond Intel CPUs.

The Frontier system is designed to use future-generation custom AMD EPYC CPU and AMD Radeon Instinct GPU processors, optimised for High Performance Computing (HPC) and Artificial Intelligence (AI). Researchers at ORNL will use the Frontier system’s performance to simulate, model and advance understanding of the interactions underlying the science of weather, sub-atomic structures, genomics, physics, and other important scientific fields.

'Frontier represents the state-of-the-art in high-performance computing. Designing and standing up a machine of its scope requires working closely with industry, partnerships which not only enable breakthrough science but also ensure American scientific and economic competitiveness on the global stage,' said Jeff Nichols, associate laboratory director for Computing and Computational Sciences, ORNL. 'We are delighted to work with AMD to integrate the CPU and GPU technologies that enable this extremely capable accelerated node architecture.'

'AMD is proud to partner with Cray and ORNL to deliver what is expected to be the world’s most powerful supercomputer,' said Forrest Norrod, senior vice president and general manager, AMD Datacenter and Embedded Systems Group. 'Frontier will feature custom CPU and GPU technology from AMD and represents the latest achievement on a long list of technology innovations AMD has contributed to the Department of Energy exascale programs.'

AMD's history with the DOE includes the Jaguar supercomputer in 2005 and the Titan supercomputer in 2012. The Frontier system leverages years of exascale technology investments by DOE. The contract award includes technology development funding, a center of excellence, several early-delivery systems, the main Frontier system and multi-year systems support.

'The Frontier system architecture embodies the compute and data-intensive capabilities required to unlock the full potential of the exascale era,' said Jeff Nichols, Associate Lab Director at ORNL. 'The power and flexibility of the system will enable the creation of new converged HPC, analytics, and AI applications across the full breadth of the exascale computing program’s mission.'

To enable HPC and AI workloads to run simultaneously across the system, Cray's Slingshot interconnect was designed to incorporate intelligent features such as adaptive routing, quality-of-service, and congestion management. Frontier will utilise Cray’s new Shasta system software for monitoring, orchestration, and application development to provide a single developer interface across the system. The new software stack is a fully containerised architecture that combines the scalability and performance of HPC with the productivity and portability of cloud.

In addition to the capabilities native to the Shasta system, Cray has also been awarded a separate joint development contract to pursue new foundational technologies for the Frontier system. This includes the development of new high-density compute infrastructure, enhancements to HPC developer tools for GPU scaling and AI, and the creation of a Center of Excellence to establish best practices for exascale application development and tuning.

Compute Blade and Cabinet Infrastructure

To reach sustained exaflop performance, the Frontier system will use dense compute and cabinet infrastructure capabilities. For Frontier, Cray is designing a new AMD EPYC CPU and Radeon Instinct GPU powered blade for the Shasta high-density cabinet. Cray will also engineer new high-efficiency power delivery and integrated direct liquid cooling capabilities for key server components to ensure high operational energy efficiency and low total cost of ownership.

'The Frontier design is a marvel of engineering and AMD is proud to be bringing its technical innovation to the project in conjunction with Cray, Oak Ridge National Lab and the Department of Energy,' said Mark Papermaster, executive vice president and chief technology officer, AMD. 'AMD has a long history of pushing the boundaries of compute performance and working with DOE on advanced exascale research. I’m very excited to see a combination of custom AMD EPYC CPUs, purpose-built Radeon Instinct GPUs, and our open software development toolset selected to power this amazing machine.'

Software Innovation and Collaboration

To enable developer productivity, users will require a high-level software development environment with tightly coupled compilers, tools, and libraries that abstract away system complexity. The Cray Programming Environment (Cray PE) has delivered these core capabilities for Cray users for decades and, as part of this program, will see a number of enhancements for increased functionality and scale.

This will start with Cray working with AMD to enhance these tools for optimised GPU scaling with extensions for the Radeon Open Compute Platform (ROCm). These software enhancements will leverage low-level integrations of AMD ROCm RDMA technology with Cray Slingshot, allowing the Slingshot NIC to read and write data directly to GPU memory for higher application performance. Cray PE will be integrated with a full machine learning software stack with support for the most popular tools and frameworks. The HPC development capabilities of Cray PE, in combination with an optimised and scalable data science suite, are being developed to enable developers to make converged use of analytics, AI, and HPC.

Application Development and Tuning

To accelerate user adoption of the system, a Center of Excellence will be established by Cray and Oak Ridge National Lab to drive collaboration and innovation, and to assist in the porting and tuning of key DOE applications and libraries for the Frontier system. This will include collaborative modernisation of new and legacy code to support directive-based programming models such as OpenMP, and delivering training and workshops for hands-on learning of how to fully leverage the system. This collaboration will ensure that best practices are defined and disseminated quickly to further accelerate development of exascale-class applications.   
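For readers unfamiliar with directive-based programming, the sketch below illustrates the general idea in C: an existing serial loop is left largely untouched and a single OpenMP directive asks the compiler to parallelise it. The functions `dot_serial` and `dot_omp` are hypothetical names chosen for illustration and are not taken from any DOE application or from Cray's tools.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical legacy kernel: serial dot product of two vectors. */
double dot_serial(const double *x, const double *y, long n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double sum = 0.0;
    for (long i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* The same kernel with one OpenMP directive added. The reduction clause
 * gives each thread a private partial sum and combines them at the end.
 * If the compiler is run without OpenMP support, the pragma is ignored
 * and the loop simply runs serially. */
double dot_omp(const double *x, const double *y, long n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

With GCC or Clang, OpenMP support is typically enabled with the `-fopenmp` flag. The appeal for legacy code is that the loop body does not change, so the serial and parallel versions can be compared directly during porting.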

'This is another major win for Cray and means that in 2021 America’s top two supercomputers and most powerful entries in the global exascale race will use the Cray ‘Shasta’ architecture,' said Steve Conway, Hyperion Research senior vice president of research. 'This architecture is designed to support the extreme heterogeneity needed for future HPC and AI workloads.'

The Frontier system is expected to be delivered in 2021, with acceptance anticipated in 2022.