Tech focus: Servers

New server technologies are pushing the boundaries of density and computing power for HPC and AI

HPC and AI workloads require large amounts of computing power, and this is driving demand for powerful servers that can leverage high-density configurations of both CPU and accelerator technologies. By squeezing more computing performance out of the available space and resources, scientists can increase the size and complexity of their simulations or reduce the time spent waiting for results.

Recently, Liqid, a server company based in Denver, USA, announced what it claims is the fastest available single-socket server. The company released a white paper highlighting the technology’s impressive random IOPS and sequential bandwidth performance for the solution, which is based on the Dell EMC PowerEdge R7515 Rack Server.

Liqid provides a composable infrastructure platform that aims to reduce vendor lock-in by enabling users to build a data centre architecture that adapts to their business needs and scales on demand.

For the release of this new server, Liqid worked with AMD and Dell to deliver a server solution based on Gen-4 PCI-Express (PCIe) fabric technology. The server, known as the ‘LQD4500’, couples AMD EPYC processors with the Dell EMC PowerEdge R7515 Rack Server, providing an architecture designed for AI-driven HPC application environments.

‘It’s undeniable that new data-intensive workloads, including artificial intelligence and edge computing, will increasingly rely on automation if they are to run efficiently while hiding some of the complexity. The current generation of static hardware can be limiting for these workloads without massive over-provisioning,’ said John Abbott, cofounder and systems infrastructure analyst at 451 Research, part of S&P Global’s Market Intelligence division. ‘AI, deep learning and other new workloads require the integration of new infrastructure to support them.’

The optimised platform from Liqid, Dell Technologies, and AMD enables scientists to deploy server systems with the highest compute and storage performance for core datacentres, as workflows and business needs scale up. 

‘In high-computational environments, the more compute power you can pack into a single node, the more accurate your results will be, and the more quickly those results can be implemented in the real world,’ said Sumit Puri, CEO and cofounder, Liqid. ‘We are proud to collaborate with industry leaders like Dell Technologies and AMD to provide adaptive architectures for AI and HPC applications to help solve some of the most vexing problems facing businesses and the world.’

Supermicro has also recently updated its portfolio of servers with the latest AMD EPYC processors, and new Nvidia GPUs based on the Ampere architecture will also feature in some of Supermicro’s latest server offerings.

At the time of the announcement, Supermicro stated that its latest server products had broken more than 27 world records for performance benchmarks. In addition to the industry’s first blade platform, Supermicro’s entire portfolio of new H12 A+ Servers fully supports the newly announced high-frequency AMD EPYC 7Fx2 Series processors.

Besides the new H12 SuperBlade and single and dual-socket multi-node Twin A+ Servers, Supermicro also introduced its next-generation WIO line of A+ Servers as well as a 4U server supporting eight double-width GPUs. With PCI-E 4.0 x16 support, these A+ Servers can deliver up to 200G connectivity and feature a large memory footprint of up to four terabytes (4TB) per socket running fast DDR4 memory at up to 3200MHz.
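
As a rough sanity check on those figures (a back-of-the-envelope sketch using the published PCIe 4.0 signalling rate, not vendor benchmarks), a Gen-4 x16 slot provides around 31.5GB/s per direction, comfortably above the 25GB/s consumed by a 200Gb/s network link:

```python
# Back-of-the-envelope check: can a PCIe 4.0 x16 slot feed a 200Gb/s NIC?
# Uses the published PCIe 4.0 signalling rate and line encoding.

GT_PER_LANE = 16.0        # PCIe 4.0: 16 giga-transfers per second, per lane
ENCODING = 128 / 130      # 128b/130b line encoding overhead
LANES = 16

# Usable bandwidth per direction, in gigabytes per second
pcie_gb_s = GT_PER_LANE * ENCODING * LANES / 8    # ~31.5 GB/s

nic_gb_s = 200 / 8        # a 200Gb/s link moves 25 GB/s

print(f"PCIe 4.0 x16: ~{pcie_gb_s:.1f} GB/s per direction")
print(f"200G NIC:      {nic_gb_s:.1f} GB/s ({pcie_gb_s - nic_gb_s:.1f} GB/s headroom)")
```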

‘Supermicro 2nd Gen AMD EPYC processor-based A+ Servers have achieved 27 world-record performance benchmarks and counting,’ said Vik Malyala, senior vice president, field application engineering and business development, Supermicro. ‘For instance, our A+ Servers achieved world-record performance for the TPCx-IoT benchmark with the H12 TwinPro that established the highest performance and lowest dollar per performance over the previous record holders with the new high-frequency AMD EPYC 7Fx2 processors.’

The new A+ SuperBlade features a performance- and density-optimised, resource-saving architecture with up to 20 hot-pluggable single-socket nodes in 8U, and integrated networking fabric up to 100G EDR InfiniBand, with 200G HDR coming soon. The SuperBlade uses shared cooling, power and networking infrastructure to increase efficiency and significantly reduce initial capital and operational expenses for many organisations.

Supermicro A+ WIO systems offer a wide range of I/O options enabling customers to optimise storage and networking alternatives to accelerate performance, increase efficiency, and find the perfect fit for their applications. 

These single-socket 1U servers support three PCI-E 4.0 x16 slots, 8- or 16-DIMM versions, and redundant high-efficiency Platinum Level power supplies. Both WIO and Ultra A+ systems can support 10 to 24 NVMe drives per single- or dual-processor system without requiring a PCI-E switch, providing excellent storage performance.

For maximum acceleration of AI, deep learning, and HPC workloads, Supermicro’s new A+ GPU system supports up to eight full-height double-wide (or single-wide) GPUs via direct-attach PCI-E 4.0 x16 CPU-to-GPU lanes without any PCI-E switch for the lowest latency and highest bandwidth. 

The system also supports up to three additional high-performance PCI-E 4.0 expansion slots for a variety of uses, including high-performance networking connectivity up to 100G. An additional AIOM slot supports a Supermicro AIOM card or an OCP 3.0 mezzanine card.

Supermicro offers a wide-ranging portfolio of EPYC-based systems and server building blocks, including ATX and E-ATX motherboards, spanning single-socket mainstream and WIO servers through to high-end and multi-node systems such as BigTwin and TwinPro. This enables customers to build application-optimised solutions with a multitude of configuration possibilities to match their own workload requirements.

Boosting AI performance

In May, Nvidia CEO Jensen Huang announced the company’s latest GPU architecture, ‘Ampere’, which can deliver more than five petaflops of AI performance in a single 4U server. This performance has a huge impact on potential research outcomes for AI: just a few years ago, this was the kind of performance seen in top supercomputers, not a single server. In June 2016, for example, five petaflops would have sat just outside the top 10 on the Top500, although that list ranks double-precision Linpack performance rather than the lower-precision arithmetic behind AI petaflops figures.

Supermicro announced two new systems designed for artificial intelligence (AI) deep learning applications that fully leverage the third-generation Nvidia HGX technology with the new Nvidia A100 Tensor Core GPUs, as well as full support for the new GPUs across the company’s broad portfolio of 1U, 2U, 4U and 10U GPU servers. The Nvidia A100 is the first elastic, multi-instance GPU that unifies training, inference, HPC and analytics, by enabling users to partition the GPU into multiple smaller instances or operate it as a single large GPU, depending on the workload.

‘Expanding upon our portfolio of GPU systems and Nvidia HGX-2 system technology, Supermicro is introducing a new 2U system implementing the new Nvidia HGX A100 4 GPU board (formerly codenamed Redstone) and a new 4U system based on the new Nvidia HGX A100 8 GPU board (formerly codenamed Delta) delivering five petaflops of AI performance,’ said Charles Liang, CEO and president of Supermicro. 

‘As GPU accelerated computing evolves and continues to transform data centres, Supermicro will provide customers with the very latest system advancements to help them achieve maximum acceleration at every scale while optimising GPU utilisation. These new systems will significantly boost performance on all accelerated workloads for HPC, data analytics, deep learning training and deep learning inference,’ added Liang. 

As a balanced datacentre platform for HPC and AI applications, Supermicro’s new 2U system makes use of the Nvidia HGX A100 4 GPU board with four direct-attached Nvidia A100 Tensor Core GPUs using PCI-E 4.0 for maximum performance and Nvidia NVLink for high-speed GPU-to-GPU interconnects. 

This advanced GPU system accelerates compute, networking and storage performance with support for one PCI-E 4.0 x8 and up to four PCI-E 4.0 x16 expansion slots for GPUDirect RDMA high-speed network cards and storage such as InfiniBand HDR, which supports up to 200Gb per second bandwidth. 

‘AI models are exploding in complexity as they take on next-level challenges such as accurate conversational AI, deep recommender systems and personalised medicine,’ said Ian Buck, general manager and VP of accelerated computing at Nvidia. ‘By implementing the Nvidia HGX A100 platform into their new servers, Supermicro provides customers the powerful performance and massive scalability that enable researchers to train the most complex AI networks at unprecedented speed.’

Optimised for AI and machine learning, Supermicro’s new 4U system supports eight A100 Tensor Core GPUs. The 4U form factor with eight GPUs is ideal for customers that want to scale their deployment as their processing requirements expand. 

The new 4U system will have one Nvidia HGX A100 8 GPU board with eight A100 GPUs all-to-all connected with Nvidia NVSwitch for up to 600GB per second GPU-to-GPU bandwidth and eight expansion slots for GPUDirect RDMA high-speed network cards. Ideal for deep learning training, data centres can use this scale-up platform to create next-gen AI and maximise data scientists’ productivity with support for ten x16 expansion slots.
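
That 600GB-per-second figure follows directly from the A100’s NVLink configuration; a quick arithmetic check using Nvidia’s published per-link numbers:

```python
# Sanity check on the quoted GPU-to-GPU bandwidth: each A100 carries
# 12 third-generation NVLink links, each rated at 50GB/s in total.
NVLINK3_GB_S_PER_LINK = 50   # GB/s per link (both directions combined)
LINKS_PER_A100 = 12

print(NVLINK3_GB_S_PER_LINK * LINKS_PER_A100, "GB/s per GPU")  # 600
```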

Customers that run large-scale HPC simulations, or who are investing in AI and deep learning technologies, may want to explore the latest server offerings to ensure they have the best available performance to accelerate their workloads.

The A100 GPU includes a new multi-instance GPU (MIG) virtualisation and GPU partitioning capability that is particularly beneficial to cloud service providers (CSPs). When configured for MIG operation, the A100 permits CSPs to improve the utilisation rates of their GPU servers, delivering up to seven times more GPU instances at no additional cost. Robust fault isolation allows them to partition a single A100 GPU safely and securely.
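
To illustrate how a partitioned card appears to software (a minimal sketch, not any vendor’s tooling): once an administrator has enabled MIG and exposed an instance to a process via CUDA_VISIBLE_DEVICES, frameworks such as PyTorch see it as an ordinary, smaller GPU:

```python
# Minimal sketch: a MIG instance selected through CUDA_VISIBLE_DEVICES
# (one instance per process) appears to PyTorch as a normal GPU, just
# with a slice of the A100's memory and compute. Assumes an administrator
# has already enabled MIG mode and created the instances.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    # On a 3g.20gb slice this would report roughly 20 GiB
    print(f"{props.name}: {props.total_memory / 2**30:.1f} GiB")
```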

The A100 adds a new third-generation Tensor Core that boosts throughput over V100 while adding comprehensive support for DL and HPC data types, together with a new Sparsity feature that delivers a further doubling of throughput. The TensorFloat-32 (TF32) Tensor Core operations in A100 provide an easy path to accelerate FP32 input/output data in DL frameworks and HPC, running 10 times faster than V100 FP32 FMA operations or 20 times faster with sparsity. For FP16/FP32 mixed-precision DL, the A100 Tensor Core delivers 2.5 times the performance of V100, increasing to 5 times with sparsity. 
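
As an illustration of how software opts in to TF32, PyTorch exposes it through two global switches; a minimal sketch (defaults have varied between PyTorch releases, so they are set explicitly here):

```python
# Sketch: opting in to TF32 Tensor Core math on Ampere GPUs in PyTorch.
import torch

torch.backends.cuda.matmul.allow_tf32 = True   # matmuls may use TF32
torch.backends.cudnn.allow_tf32 = True         # cuDNN convolutions too

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b   # FP32 tensors, executed on Tensor Cores in TF32 where supported
```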

The GPU enables Bfloat16 (BF16)/FP32 mixed-precision Tensor Core operations running at the same rate as FP16/FP32 mixed-precision. Tensor Core acceleration of INT8, INT4, and binary round out support for DL inferencing, with A100 sparse INT8 running 20 times faster than V100 INT8. For HPC, the A100 Tensor Core includes new IEEE-compliant FP64 processing that delivers 2.5 times the FP64 performance of V100. 
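
Frameworks expose these mixed-precision modes through automatic casting. A minimal PyTorch sketch of a BF16 forward and backward pass (assuming a build recent enough to accept a dtype argument to autocast):

```python
# Sketch: BF16 mixed precision via PyTorch autocast on an Ampere GPU.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
x = torch.randn(32, 1024, device="cuda")

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)                 # matmul runs in BF16 on Tensor Cores
    loss = y.float().mean()      # reduction kept in FP32 for stability
loss.backward()                  # backward call placed outside autocast
```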

The Nvidia A100 GPU is architected to not only accelerate large complex workloads, but also efficiently accelerate many smaller workloads. 

A100 enables server solutions that can accommodate unpredictable workload demand, while providing fine-grained workload provisioning, higher GPU utilisation, and improved TCO. 

Whether an organisation is looking for initial test systems to explore AI research or to replace its large computing systems, a look at the latest server technologies can help scientists accelerate the performance of their research.

______

Featured product: Boston Limited

Intel Select Solution for Simulation and Modeling

Intel Select Solutions for Simulation and Modeling offer a guided path to success, with quick-to-deploy infrastructure that significantly reduces complexity for the purchaser. Boston has been very successful working with key customers to deploy Intel Select Solution-based clusters in their CFD environments. Using a standards-based approach defined in the HPC Platform Specification, these solutions provide verified interoperability with common applications used in simulation and modeling.

The solutions must also meet or exceed the characteristics and performance thresholds needed for scaling performance across the cluster. Branded designs such as the Boston Intel Select Solution for Simulation and Modeling have demonstrated these capabilities and are ready for deployment.

The recently upgraded Boston Intel Select Solution for Simulation and Modeling utilises 2nd Generation Intel® Xeon® Scalable Gold processors, the latest Intel® SSD DC Family for local scratch storage, Intel SSD DC Family storage to augment parallel file system storage, and Intel® Omni-Path networking; the solution can also be customised by Boston to your exact requirements.

www.boston.co.uk/products/solutions/hpc/intelselect.aspx

 
