Tech focus: Memory and processors

Credit: Carlos Castilla/Shutterstock

Memory and processor technologies are becoming more diverse and competitive in the HPC market. Buoyed by interest in AI and ML, demand for HPC hardware is increasing. High-performance processors and accelerators are necessary to deliver sustained application performance, but as their compute capability grows, memory and I/O bottlenecks are becoming increasingly common in HPC.

Heterogeneous architectures are also becoming more commonplace. This technology focus will highlight memory, processor and accelerator technologies available to researchers and scientists using HPC hardware and help to showcase the growing diversity of hardware for supercomputing.

For several years, memory has increasingly been a stumbling block for HPC applications, as computational performance, or the number of floating-point operations per second (FLOPS) that a system can deliver, outstrips the ability to feed data into the system. Known as the memory bottleneck, this has made innovation in memory technology crucial to driving performance for increasingly parallel systems, which require vast amounts of bandwidth to stream data consistently to where it is needed.
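The gap between compute and memory can be illustrated with the widely used roofline model, in which attainable performance is the smaller of peak compute and memory bandwidth multiplied by the number of operations performed per byte moved. The following is a minimal sketch only: the function names and the node figures (10,000 GFLOP/s peak, 500 GB/s bandwidth) are hypothetical values chosen for illustration and do not describe any product in this article.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


def is_memory_bound(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """True when the memory system, not the processor, sets the limit."""
    return bandwidth_gb_s * flops_per_byte < peak_gflops


# Hypothetical node (assumed figures, not taken from any vendor).
PEAK_GFLOPS = 10_000.0
BANDWIDTH_GB_S = 500.0

# A streaming kernel such as a[i] = b[i] + s * c[i] in double precision
# performs 2 flops for every 24 bytes moved (two 8-byte loads, one 8-byte store).
TRIAD_FLOPS_PER_BYTE = 2 / 24

if __name__ == "__main__":
    gflops = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, TRIAD_FLOPS_PER_BYTE)
    bound = is_memory_bound(PEAK_GFLOPS, BANDWIDTH_GB_S, TRIAD_FLOPS_PER_BYTE)
    print(f"attainable: {gflops:.1f} GFLOP/s, memory bound: {bound}")
```

Under these assumed figures, the streaming kernel reaches roughly 42 GFLOP/s, well under 1 per cent of the assumed peak, which is why extra memory bandwidth can matter more than extra FLOPS for data-intensive codes.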

HPC systems require increasingly large memory subsystems, and this creates a problem for system designers who are trying to deliver enough data to each node to keep up with modern heterogeneous HPC systems.

In recent years, the movement of data – rather than the sheer speed of computation – has become a major bottleneck for large-scale or particularly data-intensive HPC applications. This has driven demand for increased bandwidth and for memory technology that can continue to support sustained application performance.

Products available

Achronix’s Speedster7t FPGA family is optimised for high-bandwidth workloads. Built on TSMC’s 7nm FinFET process, Speedster7t FPGAs feature a new 2D network-on-chip (2D NoC), an array of new machine learning processors (MLPs) optimised for high-bandwidth and artificial intelligence/machine learning (AI/ML) workloads, high-bandwidth GDDR6 interfaces, 400G Ethernet and PCI Express Gen5 ports – all interconnected to deliver ASIC-level performance while retaining the full programmability of FPGAs.

Aldec has developed a portfolio of FPGA accelerator boards to meet various expectations. Currently, Aldec provides several board configurations that can cope with the acceleration of the most demanding and sophisticated algorithms in two main categories – Large Scale HPC and Embedded HPC.

Derived from the family of HES prototyping boards, the FPGA accelerators were designed and optimised for large-scale HPC applications. To address the growing needs of embedded HPC, Aldec designed a special family of compact TySOM boards that utilise Xilinx Zynq devices, which integrate both ARM Cortex processors and FPGA structures in one chip.

Alpha Data’s ADM-PA100 is an adaptable PCIe form factor Versal ACAP Data Processing Unit suitable for early development and rapid deployment of solutions based on the Xilinx Versal ACAP VC1902 AI Core device.

The PCIe form factor is suitable for desktop, lab, rack mount and data centre deployments in commercial temperature ranges. The board can also be deployed stand-alone without any reliance on a host CPU. The FMC+ interface provides off-chip support for many standard and custom interfaces, which the Versal ACAP can handle through the very wide range of Alpha Data and third-party FMC I/O adapters.

AMD EPYC server processors are built to handle large scientific and engineering datasets with top performance – ideal for HPC workloads, compute-intensive models and analysis techniques.

EPYC delivers powerful processing capabilities thanks to the many innovations that went into its development. AMD “Zen 3” microarchitecture-based cores and AMD Infinity Architecture are the foundation, accelerating computation and helping protect against security threats. AMD 3D V-Cache is described as the first x86 CPU technology with true 3D die stacking, delivering 3X the L3 cache compared to standard 3rd Gen EPYC processors for breakthrough per-core performance.

Arm HPC solutions, including Arm Neoverse, address the needs of the HPC community today and in the future. Neoverse V1 was designed with two goals in mind: to provide designers with a platform for achieving the impossible while minimising the time, cost and independent development needed to get there. 

Features of the V1 include the Scalable Vector Extension and CMN-700 mesh interconnect, delivering extreme performance while giving designers the flexibility and freedom to experiment. Neoverse V1 designs can be customised, but remain compatible with mainstream OSes, libraries, compilers, middleware and debug and profiling tools.

BittWare has one of the largest FPGA product portfolios in the market, offering a wide range of production-ready FPGA cards with Achronix, Intel and Xilinx FPGAs. Researchers can develop and deploy applications more quickly, and with reduced cost and risk, by having BittWare integrate the FPGA cards into its certified, integrated FPGA servers.

Cerebras is a computer systems company dedicated to accelerating deep learning. The Wafer-Scale Engine (WSE) – the largest chip ever built – is at the heart of its deep learning system, the Cerebras CS-1. 

Each core on the second-generation chip, the WSE-2, is independently programmable and optimised for the tensor-based, sparse linear algebra operations that underpin neural network training and inference for deep learning. The WSE-2 empowers teams to train and run AI models at speed and scale, without the complex distributed programming techniques required to use a GPU cluster.

Intel provides a comprehensive HPC technology portfolio that helps customers achieve results for demanding workloads. Intel HPC hardware is designed to scale from stand-alone workstations to supercomputers with thousands of nodes. Deep learning acceleration is built right into the chip, so Intel processors are a great choice for solutions that combine HPC and AI.

The Intel product lineup includes the Intel Xeon Platinum and Gold processors and Intel Xe GPUs, including the first Intel Xe-based discrete GPU for HPC. Intel also provides Intel oneAPI cross-architecture software development tools for HPC.

Taking full advantage of Kalray’s patented technology, the Kalray MPPA Coolidge processor is a scalable 80-core processor designed for intensive real-time processing. Coolidge is the third generation of Kalray’s MPPA DPU (data processing unit) processors. 

Coolidge is natively capable of managing multiple workloads in parallel to enable smarter, more efficient and more energy-aware data-intensive applications. It offers an alternative to GPUs, ASICs or FPGAs, bringing value to applications ranging from data centres to edge computing and embedded systems.

Kingston server SSD and memory products support the global demand to store, manage and instantly access large volumes of data in both traditional databases and big data infrastructure.

The need to store and manage larger amounts of data has increased exponentially in recent years. Data centres, cloud services, edge computing, internet of things and co-locations are just some of the business models that amass tremendous volumes of data.

Marvell offers a broad portfolio of data infrastructure semiconductor solutions spanning compute, networking, security and storage. The company’s products are deployed by organisations in enterprise, data centre and automotive data infrastructure market segments that require ASICs or data processing units equipped with multi-core, low-power Arm processors.

Marvell’s ThunderX2 family of processors enables the design of servers and appliances that are optimised for compute, storage, network and secure compute workloads in the cloud. Fully compliant with Armv8 architecture specifications including Arm SBSA, ThunderX2 further accelerates the adoption of Arm servers in mainstream HPC by providing the next level of computing performance and ecosystem readiness for commercial deployments.

MemVerge’s Memory Machine virtualises DRAM and persistent memory so that data can be accessed, tiered, scaled and protected in-memory. The new Memory Machine Cloud Edition introduces innovation needed for efficient Big Memory Computing in the cloud. AppCapsule snapshot technology paired with a sophisticated System and Cloud Orchestration Service allows big memory cloud workloads to recover from Spot terminations gracefully and automatically.  

Micron’s technology is powering a new generation of faster, intelligent, global infrastructures that make mainstream artificial intelligence possible. Its fast, vast storage and high-performance, high-capacity memory and multi-chip packages power AI training and inference engines – whether in the cloud or embedded in mobile and edge devices.

Micron offers a variety of HPC technologies, from GDDR6 and other memory products to SSDs and AI inference engines. Its Deep Learning Accelerator (DLA) solutions comprise a modular FPGA-based architecture with Micron’s advanced memory solutions running Micron’s (formerly FWDNXT) high-performance Inference Engine.

Xilinx’s Alveo U55C accelerator card delivers the efficiency and scalability called for in HPC applications. The U55C delivers dense compute and HBM, with onboard 200Gbps networking enabling massive scale-out using Xilinx’s groundbreaking open-standards-based clustering. Built around Xilinx’s powerful Virtex XCU55 UltraScale+ FPGA, the Alveo U55C card delivers fast application acceleration.