Overcoming the memory bottleneck

Robert Roe looks at the technologies that can help overcome the memory bottleneck in HPC systems










Increasing performance in today’s HPC systems is not as simple as turning up clock speeds and increasing the number of computational elements, as the amount of memory bandwidth required increases with each upgrade to computational power. To reach the next level of performance, system and application designers need to think about a balanced architecture, which must include innovations in memory technology.

Jason Adlard, director of business development at Micron Technology, said: ‘HPC stretches across a number of different market segments that Micron addresses with our existing customers. We traditionally service the major server and PC vendors, in addition to the networking infrastructure vendors, automotive industry and many others.

‘It is of interest to us, because what is today’s HPC could be tomorrow’s mainstream memory solution. From the perspective of the market size and it being a volume revenue generator today, that isn’t really the interest for us. In HPC it’s more about wanting to ensure that we are understanding the demands from the markets that we are already serving, and what we can do to be the market expert in those areas, to bring some of that innovation pipeline into real life situations, where some of these technologies can be proofed in real applications,’ stated Adlard.

‘You have got the business reasons, financial reasons of economies of scale. There is an enormous amount of time that goes into research to improve on the technologies that are mainstream today. The demands from the HPC segment are varied, so we are looking to improve performance in many different areas.’

The areas of development that Adlard highlights for Micron include: reducing power consumption, either at the memory solution level or the system level; reducing latency; and increasing the bandwidth and speed of the solution.

‘We need to look at all of these different facets. It takes a lot of resource, in terms of staff hours and funding, to come up with working solutions, and then there is the very long process to introduce these technologies into the market, because the existing ecosystem is based around certain architectures. To displace those architectures and to gain the confidence of the market that these new innovations are viable solutions takes time,’ Adlard added.

Current HPC systems rely on DRAM for main system memory, while flash is the dominant technology for non-volatile memory used in storage. Many storage architectures currently in use will use older technologies but most new systems will focus on flash as the primary technology. ‘We do not envisage those technology types being displaced in the next decade,’ noted Adlard.

However, that does not mean that companies are not working on the next set of innovations in memory technology.

Innovation tackles the memory barrier

In 2015 Intel and Micron announced a partnership to deliver the next generation of memory, which is seen as a cross between the benefits of today’s DRAM and flash systems. Known as 3D XPoint (cross point), the technology was described as being up to 1,000 times faster and having up to 1,000 times more endurance than NAND flash, with 10 times the storage density of conventional memory.

The 3D XPoint technology has a different architecture from other flash products. The companies described it as based on phase-change memory technology, with a transistor-less, cross-point architecture that positions selectors and memory cells at the intersection of perpendicular wires. To improve storage density, the 3D XPoint cells can be stacked in three dimensions.

In 2018 Micron and Intel parted ways, but the technology is still very much alive. Today users can purchase SSDs based on the 3D XPoint architecture, and memory products are expected later in 2019. Intel’s continued development of the technology now falls under its Optane Memory brand.

Adlard notes that the 3D XPoint architecture is the main area of development for future memory products. He was keen to stress that there are additional technologies in the pipeline beyond 3D XPoint, but at this time Micron cannot discuss these technologies.

‘We are working on other things, such as 3D XPoint, a technology we are very much committed to. 3D XPoint is a Micron-owned technology. We are working on several projects around that particular innovation and I expect in the next two to three years to be able to introduce solutions to the market,’ said Adlard.

‘There are other types of innovation that are being worked on now, by our laboratories, which are even further out and probably won’t see the light of day for several years beyond that. In the near term, we do not expect there to be anything that will dramatically revolutionise the technologies around DRAM and NAND.’

For several years memory has increasingly been a stumbling block for HPC applications, as computational performance, or the number of floating point operations per second (flops) that a system can produce, outstrips the ability to feed data into the system. Known as the memory bottleneck, this means that innovation in memory technology is becoming crucial to drive performance for increasingly parallel systems, which require huge amounts of bandwidth to constantly stream data into the system.
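One common way to reason about this imbalance is the roofline model, which caps a kernel's achievable performance at the lower of the processor's peak and the rate at which memory can supply operands. The short Python sketch below is illustrative only: the node figures and the flops-per-byte values are hypothetical numbers chosen for the example, not measurements from any system mentioned in this article.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline estimate: performance is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Hypothetical node (assumed figures): 3,000 Gflops peak, 200 GB/s memory bandwidth.
PEAK_GFLOPS = 3000.0
BANDWIDTH_GB_S = 200.0

# Hypothetical kernels, described by how many flops they perform per byte moved.
kernels = [
    ("streaming kernel", 0.25),
    ("stencil update", 1.0),
    ("dense matrix multiply", 30.0),
]

for name, intensity in kernels:
    perf = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, intensity)
    share = 100 * perf / PEAK_GFLOPS
    print(f"{name}: {perf:.0f} Gflops ({share:.0f}% of peak)")
```

Under these assumed figures, only the kernel that performs many operations per byte reaches the peak; the other two are limited by memory bandwidth, however fast the processor is.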

‘In computing systems, for some time now, memory has been the bottleneck to performance, so memory has been lagging behind processor performance, particularly in terms of latency and speed,’ said Adlard.

‘On the one side, it means that the memory technologists are in catch-up mode, but it also puts us in a strong position, in terms of being able to leverage those systems.’

Due to this memory bottleneck, memory is becoming a much more critical component of HPC performance than it was 10 years ago, notes Adlard. ‘A few years ago, it was just perceived as being a dumb commodity that was pegged onto the processor in an x86 system. But nowadays, in terms of breaking through any performance barriers, the impetus is much more on the memory, and what innovation the memory can bring to the system.

‘It puts us, as a memory vendor, in a very interesting situation, as those applications that are desperate to break through that wall and get maximum performance, are looking to us and our innovations to see how they can do that,’ added Adlard.

‘Activities and developments are taking place on the DRAM side, and the NAND side, and everything in between, in terms of latency. 3D XPoint would sit on that latency graph in between pure DRAM-based solutions and pure NAND-storage solutions, in terms of latency.

‘But again it depends on the application itself, what is the critical performance factor for that particular application? Is it latency, bandwidth, speed, power – but memory does play an ever-increasing role in addressing whichever of these criteria are important for a given application,’ concluded Adlard.

Mark Hur, operations director at Micron Technology, added: ‘When we talk about memory there are two aspects, volatile and non-volatile. Volatile memory, such as DRAM, can be addressed with some of the current technologies out there, such as GDDR6 devices and high bandwidth memory (HBM), but from talking with [people in the industry] it is never enough. If you give them 1TB/s, then they want 2TB/s.

‘I think the more interesting part of this story is the non-volatile storage. If you look at the supercomputers out there, they are often looking at peak flops as a measurement of performance, but in an actual workload, they are only getting 20 to 30 per cent of the actual peak flops. This can be attributed to a number of factors but one of them is the movement of data and that comes from the latency associated with storage technology,’ Hur continued.

‘In the future, I think you will see some convergence between storage and memory, such as 3D XPoint. I also think you will see storage moving closer to the compute elements of an HPC system. It’s something that people have been looking at for many years. I believe it is just getting to the point now that there is sort of a wall, and they need to do something. I do not think anyone has defined a particular strategy or architecture that will take them to the future but these are things that people are looking at, especially at Micron. How do we grow and assist our customer base in a world where compute and storage are very close together?’ Hur concluded.

The next generation of I/O technologies

The 3D XPoint technology is the basis for an EU project which hopes to meet the memory requirements for exascale HPC systems. Known as NextGenIO, and funded through Horizon 2020, the project aims to solve, or at least alleviate, the memory barrier through the use of Intel’s Optane DC Persistent Memory, which will sit between conventional memory and disk storage. The Optane brand is based on the 3D XPoint technology initially developed jointly by Micron and Intel. NextGenIO will design the hardware and software to exploit the new memory technology. The goal is to build a system with 100 times faster I/O than current HPC systems, a significant step towards exascale computation.

Current HPC systems perform on the order of tens to hundreds of Pflops. Although a single Pflop already represents one million billion computations per second, more complex demands on scientific modelling and simulation mean even faster computation is necessary.

One of the major roadblocks to achieving this goal is the I/O bottleneck. Current systems are capable of processing data quickly, but speeds are limited by how fast the system is able to read and write data. This represents a significant loss of time and energy in the system. Being able to widen, and ultimately eliminate, this bottleneck would substantially increase the performance and efficiency of HPC systems.

Dr Michèle Weiland, project manager for NextGenIO at the Edinburgh Parallel Computing Centre (EPCC), at the University of Edinburgh, explained the objectives of the project, which aims to eventually launch commercial products through project partners. ‘NextGenIO is working on improved I/O performance for HPC and data analytics workloads. The project is building a prototype hardware system with byte-addressable persistent memory on the compute nodes, as well as developing the system ware stack that will enable the transparent use of this memory,’ said Weiland.

‘Another key outcome from the project’s research is the publication of a series of white papers, covering topics ranging from the architecture requirements and design, to the system ware layer and applications. Each white paper addresses one of the core challenges or developments that were addressed as part of NextGenIO,’ added Weiland.

The project has been running since 2015 and has already delivered experimental results, the white papers and the design for a motherboard, which will be released as a commercial product by a project partner, Fujitsu.

The next stage involves installing a prototype system at the University of Edinburgh. This will then be used to test the new memory, which the project researchers hope will help to blur the line between memory and storage.

‘The difference is the memory can be plugged in right next to the processor, just like a DRAM DIMM. The processor will see them as a single space,’ said Weiland.

‘We focused on this because it offers two things. It offers a large amount of capacity, in the region of several terabytes per node, and it also offers performance. In terms of capacity, you can get many terabytes and, in terms of performance, although it is slower than DRAM, it is much faster than flash. Some people have been putting SSDs onto compute nodes and using that as a buffer to accelerate applications,’ added Weiland.
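To give a feel for what byte-addressable persistence means in practice, the Python sketch below updates a few bytes in place through a memory mapping rather than through read and write calls. It is an analogy under stated assumptions, not the NextGenIO software stack: an ordinary temporary file stands in for persistent memory, and the file name and sizes are made up for the example.

```python
import mmap
import os
import tempfile

# Assumption: a regular temporary file stands in for a region of
# persistent memory; only the access pattern is being illustrated.
path = os.path.join(tempfile.mkdtemp(), "pmem_standin.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)  # reserve 4,096 bytes

with open(path, "r+b") as f:
    region = mmap.mmap(f.fileno(), 4096)
    region[128:133] = b"hello"  # change five bytes in place, like a memory store
    region.flush()              # ask the OS to push the change to the backing file
    region.close()

# Reopen the file to confirm the update outlived the mapping.
with open(path, "rb") as f:
    f.seek(128)
    restored = f.read(5)

print(restored)
```

The point of the sketch is the granularity: the program touches five bytes at a known offset, instead of reading and rewriting a whole block as it would with conventional storage.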

She notes that in today’s systems, if your simulation has to read a lot of data at the start, this takes up time at the beginning of the simulation. ‘During this time your processors are not necessarily doing anything useful; they just sit reading data and, depending on the size of the simulation, this can take a long time. We are looking at supporting techniques to pre-load the data before the job runs,’ stated Weiland.

‘When the simulation starts, the data is immediately ready for it, so you are using the processors for what they are best at, while everything else is done in the background.

‘You can use it both as memory and as storage, so it has different modes of operation. You can either say I want this to be memory, and your processor will see a very large memory region, or you can say I want this to be storage and you have to manage this space yourself,’ concluded Weiland.
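The principle behind pre-loading, keeping processors busy with computation while data arrives in the background, can be shown at small scale. The Python sketch below is a hedged illustration rather than the project's approach: `load_input` and `simulate` are hypothetical stand-ins for a slow read and a compute phase, and the overlap happens inside one program rather than being arranged by a job scheduler before the job starts.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def load_input(step):
    """Hypothetical stand-in for a slow read from storage."""
    time.sleep(0.05)
    return list(range(step, step + 4))


def simulate(data):
    """Hypothetical stand-in for the compute phase."""
    time.sleep(0.05)
    return sum(data)


def run_prefetched(steps):
    """Compute on the current input while the next one loads in the background."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_input, 0)  # pre-load the first input
        for step in range(steps):
            data = pending.result()  # blocks only if the load has not finished
            if step + 1 < steps:
                # Start fetching the next input before computing on this one.
                pending = pool.submit(load_input, step + 1)
            results.append(simulate(data))  # compute overlaps with the fetch
    return results


print(run_prefetched(3))
```

In this toy version each load after the first is hidden behind the previous compute phase, which is the effect Weiland describes: when the computation is ready for its data, the data is already there.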
