As storage technology adapts to changing HPC workloads, Robert Roe looks at the technologies that could help to enhance performance and accessibility of storage in HPC
The storage market in high performance computing (HPC) stretches from tape-based archives that have been around for more than 50 years to flash arrays, with new storage-class memory technology on the horizon. In this varied and competitive market, choosing the right technology to suit a particular use case is paramount to ensuring efficient use of computing resources.
This can make for a confusing choice for HPC centres trying to decide what storage technology would be right for their procurement. Some vendors are pushing for tiered systems making use of flash, spinning disk and archive-based technologies in a single storage system. Others opt for software-defined storage (SDS) or all-flash arrays, and this is further complicated by the use of cloud-based storage systems.
‘There are a number of things we take into account,’ comments OCF’s HPC business development manager, Andrew Dean. ‘Obviously, technology is a big one. What is the right technology to meet customers’ requirements? As an integrator, we have essentially free rein to pick the right technology for a particular use case.
‘We have strong relationships with a number of vendors, so wherever possible we would use one of those vendors that we have experience with, but if a user was to have a requirement outside of that, then we would look at the market and find the right solution,’ added Dean.
Dean notes that there is no easy answer. In order to understand which technology would be best suited for a particular deployment, it is important to look at the underlying use case and competencies of the user community that the system will serve. OCF actively works with partners such as Lenovo, NetApp and DDN to provide storage systems for its HPC clients.
These companies cover a wide range of storage technologies, from parallel file systems such as IBM Spectrum Scale (formerly GPFS) and Lustre to flash and hybrid flash arrays – and even object storage for archival. The user’s experience and familiarity with a certain technology could drive the decision for procurement, or it could be based on targeting certain workflows and trying to accelerate performance by removing a bottleneck in storage input/output (I/O) operations.
Changing approaches to HPC
This varied approach helps to drive business by providing a wide range of products and services that help OCF to meet customer requirements. However, this approach is not taken purely from a commercial standpoint, as Dean explains. The way users consume HPC resources has changed. This is particularly true in academia, where resources are pooled to deliver better ROI when procuring HPC systems. This, in turn, creates a more varied user community with different application requirements and expectations for HPC performance.
‘Even in the last 10 years since I have worked at OCF, there has been a change in the way that HPC resources are consumed. Different departments bought their own HPC systems: physics had a cluster, chemistry had a cluster, but now they are being brought together as a service,’ said Dean.
‘Systems are being designed now with the user in mind, rather than from a purely performance- or technology-based standpoint. You need the technology in place to meet the demands of users, but the technology has been slowly adapted to meet the needs of the user community – to offer the best service possible for a wide variety of users,’ said Dean.
Alex McMullan, CTO EMEA at Pure Storage, has a different view. He would argue that ‘everything is a parallel processing job; it’s just a question of how it’s being dressed up’.
While it should be noted that Pure Storage specialises in all-flash storage systems, the point, from McMullan’s perspective, is that the convergence of technologies such as machine learning, machine vision and traditional HPC with containerisation and private cloud infrastructure is driving many different computing industries towards a similar data-centric model.
‘Traditionally HPC was a compute-intensive job, with storage at the beginning and end acting almost as a mailbox and an aggregation point. But as the models have changed, the datasets have got bigger. We are no longer in SIMD compute-intensive space; the rise of Nvidia has changed that game hugely. Now, we have this massively parallel capability,’ said McMullan.
‘We could debate whether Larrabee kicked it off 10 years ago, but we have now got a very clear idea of what HPC used to be, but even purists see that the label of HPC has become much wider, much more broad and accessible. It is no longer about LINPACK, it is more about the datasets you have,’ added McMullan.
For today’s HPC users, it seems that there is less worry about the type of file system used but, rather, how the technology is deployed and how it can be consumed.
This is where similarities between technologies such as object storage, cloud and software-defined storage start to become apparent. Although wildly different on the surface, their implementation aims to make storage more accessible, reliable or faster for users.
‘Object storage was brought to our attention as the next big thing maybe six to eight years ago,’ said Dean. ‘To start with, object storage made a lot of sense for users that had, for example, an in-house application that could be developed to work specifically with an object storage application programming interface (API).’
Dean noted that one of the main drawbacks of using this technology was that HPC users do not have just a single use case for a supercomputer. ‘They have huge quantities of storage, but many of our customers are research organisations doing lots of types of research with different use cases and storage requirements,’ added Dean.
‘That has been the thing that has stopped the wider adoption of object storage within my customer base until this point. However, there are technologies that sit in front of object storage that make it more usable in these multi-use case environments. Things like iRODS and technologies like StrongLINK, that sit in front of the object storage layer,’ said Dean.
‘We are seeing S3 [Amazon S3] being adopted as a kind of pseudo-standard as well. I think those things coming together are making it a little more general purpose, which allows the technology to be more widely adopted within our base,’ concluded Dean.
There are a number of buzzwords floating around the HPC storage market. Two of the most prominent are software-defined storage (SDS) and flash-based storage. SDS focuses on the use of commodity or cheaper storage media combined with software acceleration, which enables a company to differentiate itself through reliability or performance features built into the software layer.
Flash storage refers to storage media built from non-volatile flash memory. Unlike spinning disks, flash has no moving parts.
Flash memory: a potential future for HPC
‘You could argue that we have been doing software-defined storage for years because we have been working with file systems like Spectrum Scale for a number of years. Flash is coming into play; we are starting to see that having an effect on our customers,’ said Dean.
The move towards flash is steadily gaining pace as users move away from large file systems in favour of something that can deliver increased performance for highly parallel applications, which are becoming more ubiquitous in HPC.
‘Now I think that is shifting, where flash is getting faster and the capacity is getting much larger, at the same time as the price of the technology falls,’ said Dean. ‘It is now at the point where it is being considered by a number of institutions. Exactly when the tide is going to turn, I don’t know but it is going to happen at some point. There will still be a place for disks for the foreseeable future, but it looks like we will be building at least the smaller HPC systems using pure flash.’
This move towards flash is a sentiment shared by McMullan, who noted that data telemetry from Pure Storage flash arrays points towards HPC workloads becoming increasingly random in terms of the I/O.
‘It used to be large streaming sequential workloads but now we see much more random I/O, particularly with computer vision, machine learning, training, design and rendering for the special effects market. Everybody is much more random and the dataset characteristics have changed; this plays nicely into the strengths of flash, because that’s where disk starts to fall down,’ said McMullan.
McMullan explained that traditional HPC storage systems from two to three years ago would have been disk-based, built on hundreds or thousands of spindles, each delivering around 100 IOPS but together providing huge sustained bandwidth. ‘That is great for those large sequential workloads, whether it is database warehouse table scans, or analysing a large sequential dataset, but the workloads have changed,’ said McMullan.
‘The use cases change the characteristics of an I/O stream. The more nodes there are, the more parallelism, which means those nodes are looking at different parts of the dataset. As the grid expands, the datasets change and we see evidence of that quite clearly. Disks don’t do random I/O well at all, whereas flash devices do not care, as there is no latency penalty in the whole address space,’ added McMullan.
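The gap McMullan describes can be put into rough numbers. The Python sketch below is a back-of-the-envelope illustration only: the figure of 100 IOPS per spindle comes from the article, while the 150 MB/s sequential rate per spindle, the 4 KiB random I/O size and the 1,000-spindle array are assumptions chosen for the example, and the `DiskArray` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskArray:
    """Hypothetical model of a disk-based array, for illustration only."""

    spindles: int
    iops_per_spindle: int = 100               # figure quoted in the article
    seq_mb_per_s_per_spindle: float = 150.0   # assumption, not from the article

    def random_iops(self) -> int:
        # Aggregate random I/O operations per second across all spindles.
        return self.spindles * self.iops_per_spindle

    def random_mb_per_s(self, io_size_bytes: int = 4096) -> float:
        # Throughput when every operation is a small random I/O
        # (the 4 KiB default size is an assumption).
        return self.random_iops() * io_size_bytes / 1_000_000

    def sequential_mb_per_s(self) -> float:
        # Throughput when every spindle streams sequentially.
        return self.spindles * self.seq_mb_per_s_per_spindle


array = DiskArray(spindles=1000)  # assumed array size
print(f"random IOPS:      {array.random_iops():,}")
print(f"random MB/s:      {array.random_mb_per_s():,.1f}")
print(f"sequential MB/s:  {array.sequential_mb_per_s():,.1f}")
```

Under these assumptions the same array streams roughly 150 GB/s sequentially but manages only around 0.4 GB/s of small random I/O, which is the kind of drop-off that favours flash as workloads become more random.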
The rise of cloud
While HPC is performance-focused, there is also a huge effort to increase the efficiency of resources. This is true of the hardware elements as we push towards exascale, but it is also true of the way that resources are consumed.
Cloud storage offers high availability of data and can be particularly effective in certain use cases, such as for HPC centres across multiple sites. ‘We have the technologies available to be able to do this, such as Spectrum Scale. We are able to look at multi-site solutions and whether the cloud could be a backup target or another tier used for the lowest performance/cost – in the same way we have used tape in the past. It could also be used as a replica target; there are all sorts of opportunities around the way we can use cloud,’ stated OCF’s Dean.
‘If someone has got two data centres already, and they have power and rack space available, then it might not make sense to replicate data into the cloud, but if they do not have that infrastructure, then it could make a compelling argument. Again, it is very use case dependent and our job, as an integrator, is to keep all these things in our kit bag and continue building use cases and proving POCs to make sure that when a customer comes to us with a requirement, we are ready to meet that requirement,’ said Dean.
OCF was recently announced as an Amazon Web Services (AWS) partner, so it is likely that OCF will be pushing this provider as an option for HPC users that want to take advantage of cloud computing in the future.
‘The interesting and fast-developing side of things for us is definitely how the cloud can be part of our solutions and customer strategy for the storage space. The cloud is definitely one to watch. It will be very interesting to see how that develops,’ concluded Dean.
The flash future
While McMullan is focused on an all-flash future for HPC storage, this may only cover the next few years, as new technologies are on the way to drive performance even higher. ‘HPC is leading edge, in terms of the technological envelope, but the message from our customer base is very much “all flash”. Not just because of performance, but also the efficiency and the cost efficiency.
‘We see the industry moving en masse to an all-flash based technology for now, but then you have storage class memory and persistent memory coming up as the next wave of media beyond that. This provides another exciting envelope that everyone will push into,’ added McMullan.
In his opinion, the industry is driving towards flash technology; this can be seen in research from Seagate and Western Digital, which shows that demand for drives is coming down while demand for video archive drives is going up.