Predictions in HPC for 2020
Laurence Horrocks-Barlow, technical director of OCF, predicts that containerisation, cloud and GPU-based workloads are all going to dominate the HPC environment in 2020.
There have been some interesting new developments in the High Performance Computing (HPC) market that will become more prominent in the coming months. In particular, containerisation, cloud and GPU-based workloads are all going to dominate the HPC environment in 2020.
AMD’s new second-generation EPYC processor (CPU), codenamed Rome, has been shown in benchmark testing to perform better in a single-socket configuration than any competitor’s dual-socket system. The new CPU is proving very powerful and well suited to GPU computing. It can take advantage of new memory technologies and supports PCIe Gen 4.0, which doubles per-lane bandwidth over Gen 3.0 and gives up to 64GB/s of bidirectional bandwidth on an x16 link.
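The 64GB/s figure can be checked with a short calculation. This is a sketch that assumes a full x16 link, the PCIe 4.0 signalling rate of 16 GT/s per lane, and 128b/130b line encoding. The rounded 64GB/s counts both directions before encoding overhead, so the usable figure is slightly lower.

```latex
\begin{aligned}
\text{raw, per direction} &= 16\ \text{lanes} \times 16\ \text{GT/s} \div 8\ \text{bits/byte} = 32\ \text{GB/s} \\
\text{raw, both directions} &= 2 \times 32\ \text{GB/s} = 64\ \text{GB/s} \\
\text{after 128b/130b encoding} &= 64\ \text{GB/s} \times \tfrac{128}{130} \approx 63\ \text{GB/s}
\end{aligned}
```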
AMD has agreements with cloud providers AWS and Azure to put its CPUs on their cloud platforms and to promote the use of AMD CPUs for HPC. This move mirrors what we are seeing among our customers. Many of them are now planning their next HPC cluster or supercomputer with AMD in the infrastructure design, to better support AI workflows.
AMD had previously been out of the HPC market for some time, focusing primarily on consumer chips, but this has shifted dramatically in recent months. Intel does not yet support PCIe Gen 4.0 in its server CPUs, and this has given AMD a competitive advantage in the processor market. Lenovo has come a little later to the game, but we will see new developments from Lenovo with AMD later in 2020.
Other new developments worth noting include Mellanox’s ConnectX-6. It is the first adapter to deliver 200Gb/s throughput, providing high-performance InfiniBand with greater bandwidth. In addition, Samsung’s recently launched Non-Volatile Memory Express (NVMe) Gen4 SSD is significantly faster, delivering double the speed of its Gen3 SSD.
Over the last year, we’ve seen a strong shift towards the use of cloud in HPC, particularly for storage. Many research institutions are working towards a ‘cloud first’ policy. They are looking for cost savings by using the cloud rather than expanding their data centres, which brings overheads such as cooling, data and cluster management, and certification requirements. There is a noticeable push towards running HPC in the cloud and reducing the amount of on-premise compute infrastructure. AMD now has agreements with AWS and Azure, and those providers are implementing technologies such as InfiniBand in their HPC cloud offerings. As a result, this looks increasingly likely to be the direction universities take in 2020.
I don’t foresee cloud completely replacing large local HPC clusters in the near future. However, for customers with variable workloads, on-premise HPC clusters could become smaller and tie closely into the public cloud to absorb peaks in utilisation. We’re also seeing increasing uptake of public cloud providers for ‘edge’ cases, such as testing new technologies or spinning up environments for specific projects or events. As understanding of the technologies involved and of user requirements grows, most universities and research institutions are at least considering a hybrid approach.
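The trade-off behind a smaller on-premise cluster that bursts to the cloud can be illustrated with a toy sizing calculation. The function names `burst_plan` and `utilisation` are hypothetical, and the hourly demand figures are invented purely for illustration. The sketch compares sizing the local cluster for the peak with sizing it lower and renting cloud nodes only for the excess.

```python
"""Toy sizing sketch for a smaller on-premise cluster that bursts to the cloud.

The demand figures below are hypothetical and only illustrate the trade-off.
"""


def burst_plan(hourly_demand: list[int], on_prem_nodes: int) -> list[int]:
    """Cloud nodes to rent each hour once on-premise capacity is exhausted."""
    return [max(0, demand - on_prem_nodes) for demand in hourly_demand]


def utilisation(hourly_demand: list[int], on_prem_nodes: int) -> float:
    """Average fraction of on-premise nodes kept busy over the period."""
    busy = sum(min(demand, on_prem_nodes) for demand in hourly_demand)
    return busy / (on_prem_nodes * len(hourly_demand))


if __name__ == "__main__":
    # Hypothetical node demand per hour, with one short peak.
    demand = [40, 45, 50, 120, 60, 40]

    # Option 1: size the local cluster for the peak.
    print(f"{utilisation(demand, 120):.0%}")  # 49%

    # Option 2: a smaller local cluster that bursts to the public cloud.
    print(burst_plan(demand, 60))             # [0, 0, 0, 60, 0, 0]
    print(f"{utilisation(demand, 60):.0%}")   # 82%
```

In this toy case, halving the local cluster keeps it far busier, and the cost moves to a single hour of rented cloud capacity. A real decision would also weigh cloud pricing, data movement and queue wait times, none of which are modelled here.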
One of the major drawbacks of HPC in the cloud is the high cost of pulling data back out of it, which understandably makes some organisations reluctant to move to the cloud.
However, products from both NetApp and DDN are now coming onto the market that are ‘hybrid-ised’ for the public cloud. With these, you can upload some of your storage to the public cloud, process it there and download only the changed content. This means you are charged only for retrieving the new data you need, rather than for any more than is necessary.
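The idea of downloading only changed content can be sketched with block-level hashing. The block size, the function names and the per-GB price below are all hypothetical. The sketch does not reproduce how the NetApp or DDN products actually work; it only shows one common way of detecting which parts of a dataset changed, and what that means for retrieval charges.

```python
"""Sketch: download only the blocks that changed while data was in the cloud.

Block size and the per-GB price are hypothetical; commercial products use
their own mechanisms, which this sketch does not reproduce.
"""
import hashlib

HYPOTHETICAL_PRICE_PER_GB = 0.10  # illustrative egress price, not a real tariff


def block_hashes(data: bytes, block_size: int) -> list[str]:
    """Split data into fixed-size blocks and return the SHA-256 digest of each."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(local: list[str], remote: list[str]) -> list[int]:
    """Indices of remote blocks that are new or differ from the local copy."""
    return [
        i for i, digest in enumerate(remote)
        if i >= len(local) or local[i] != digest
    ]


def egress_cost(num_bytes: int, price_per_gb: float) -> float:
    """Retrieval cost at a flat per-GB rate, with 1 GB = 10**9 bytes."""
    return num_bytes / 1e9 * price_per_gb


if __name__ == "__main__":
    block = 1024  # tiny blocks so the demo runs instantly
    uploaded = bytes(10 * block)        # data as it was sent to the cloud
    processed = bytearray(uploaded)
    processed[3 * block] = 1            # cloud job modifies block 3 ...
    processed += bytes([2]) * block     # ... and appends a new block 10

    todo = changed_blocks(block_hashes(uploaded, block),
                          block_hashes(bytes(processed), block))
    print(todo)  # [3, 10]

    delta_bytes = len(todo) * block     # every block in this demo is full-size
    print(delta_bytes, "of", len(processed), "bytes need retrieving")
    print(egress_cost(delta_bytes, HYPOTHETICAL_PRICE_PER_GB))
```

In practice the cloud side would compute the digests and send only those, so comparing them costs very little egress compared with pulling back the whole dataset.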
Only a year ago, every storage vendor needed a cloud connector so that organisations could move their data into the cloud and then move it back in its entirety. Storage vendors now recognise that organisations don’t want to store all their data in the cloud and want to move only small amounts of data in and out. This will help them avoid the huge expense of data retrieval and move the adoption of HPC in the cloud forward in 2020.
Containerisation and storage developments
There is a big push behind the parallel file system BeeGFS, which is now available as open source and is delivering some extremely positive bandwidth results in HPC compute clusters. Some storage vendors are now looking at containerising BeeGFS so it can run on embedded devices in the storage system, for faster deployment and easier configuration management.
Containerising a file system in a virtualised environment is becoming increasingly popular. Notably, IBM is looking at this approach for its IBM Spectrum Scale storage software, to ease deployment of its IBM ESS product.
Containerisation allows you to put your applications or file systems in a ‘wrapper’. This makes them very mobile and lets you tie them into standard configuration management. Designing cluster components as containers in the lab allows for faster deployment and for easier management and upgrades on-premise.
A lot of research institutions are using containerisation for their scientific applications and experiments, because it packages the entire environment a researcher needs, with all the libraries and applications for an experiment. The researcher can then replicate the experiment multiple times across the cluster (emulating a 100-node job), running it within this containerised environment. The experiment has very few dependencies on the host operating system or on the administrator’s configuration of the cluster.
Once the experiment is complete, the researcher can archive the container, which can then easily be reloaded on later occasions. This makes re-configuration much simpler and data retrieval more cost-effective.
The ability to restrict containers and partition memory between them, to stop information leaking from one to another, has become more prominent in recent months. Some providers are starting to separate tenants on the same system using a multi-tenant, total memory encryption approach. This secures each container’s or virtual machine’s (VM’s) portion of memory so that they cannot see each other’s memory maps.
A major security concern with cloud computing and containerisation is that other users or tenants on the same system could inspect memory maps and leak confidential research information, for example from medical research that uses non-anonymised data. New security technologies coming onto the market limit the scope of a container, or how a VM is able to access memory. These go a long way towards reducing that worry.
GPU computing has become more significant with the rise of deep learning, which is used in artificial intelligence, data mining and data analytics.
NVIDIA’s support for Arm-based HPC systems, combined with its CUDA-accelerated computing, is giving the HPC community a boost towards exascale. Arm’s ability to produce very low-power CPUs brings significant benefits to HPC environments, where power and cooling are major costs.
With many new technology developments and positive uptake of cloud and containerisation, 2020 will herald exciting times for the HPC market.