The latest cloud technologies for HPC and AI in 2023
A roundup of cloud technology providers that support researchers using HPC
Cloud computing provides huge potential to scientists and researchers, who can use the technology to access computing resources or new and emerging technology. Cloud can also help to facilitate collaboration, help organisations scale quickly and provide security and ease of use for domain experts accessing complex computing architectures – enabling researchers to get the most out of their investment in computing resources.
Cloud technology has reached a maturity level that makes it appealing to high performance computing (HPC) users. Whether public or hybrid, cloud offers flexibility: users can create or ‘spin up’ nodes with specific architectural requirements, or use cloud bursting to increase the capacity of their in-house infrastructure. It can also increase the agility of a company that shares data over multiple sites.
Cloud enables organisations to access emerging technologies such as quantum computing hardware without the investment in prototype technologies. Users can adopt a strategy to learn and understand how a computing system can impact their business using a pay-per-use model, which enables them to evaluate new technology and then scale as necessary.
In the past, one aspect of designing and procuring HPC systems was the need to create a balanced architecture. This means looking at the kind of applications that will be run on a particular cluster and trying to match their requirements with the technologies that are needed. For example, some workloads require large-memory nodes, high-speed interconnects or high-performance storage.
Cloud HPC allows people setting up this infrastructure to make more efficient decisions, particularly if they are cloud bursting or developing a hybrid cloud strategy – as they can build their in-house resources to cater for 80 per cent of the user requirements while using the cloud to provide GPUs or specific node architectures that suit a small number of users.
This allows all applications to benefit from this balanced architectural approach while still being able to cater to the specialised applications that have more niche requirements.
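As a rough illustration of the bursting decision described above, the Python sketch below keeps jobs on an in-house cluster when they fit and sends the rest to the cloud. Everything in it is an assumption made for illustration: the `Job` fields, the `route_job` and `schedule` functions, the node counts and the rule that GPU jobs always go to the cloud are hypothetical, not taken from any provider or scheduler listed here.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int               # number of nodes the job requests
    needs_gpu: bool = False  # specialised requirement not provided in-house


def route_job(job: Job, free_onprem_nodes: int, onprem_has_gpu: bool = False) -> str:
    """Return 'on-prem' or 'cloud' for one job (hypothetical policy)."""
    # Niche requirements (here: GPUs) go to the cloud if the cluster lacks them.
    if job.needs_gpu and not onprem_has_gpu:
        return "cloud"
    # Burst to the cloud when the in-house cluster has too few free nodes.
    if job.nodes > free_onprem_nodes:
        return "cloud"
    return "on-prem"


def schedule(jobs, onprem_nodes: int) -> dict:
    """Place jobs in order, tracking how many in-house nodes remain free."""
    free = onprem_nodes
    placement = {}
    for job in jobs:
        target = route_job(job, free)
        if target == "on-prem":
            free -= job.nodes
        placement[job.name] = target
    return placement


jobs = [
    Job("cfd", 8),
    Job("training", 2, needs_gpu=True),
    Job("genomics", 6),
    Job("mesh", 4),
]
placement = schedule(jobs, onprem_nodes=16)
print(placement)
```

In this toy run the in-house cluster absorbs the general-purpose jobs until it is full, while the GPU job and the overflow job burst to the cloud. A production set-up would leave this decision to the job scheduler and the cloud provider's autoscaling tools.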
Based on Alibaba Cloud infrastructure, Alibaba Cloud Elastic High Performance Computing (E-HPC) is an end-to-end public cloud service. E-HPC provides individual users, education and research institutions, and public institutions with a fast, elastic and secure cloud compute platform that interconnects with Alibaba Cloud products.
atNorth is a Nordic data centre services company offering environmentally responsible, power-efficient, cost-optimised hosting facilities and HPC services. atNorth offers sustainable and extremely scalable HPC resources fully delivered as a service, enabling its users to focus on their simulation applications and calculations without worrying about the underlying HPC infrastructure.
AWS provides the most elastic and scalable cloud infrastructure to run your HPC applications. With virtually unlimited capacity, engineers, researchers and HPC system owners can innovate beyond the limitations of on-premises HPC infrastructure. AWS delivers an integrated suite of services that provides everything needed to quickly and easily build and manage HPC clusters in the cloud to run the most compute-intensive workloads across various industry verticals.
Cirrascale Cloud Services is a premier cloud services provider of deep learning infrastructure solutions for autonomous vehicles, medical imaging, natural language processing and other deep learning workflows. The company was designed to focus on helping clients choose the right platform and performance criteria for their cloud service needs.
LMX Cloud from Define Tech is a comprehensive cloud HPC cluster management stack that supports a broad range of workloads and software environments, providing organisations with an agile and scalable IT infrastructure. One of its key features that speaks to HPC users in particular is the ability to ‘compose’, or dynamically configure, their HPC resource when demand dictates. HPC is all about scale and speed: with LMX Cloud, HPC users and IT admins can auto-provision resources from pools of compute, GPU, FPGA, NVMe and storage-class memory in seconds and scale up or out as needed, all from a single, easy-to-use management interface. This composable HPC feature is also compatible with job schedulers, so it can be automated.
Google Cloud’s HPC solutions are easy to use, built on the latest technology and cost-optimised to provide a flexible and powerful HPC foundation that clears the way for innovation. Google Cloud enables users to scale their team and use pre-configured HPC virtual machines (VMs) to get jobs started quickly and with predictable performance. Users can also get deeper insights and explore their results using Google’s AI and machine learning (ML) capabilities.
The Grey Matter Connected Cloud is a comprehensive pathway to ensure scientists and researchers are connected to the cloud. The company’s specialist cloud Solutions Team can help you build a cloud strategy and work with you to transform your business with the right licensing and cloud configuration, mobile devices for business, end-to-end cloud migration services, and post-deployment training and support.
Gompute provides a flexible HPC platform for CAE workflows and simulations. The compute nodes delivered in the service are bare metal, equipped with a high-speed, low-latency interconnect and large memory options. A private cloud option gives access to dedicated hardware, which can be used to outsource ‘steady’ capacity or to burst individual projects.
H66cloud from Hydro66 provides a mature enterprise-grade cloud environment, instant launch, high performance with GPU options and zero maintenance. The company says there is no single point of failure, 100 per cent guaranteed uptime, no upfront costs and the opportunity to cancel at any time. The customer chooses whether to pay in five-minute increments, and only for what they run, or to make longer commitments for known workloads. Real-time technical support is available around the clock.
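To make the idea of billing in five-minute increments concrete, here is a minimal Python sketch of the arithmetic. It is not Hydro66's pricing model: the `billed_cost` function, the hourly rate and the assumption that a partly used increment is rounded up to a full one are all hypothetical.

```python
import math


def billed_cost(runtime_minutes: float, rate_per_hour: float,
                increment_minutes: int = 5) -> float:
    """Cost of a job when usage is billed in whole increments.

    Assumption: a partly used increment is charged in full (rounded up).
    """
    if runtime_minutes <= 0:
        return 0.0
    increments = math.ceil(runtime_minutes / increment_minutes)
    billed_minutes = increments * increment_minutes
    return round(billed_minutes * rate_per_hour / 60, 2)


# Hypothetical example: a 62-minute job on an instance priced at 3.00 per hour.
cost = billed_cost(runtime_minutes=62, rate_per_hour=3.00)
print(cost)
```

Under these assumptions the 62-minute job is billed as 65 minutes rather than as two full hours, which is where fine-grained increments help short or irregular workloads.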
The Azure HPC OnDemand Platform, or azhop, delivers an end-to-end deployment mechanism for a complete HPC cluster solution in Azure. Industry-standard tools, such as Terraform, Ansible and Packer, are used to provision and configure this environment. Each environment contains an Open OnDemand portal for unified user access, remote shell access, remote visualisation access, job submission, file access and more; an Active Directory for user authentication and domain control; a PBS job scheduler; and Azure CycleCloud, which handles autoscaling of PBS nodes through PBS integration.
Nimbix offers cloud and on-premises HPC, giving engineers and scientists access to the infrastructure and software needed to build, compute, analyse, scale and deploy simulation and AI/ML/DL applications for faster, more powerful, less expensive cloud computing. The Nimbix Supercomputing Suite is a flexible and secure as-a-service HPC solution. This as-a-service model for HPC, AI and quantum in the cloud provides customers with access to one of the broadest HPC and supercomputing portfolios, from hardware to bare metal as a service, to the democratisation of advanced computing in the cloud across public and private data centres.
Penguin Computing’s Cloud Technology practice is focused on delivering software-defined architectures that enable you to run your workloads regardless of where your compute or data resources reside. The company suggests these platforms deliver the advances of a Cloud 2.0 world, where workloads are delivered on simultaneously addressable resources. Its goal is to enable you to run workloads everywhere as a seamless user experience by removing the complexities of workload portability, inclusive workflows, data locality and remote visualisation.
Open hybrid cloud is a recommended strategy for architecting, developing and operating a hybrid mix of applications, delivering a truly flexible cloud experience with the speed, stability and scale required for digital business transformation. The flexibility to run your applications across environments – from bare metal to VMs, edge computing, private cloud and public clouds – without having to rebuild applications, retrain people, or maintain disparate environments is the outcome of implementing an open hybrid cloud strategy.
ScaleCloud Enterprise from ScaleMatrix is designed to address the common trade-offs in cloud environments for compute-intensive workloads. The company says the product features top-of-the-line Intel processors and HPE servers housed in cabinet technology.
UberCloud, a cloud simulation platform, helps engineers run their simulation tools with high performance and reliability in the cloud. The company says its self-service software platform lets you create scalable cloud clusters, all while using the native GUI of Ansys, COMSOL, CST, NUMECA and more. Unlike simplistic web portals that only support batch use cases, there is no loss of features.