Tech focus: software tools
Software tools can help to manage the complexity of HPC systems by enabling HPC experts or scientists to focus on the jobs and scientific research being conducted, reducing the burden of running the system day to day.
That is not to say that an HPC system should be autonomous, but by carefully selecting the right set of software tools, from job scheduling to cluster or resource management, HPC users can reduce some of the complexity of running modern HPC systems and ensure efficient utilisation of their computing resources.
While there are several commercial tools available from vendors such as Altair, Bright Computing and Univa, there is a growing movement to develop open source tools for the HPC community.
The OpenHPC project, part of the Linux Foundation, is a collaborative effort to provide a reference collection of open-source HPC software components and best practices, designed to lower the barriers to deployment, advancement, and use of modern HPC methods and tools.
The collaborative nature of this project means that there is a community of organisations and developers working from a common desire to aggregate a number of common tools required to deploy and manage HPC Linux clusters. The community includes representation from a variety of sources, such as software vendors, equipment manufacturers, research institutions and supercomputing centres which aim to cover a wide variety of use cases.
This includes provisioning tools, resource management, I/O clients, development tools, and a variety of scientific libraries to help scientists and researchers use HPC more effectively.
The large number of contributors drives the varied aims of the project, as the partners aim to provide a stable and flexible open-source HPC software stack, validated to run on a variety of hardware platforms. The project also aims to increase simplicity and to reduce the cost of deploying and managing HPC systems. It also targets better performance and more efficient utilisation of HPC systems, with insights and technical contributions from across the HPC ecosystem integrated and made available to the community.
The project members design packages which are pre-built with HPC integration in mind, with a goal to provide reusable building blocks for HPC. Over time, the community plans to identify and develop abstraction interfaces between key components to further enhance modularity and interchangeability.
In a presentation from August at the HPC Knowledge Meeting ’19, Karl Schulz, OpenHPC project lead, discussed the development of OpenHPC and the project’s vision to support a wide variety of users and systems. ‘In the last several years, OpenHPC has emerged as a community-driven stack providing a variety of common, pre-built ingredients to deploy and manage an HPC Linux cluster, including provisioning tools, resource management, I/O clients, runtimes, development tools, containers, and a variety of scientific libraries,’ Schulz said.
‘When we were forming this project, we really wanted to have people from different points of view. People like me, users of HPC systems, then people who develop software for HPC... people doing scientific research and the people who are administering HPC systems. The vision is that we want to have the points of view of a lot of different folks, including vendors,’ said Schulz.
‘It is building-block orientated, you do not have to use all of it. You can opt-in for bits and pieces. The same argument exists with provisioning systems. Maybe it’s not the one that you use, so it is easy to pull in other stuff. The other thing that has changed for me, is the fact that you can use these packages in your container, whether that be Docker or Singularity or Charliecloud, and I find myself doing this more and more,’ Schulz continued.
This focus on ease of use and making HPC more accessible is shown throughout the work done on OpenHPC. In addition to supporting this opt-in mentality to the feature set, OpenHPC also provides tools for multiple HPC architectures.
‘When we started we were just doing builds for X86, but we added Arm a couple of years ago and, one of the things I find to be pretty valuable is that, from a user point of view, the environment is really the same. The packages are named the same way. While they might be built differently and have different dependencies, from the point of view of a user who wants to get on and compile some code, and run some jobs into the resource manager, it looks basically identical,’ said Schulz.
Managing HPC infrastructure
Recently, Bright Computing released details of a project to deliver its tools to the National Institute of Water and Atmospheric Research (NIWA), part of New Zealand eScience Infrastructure (NeSI).
NeSI is a collaboration of four institutions, including NIWA, working to provide HPC, data analytics and consultancy services to the science sector, government initiatives and agencies, and industry in New Zealand.
When HPC requirements threatened to exceed NIWA and NeSI’s resource capacity, they looked towards a future-focused upgrade that included OpenStack and cloud technologies. NeSI chose to partner with Cray and Bright Computing, who provided the hardware and software tools necessary for this new approach to the country’s eScience Infrastructure.
The proposed solution needed to enable two disparate locations to share data and practices, with the primary systems residing in Wellington and the backup system housed in Auckland. The system was also required to consolidate HPC investments into a single facility – including a large HPC cluster closely coupled to a Cray XC-class supercomputer located in Wellington, and a disaster recovery site in Auckland – to increase performance and reduce datacentre complexity. OpenStack was chosen to manage an on-premises cloud environment, and the NeSI staff wanted this new solution to reduce the overall cost and management requirements.
Cray was selected for the hardware with an integrated OpenStack platform, and it partnered with Bright Computing to interpret NIWA and NeSI’s system requirements and help NeSI build a hybrid HPC and private cloud infrastructure, based on Bright Cluster Manager and Bright OpenStack.
After months of investigation, collaborative design, and solution realignment, Bright Computing and the combined Cray, NIWA and NeSI teams finalised a new infrastructure which includes three clusters, in part managed by Bright solutions: one cluster coupled to each of the Cray XC-class supercomputers, and a third cluster for development, test, training, and education.
Bright Cluster Manager lets users administer clusters as a single entity, provisioning the hardware, operating system and workload manager from a unified interface. Once the cluster is up and running, the Bright cluster management daemon reports any problems it detects in the software or hardware.
The software includes node provisioning, a graphical user interface (GUI), comprehensive monitoring to visualise and analyse a set of hardware and software metrics, plus GPU management and cloud-bursting features.
Bright OpenStack is designed to easily deploy, manage and operate cloud infrastructure. The software aims to provide deployment on bare metal, advanced monitoring and management tools and dynamic health-checking, all in a single package. The single interface makes it easier to build a robust cloud with existing resources through its installation wizard, allowing users to install on bare metal and configure OpenStack systems using role assignments.
New Zealand’s research, science and innovation minister, Dr Megan Woods, said: ‘This marks a step change for science in New Zealand and a further advancement towards an innovative, future-focused society.
‘The supercomputers are a significant upgrade with 10 times the computing capability of their predecessor. This will have a whole range of benefits for scientific research, including better understanding issues around climate change, genomics, the management of New Zealand’s freshwater assets and resilience to natural hazards.
‘One of its key uses will be to advance weather forecasting, enabling more precise forecasts and helping to refine forecasting of climate extremes and hazardous events. Improved weather forecasts will enhance the ability of critical services, such as Fire and Emergency New Zealand, to both identify and manage hazards. It will also help farmers and environmental managers make more informed decisions, using the best information available,’ Woods continued.
‘This investment of $23m represents some of the world’s most advanced supercomputing power and has been made possible by a strong collaborative initiative between NIWA and NeSI. The capabilities and potential have extended enormously since NIWA received the country’s first supercomputer almost 20 years ago,’ added Woods.
‘This facility is at the leading edge of international science. This is a crucial resource for New Zealand science that will assist our researchers to seek solutions to some of today’s most urgent problems.’
Racing to increase HPC utilisation
The number of cloud-based HPC systems and hybrid cloud environments is increasing rapidly, as this approach allows users to consolidate HPC resources across multiple sites and provides an easier way to manage these disparate systems. Univa, a company previously known for cluster provisioning software, has been making significant strides in cloud provisioning tools.
One of the company’s main tools, Univa Grid Engine, is a batch-queuing system, derived from Sun Grid Engine. The software schedules resources in a datacentre and applies policy management tools. The product can be deployed to run on-premises, using cloud computing, or in a hybrid cloud.
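To give a sense of what a batch-queuing system does when it schedules resources under a policy, the following is a minimal sketch in Python. It is not how Univa Grid Engine works internally; the function name `dispatch`, the job tuples and the single 'highest priority first, then submission order' policy are all assumptions made for illustration.

```python
import heapq


def dispatch(jobs, free_slots):
    """Start queued jobs on a pool of free slots (hypothetical toy policy).

    jobs: list of (name, slots, priority) tuples in submission order.
    Higher priority runs first; ties are broken by submission order.
    A job that does not fit stays queued, and smaller jobs behind it
    may still start.
    Returns (started_names, waiting_names, remaining_free_slots).
    """
    # heapq is a min-heap, so negate priority to pop the most urgent job first.
    heap = [
        (-priority, order, name, slots)
        for order, (name, slots, priority) in enumerate(jobs)
    ]
    heapq.heapify(heap)

    started, waiting = [], []
    while heap:
        _, _, name, slots = heapq.heappop(heap)
        if slots <= free_slots:
            free_slots -= slots
            started.append(name)
        else:
            waiting.append(name)
    return started, waiting, free_slots


if __name__ == "__main__":
    # Illustrative job names and sizes only.
    queue = [("cfd_a", 8, 1), ("cfd_b", 4, 5), ("post", 6, 1), ("viz", 2, 1)]
    print(dispatch(queue, free_slots=12))
```

A production scheduler layers far more on top of this, such as fair-share policies, reservations and resource types other than slots, but the core loop of ordering the queue and fitting jobs to free capacity is the same idea.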
In October, Oracle announced that it had been working with the SportPesa Racing Point F1 team to integrate Univa Grid Engine with the company’s computing infrastructure. The team has a computing infrastructure capped at 25 Tflops, so it is of the utmost importance that the resources are used as efficiently as possible. It uses Grid Engine to manage its CFD cluster, which is reported to deliver a sustained 97 per cent utilisation. Otmar Szafnauer, team principal and CEO, said: ‘Univa help us with bringing CFD developments to reality faster. So they help with efficiency with our compute power.’
SportPesa Racing Point also uses Grid Engine to ensure the correct simulations run at the same time, make the most efficient use of applications, and deliver a quick turnaround of work.
‘We wouldn’t be working with Univa if they didn’t help us. Univa helped tremendously in making us more efficient in running CFD programs, getting accurate results and doing so very quickly. And because we can get accurate results quickly, that means it shortcuts our development of the car, and gets us quicker lap times sooner – and that’s invaluable in the sport, because that’s what it’s all about,’ added Szafnauer.
Grid Engine software manages workload placement automatically, maximises shared resources, provides enterprise-grade dependability and accelerates deployment of any container, application or service in any technology environment, on-premises or in the cloud.
Monitoring and reporting tools allow users to track and measure resource utilisation in workload-managed clusters.
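As a rough illustration of what measuring utilisation involves, the sketch below computes the fraction of available slot-seconds that jobs actually occupied during a reporting window. The function name `utilisation`, the record layout and the example figures are assumptions for this sketch and are not taken from Univa's reporting tools.

```python
def utilisation(jobs, total_slots, window_start, window_end):
    """Fraction of slot-seconds in the window that were occupied by jobs.

    jobs: iterable of (slots, start, end) records, times in seconds.
    Only the part of each job that overlaps the window is counted.
    """
    if total_slots <= 0 or window_end <= window_start:
        raise ValueError("need a positive slot count and a non-empty window")

    used = 0
    for slots, start, end in jobs:
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            used += slots * overlap

    capacity = total_slots * (window_end - window_start)
    return used / capacity


if __name__ == "__main__":
    # Hypothetical one-hour window on a 100-slot cluster:
    # one job holds 60 slots for the full hour, another holds 40 slots
    # for 3,330 seconds, leaving a short idle gap at the end.
    records = [(60, 0, 3600), (40, 0, 3330)]
    print(f"{utilisation(records, 100, 0, 3600):.0%}")
```

Measured this way, a sustained figure in the high nineties means the scheduler is leaving very few slots idle between jobs.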
Container support lets users run containers, such as Docker or Singularity, in a Univa Grid Engine cluster and blend them with other workloads, supporting heterogeneous applications and technology environments. The software also provides GPU support to help users get the most out of GPU-powered servers by optimally mapping machine learning, HPC or other GPU-based workloads onto the topology of GPU servers in clusters or clouds.
Managing massive volumes of data
Another recent case study from Univa highlights its work with Germany’s Bielefeld University, which aimed to make better use of the data generated by biotechnological activities and research projects at the university.
The Center for Biotechnology (CeBiTec) is one of the university’s largest faculty-spanning central academic institutions. As part of CeBiTec, the Bioinformatics Resource Facility (BRF), led by Dr Stefan Albaum, provides a high-performance compute infrastructure accessible to its 150 members and all partner groups in the faculties of biology, technology and chemistry, as well as more than 1,000 national and international researchers and affiliates from academia and industry.
The BRF was utilising an earlier version of Grid Engine for controlling access to compute resources. Increasing demand for high performance computing from CeBiTec’s researchers and partners meant workloads were no longer being processed efficiently.
Generating enormous amounts of experimental data places a heavy burden on the HPC clusters tasked with its storage, processing, visualisation and integration. For example, assembling the DNA sequence of an organism from the results of a high-throughput sequencer is highly RAM-demanding, as millions of very short DNA sequences are pieced together to yield complete DNA sequences. These complex workloads were creating inefficient resource usage and bottlenecks.
BRF selected Univa Grid Engine for its optimised throughput and performance. ‘Right away, Univa Grid Engine enabled highly efficient usage of our compute resources with a very small footprint,’ said Dr Albaum. ‘We like the fact users who do not have experience can quickly submit jobs on the cluster. Univa is an established, easy-to-use system for managing the largest-scale processing of huge datasets.
‘Univa is an outstanding workload orchestration solution for the distribution of large numbers of jobs on a compute cluster – even for heterogeneous set-ups like ours,’ concluded Dr Albaum.