Developing skills for HPC
Training and the development of specific skills to use HPC are becoming increasingly important as the number of users and potential applications continues to rise. Scientists and engineers in many disciplines can make use of HPC; it is not just a technology reserved for climate science and large-scale astronomy simulations.
For many years most HPC users worked in national labs and were a relatively consistent set of employees involved in long-term research programmes. They were well supported by their research centres and Independent Software Vendors (ISVs) that continuously contributed time to training users and maintaining knowledge on the available tools. In contrast, many of the HPC users today have very different characteristics – they use HPC facilities intermittently and for short bursts of time. Today many more people need access to HPC facilities but do not necessarily have the experience and skills needed.
While these new users may be familiar with traditional cloud computing, which provides them with a bare-bones infrastructure to scale applications, they are less familiar with the HPC framework, which offers remote systems with pre-installed environments that are configured and targeted toward scientific computation.
Even though some uniformity does exist in the HPC world in terms of software stacks, the environments remain largely heterogeneous and therefore the users will be exposed to toolsets that they have never used before – or have very little experience with. As a result, these new users are not well-versed in the tools or systems, nor do they have a desire to gain in-depth, working knowledge of the infrastructure.
Furthermore, these users will rarely want to participate in full-scale training about a system they will only use sporadically. This creates new challenges for developing successful training procedures for these HPC users.
In the typical HPC world, the most common training requests are either for an introductory course at the original purchase and delivery of the system, or for a set of advanced training packages for fine-tuning the skillsets of seasoned users. There has been an increase in the number of on-demand users and this, in turn, changes the type of training requests. There is a noticeable shift from users looking to learn all the bells and whistles to users wanting to maximise the use of tools for a specific problem. This is because the goal of HPC for these users is to solve a particular problem, to run a set of applications and receive a result, rather than to become experts in the use of HPC.
As HPC users become more common and the use cases more widespread, it stands to reason that some users just need HPC on demand rather than through the traditional model. How can HPC facilities support these sporadic or part-time users alongside the more traditional users that demand frequent access to HPC systems?
HPC facilities are being challenged to provide more frequent and accessible training. In essence, they will have to take on a consumer-oriented model, in which the advice is narrowly focused on the task at hand, versus the model of a technical college that provides a full curriculum for developing an expert.
Regional HPC training and support
There are several opportunities for HPC training and skills development at various levels from introductory courses to application development and preparatory access to large scale tier-0 facilities. In this feature, we will discuss options available predominantly in the UK and Europe but this will by no means be an exhaustive list of the training options available to HPC users.
In the UK, training and skills development is predominantly provided by the Edinburgh Parallel Computing Centre (EPCC) and universities that have their own HPC resources. The EPCC, for example, is one of the major providers of training in high performance computing (HPC) in Europe, offering a range of courses for users of HPC throughout the UK and Europe.
ARCHER Training: The UK national supercomputing service, ARCHER, provides large amounts of HPC training that is free for all UK academics. Courses cover a range of abilities from beginner to advanced and are run at a variety of locations around the UK.
University of Bristol ACRC Training: The Advanced Computing Research Centre (ACRC) at the University of Bristol runs a number of HPC training courses and the majority of their material is freely available online for people to study remotely.
Bristol offers training and support across several primary areas including Linux and specific training for the various clusters at the university’s disposal. Before users are allowed to access the HPC systems they need to demonstrate some understanding of Linux, and the University offers short courses on Linux and an introduction to HPC. Access to the BlueCrystal HPC systems is also available for more advanced users or those with specific application requirements.
PRACE: The Partnership for Advanced Computing in Europe (PRACE) offers a number of different training opportunities including the PRACE Advanced Training Centre (PATC), PRACE training portal and access to advanced skills development PRACE training centres. These facilities deliver regular programmes of courses in many aspects of HPC and advanced computing. Bristol is an example of a university that offers several courses to support HPC users developing their skills. This includes introductory courses and more advanced offerings for experienced users.
PRACE and European training
PRACE makes it possible for researchers from public and private institutions from across Europe and the world to apply for resources on high-end Tier-0 HPC systems via a centralised peer review process.
PRACE is a major provider of training and support for new and existing HPC users. While many may think of PRACE as delivering Tier-0 facilities to European researchers, it also offers new users and industrial users opportunities to get access to HPC systems through various programmes such as SHAPE and PRACE Preparatory Access. PRACE operates 14 PRACE Training Centres (PTCs), which have established a state-of-the-art curriculum for training in HPC and scientific computing.
PRACE training courses are open to participants from all European countries. PTCs carry out and coordinate training and education activities that enable both European academic researchers and European industry to utilise the computational infrastructure available through PRACE and provide top-class education and training opportunities for computational scientists in Europe.
In addition, PRACE seasonal schools complement the PTC training programme, with three such events usually held throughout the year: one in autumn, one in winter and one in spring. Each of these is held in a non-PTC country (see next section) and at a different geographical location. Their curriculum also usually differs from that of the PTC events. Registration for all PRACE training courses is free and open to all. Specific courses can be found on the PRACE training portal.
In addition to computing time, support from a high-level support team (HLST) may be assigned to selected research projects. HLSTs will help projects of outstanding scientific value to further utilise the capabilities of PRACE Tier-0 systems through code optimisation.
HLSTs are available in combination with Tier-0 systems of the following PRACE hosting members: Grand Équipement National de Calcul Intensif GENCI, France; Gauss Centre for Supercomputing GCS, Germany; CINECA – Consorzio Interuniversitario, Italy; Barcelona Supercomputing Center BSC, Spain; Swiss National Supercomputing Centre CSCS at the Swiss Federal Institute of Technology in Zurich (ETH Zurich), Switzerland.
The objective of PRACE Preparatory Access is to allow PRACE users to optimise, scale and test codes on PRACE Tier-0 systems before applying to PRACE calls for Project Access. The next PRACE call for proposals for Project Access will most likely open in Autumn 2021. Production runs are not allowed as part of PRACE Preparatory Access. Currently, PRACE offers four different schemes for Preparatory Access based on the type of application and the maturity of the project.
LearnHPC is a website set up to ensure that HPC is an accessible technology for the widest possible community of scientific researchers. The site acts as a gateway providing materials, resources and tools that will lower or remove barriers.
EU-wide requirements for HPC training are increasing as the adoption of HPC in the wider scientific community gathers pace. However, the number of topics that can be thoroughly addressed without providing access to actual HPC resources is very limited, even at the introductory level. In cases where such access is available, security concerns and the overhead of the process of provisioning accounts make the scalability of this approach questionable.
EU-wide access to HPC resources on the scale required to meet the training needs of all countries is an objective that we attempt to address with this project. The proposed solution essentially provisions virtual HPC systems in a public cloud. This infrastructure will allow us to dynamically create temporary event-specific HPC clusters for training purposes, including a scientific software stack. The scientific software stack will be provided by the European Environment for Scientific Software Installations (EESSI) which uses a software distribution system developed at CERN, CernVM-FS, and makes a research-grade scalable software stack available for a wide set of HPC systems, as well as servers, desktops and laptops.
Through the FENIX Research Infrastructure and AWS, LearnHPC offers the use of moderately-sized clusters configured specifically for your training events. At present, there is no specific mechanism to request access to LearnHPC resources.
In a recent interview with FENIX Research Infrastructure, Dr Alan O’Cais, software manager for E-CAM Centre of Excellence at Forschungszentrum Jülich, discussed the role of LearnHPC and the drive to develop HPC skills across Europe. ‘Through my involvement in the E-CAM Centre of Excellence and FocusCoE, I am aware that HPC training and education is a hugely important topic in the context of the EuroHPC Joint Undertaking. There is, however, an enormous logistical challenge in extending HPC training of a consistent standard to an ever-growing pool of researchers in 32 countries.’
‘One of the biggest hurdles that I foresee is providing educational access to HPC resources in a consistent way at the required scale,’ added O’Cais. ‘In the context of HPC training, I wouldn’t immediately draw a distinction between “students, researchers, and users”, I would see them all as learners.’
‘What LearnHPC will hopefully do for all learners is make the mechanics of accessing HPC training uniform, well documented and as easy as possible. We want to remove, hide or simplify the technical barriers that tend to increase the slope of the learning curve when it comes to HPC.
‘Learners may still ultimately need to know about ip-restricted ssh keys or how to compile the latest GCC compiler from source, but these can be introduced at a more appropriate time in their learning journey.’
- ARCHER » Training - www.archer.ac.uk/training
- ACRC training, Advanced Computing Research Centre, University of Bristol - www.bristol.ac.uk/acrc/acrc-training
- Training portal (prace-ri.eu) - https://training.prace-ri.eu
- PRACE preparatory access guide - https://prace-ri.eu/hpc-access/preparatory-access/preparatory-access-open-calls
- BlueCrystal Phase 4 user guide - https://www.acrc.bris.ac.uk/protected/bc4-docs
- LearnHPC - Scalable HPC Training - http://www.learnhpc.eu