Supporting skill progression in HPC
DiRAC has created a structured training programme with a user feedback loop to drive training and skills development across its entire user base.
What can you tell our readers about the DiRAC facility and its resources?
Dr Clare Jenner, Deputy Director, STFC DiRAC High Performance Computing Facility: DiRAC is supported by the Science and Technology Facilities Council (STFC), funded through UK Research and Innovation (UKRI), and it is free at the point of use for academic researchers. DiRAC provides distributed HPC services to the particle physics, nuclear physics, cosmology, astrophysics and planetary science theory communities across the UK.
Due to our mixed research requirements across that very broad field, we run three different compute services hosted at four university sites across the country. Each of our resources has an architecture specially tailored to the different algorithmic problems our communities need to solve.
All of our users belong to an academic research project, which is run by a principal investigator based at a university in the UK. These groups will usually have codes in place that are being developed and added to by PhD students, early-career researchers and the post-docs and academics working within the team. People may come into a group with their own codes or write codes from scratch.
In terms of requesting resources, we have an annual call for time on our services. Applications include a scientific justification and a technical application demonstrating that the DiRAC resource you’ve requested is right for the problem to be solved. You can also request research software engineering (RSE) time from our in-house RSE team if codes need to be ported or optimised for our technology.
Richard Regan, Training Manager, DiRAC: We serve a broad spectrum of users. Some researchers are really experienced and know what they're doing, but new PhD students, for example, might not have much experience, having only just come from their undergraduate programme, and these users need a lot of training to make the best use of our services. Other users may have already worked on codes on local systems or their local clusters and have limited experience of HPC, or may have GPU skills. Either way, it is up to us to build on those skill sets so they can be translated quickly into solid scientific output.
The first step for new users is the Essential HPC-Skills training programme. Can you explain how that was set up? What has changed over time?
Dr Clare Jenner: The training programme started back in 2011 with the creation of one of the UK’s first HPC-Skills tests – the driving licence. DiRAC was a much smaller service then and the objective of the test was to make sure everybody had the skills they needed to use the machines.
As the programme has evolved, we have created a set of supporting online materials to go with the test by trawling the web to find appropriate existing training resources and putting together a set of selected links on our website. We start with a basic set of essential skills, covering, for example, the Unix environment, command scripts, version control, software engineering and testing, and code scaling. These are the very basic skills that users need in order to have confidence and a good understanding of HPC when they start to work on their code on our systems.
The skills we cover were pinpointed by asking the project investigators what skills they thought their new users should have when they joined their teams and what they would like them to have when they left, and this helped us to create a progression path through the skill set.
The training programme is continually evolving to meet our users' needs. We're now moving away from linking to other people's training materials and are developing a new suite of material that covers the same basic curriculum but is designed specifically for DiRAC users using DiRAC resources. This is a combination of bespoke instructor-led classes and self-taught materials, and we're going to launch that course to our users early next year.
What opportunities are there for skills development after the Essentials programme?
Richard Regan: As I mentioned, we've got a lot of users with different skill sets, and the Essentials programme is just the first step, where we give them the key skills to get onto our systems and to get their research started.
Once they've got those skills, we then support them in their growth by giving them opportunities for other courses that are specific to our systems. We run CPU essentials, GPU taster sessions and fundamental and advanced CUDA courses.
For example, in a few weeks’ time we've got a CPU course, “AMD Induction Training”, that will help users to get the best out of our AMD systems. We're looking at compilers, optimisation and profiling. But you can only get the full benefit from that training if you already understand the basics of how to use our systems. These more advanced courses then help researchers to develop more specific skills that allow them to take direct advantage of DiRAC’s particular resources.
Dr Clare Jenner: As Richard said, we have a broad spectrum of users in DiRAC and some of them are extremely proficient at coding. They are very highly experienced, and this allows them to explore the advantages of our resources in a very in-depth way. For these users, we have our hackathon programme.
Here, we team up with our vendors, for example AMD, Intel or Nvidia, for a three-day event. We invite users from our core code sets to come along and sit in a classroom with vendor technology specialists and in-house DiRAC experts from our RSE and technical teams. Together they work on the users' specific codes, optimising them and testing new hardware or new software. Hackathons are for our most experienced researchers. They are cutting edge, and they are an opportunity for knowledge transfer directly from the people who make the hardware to the researchers in a science project team.
More recently, we have branched into providing training on advanced applications for science for all of our users, in parallel to the HPC skills. Science is undergoing a data explosion, and artificial intelligence (AI) and machine learning (ML) techniques are revolutionising the way scientists tackle their research. We now run a “Machine Learning Techniques for Science” course, for which we’ve collaborated with SciML, STFC’s Scientific Machine Learning Group, to put on a hands-on practical course showing how to use techniques such as decision trees, neural networks and deep neural networks within codes. These courses are extremely popular and are often over-subscribed within 24 hours of being advertised, so we are now developing a more advanced AI/ML course, tailored to our research communities, as a follow-up to this introductory course.
How do you know when to update or make changes to training opportunities?
Richard Regan: We get regular feedback from users who have taken our training, telling us how useful the skills training is and what they would like to see in the training programme going forward, and we survey our entire user base annually. We also get feedback from our vendors on what technologies are coming up, and we find out whether our users want to use them. Then we get together and get them trained.
Dr Clare Jenner: DiRAC has a long track record of collaborating with industry. We also run an industrial placement programme, where we partner with companies in industry and the public sector to give our users the opportunity to take a six-month sabbatical from their research and work on an industrial project that is of interest to both the company and the user. For example, we've run these with Transport for London (TfL), where our users applied natural language processing to classify work orders, and with Guy’s and St Thomas’ NHS Foundation Trust, where students used ML methods to measure the effect of deprivation indicators on asthma in young adults. These students come back and spread the knowledge and experience of the placements through our community, promoting the programme to other users.
We’ve also recently arranged some quantum placements, where our users have worked with Atos’ quantum simulator at the Hartree Centre on projects looking at quantum field theory and the classification of pulsars. As part of those placements, the students also designed and delivered a workshop to introduce other DiRAC users to the basics of quantum computing and the practical application of quantum ML skills in research. This allows us to feed direct knowledge from the placements back into the DiRAC community. This kind of feedback is important because, in four or five years, when DiRAC comes to design new services, we will have users who are familiar with the new technologies and can help inform our procurement decisions. There is much more information about past projects and future industrial and academic placements on our website.
Dr Clare Jenner is Deputy Director of the STFC DiRAC High Performance Computing Facility
Richard Regan is the Training Manager for DiRAC