HPC for the next generation of scientists - but not for beginners
Today’s high-end computers are not easy to use. They have tens of thousands to millions of cores, and the architectures of both the individual processors and the system as a whole are complex. Achieving top performance on these systems can be quite difficult. Furthermore, HPC architectures change every three to five years, often enough that different optimisations or approaches may be needed to use the new systems efficiently.
But the characteristics of high-end computer systems are by no means the only source of difficulty. The scientific problems tackled on such systems are typically complex and often lie at the leading edge of their scientific or engineering domain. Consequently, computational science and engineering (CS&E) projects are usually carried out by teams of several to dozens of researchers, with expertise in different aspects of the science, mathematical models, numerical algorithms, performance optimisation, visualisation, and programming models and languages. Developing software with a large team, rather than with one or a few collaborators, brings its own challenges and calls for expertise in software engineering, which the team must also include.
Reflecting on the CS&E landscape described above, I was motivated to organise the Argonne Training Program on Extreme-Scale Computing (ATPESC), an intensive two-week programme that covers most of the topics and skills needed to conduct computational science and engineering research on today’s and tomorrow’s high-end computers. The programme has three goals. First, to give participants in-depth knowledge of several of these topics, especially the programming techniques and numerical algorithms that are effective on leading-edge HPC systems. Second, to make them aware of the software and techniques available across all the topics, so that when their research requires a particular skill or software tool, they know where to find it instead of reinventing tools or methodologies. And third, through exposure to trends in HPC architectures and software, to point them towards approaches that are likely to provide performance portability over the next decade or more.
Performance portability matters because applications often have lifetimes that span several generations of computer architectures. It is tempting to write software tailored to the platform at hand, exploiting its special features at every level of the code. Researchers in the early stages of their careers may not realise that this approach commits them to rethinking their algorithms and rewriting their software again and again.
At Argonne National Laboratory, we operate Mira, an IBM Blue Gene/Q system with nearly a million cores, so we have first-hand experience with the challenges described above. Mira is the current flagship system of the Argonne Leadership Computing Facility (ALCF), which is funded by the Office of Science of the US Department of Energy. Systems like Mira can enable breakthroughs in science, but using them productively requires significant expertise in computer architectures, parallel programming, algorithms and mathematical software, data management and analysis, debugging and performance analysis tools and techniques, software engineering, approaches for working in teams on large multi-purpose codes, and so on. Our training programme exposes participants to all of these topics and provides hands-on exercises for experimenting with most of them.
The ATPESC was first offered in the summer of 2013. It ran again this year from 3 to 15 August in suburban Chicago. The 64 participants, selected from 150 applicants, are doctoral students, postdocs and computational scientists who have used at least one HPC system for a reasonably complex application and who are engaged in, or planning to conduct, computational science and engineering research on large-scale computers. Their research interests span the disciplines that benefit from HPC, such as physics, chemistry, materials science, computational fluid dynamics, climate modelling and biology.
In other words, this is not a programme for beginners. The strong interest in it suggests that we are filling a gap: few university graduate programmes in the sciences, for example, cover software engineering or community codes.
Some research laboratories are planning to offer similar training programmes. We welcome this proliferation, since the need for knowledgeable computational scientists and engineers is growing as the value of HPC is recognised in more and more fields.