Advancing HPC in the UK

Robert Roe interviews Mark Parsons on the strategy for HPC in the UK.

Professor Mark Parsons is the Director of EPCC and Associate Dean for e-Research. In this interview, he discusses the development of HPC in the UK, the UK’s place in the European supercomputing community, and the importance of developing software alongside new HPC systems.

What is the UK’s place in the European HPC community?

The UK's position in Europe is an interesting one from an HPC and e-infrastructure point of view because, if you look at computational science in the UK, we are really strong. I think any of the European countries would say that the UK has always had a very strong computational science community.

We do so much science, so we are naturally going to have a large number of people doing computational science. We were also right there at the beginning of PRACE (Partnership for Advanced Computing in Europe), the current European community that spans all the member states.

But in 2010 we did not sign up to be a hosting member of PRACE; the countries that did were France, Germany, Italy, Spain and Switzerland.

Because the UK declined to sign up at that point, we have been the largest general partner in PRACE. I have always felt that we should have been a hosting member, but the government decided against it, and we see that stance continuing into EuroHPC today.

Even putting Brexit to one side, it is not at all clear to me that we would have wholeheartedly signed up to EuroHPC, and at the moment we have not signed up. This raises the question: if we are not throwing our hat in with the European HPC centres, what exactly is our strategy?

What is the UK’s strategy for HPC?

If you were to go back to the 1990s and the first decade of the new century, you would see that the UK was investing at roughly the same level as these other big European countries. That has tailed off in the last few years. Obviously we have had ARCHER (Advanced Research Computing High End Resource), which was a big investment, but around the time ARCHER arrived, the German centres were putting in three systems the size of ARCHER.

That has continued to this day. We are about to get ARCHER 2, but if you were to look at the German centres, you would see that each of them has an ARCHER 2-sized system. That is generally accepted. There has been a lot of work over the last year on how the scientific community thinks about our e-infrastructure needs. That is not just supercomputing, although supercomputing is a large part of it.

About a year ago a process was started by what is now UK Research and Innovation (UKRI) to develop an e-infrastructure roadmap, and this fed into a road-mapping exercise that was happening for all scientific infrastructure: telescopes, particle physics accelerators, all these kinds of things, plus e-infrastructure.

So people came together from across the research councils and we produced a roadmap which included a supercomputing roadmap – which provided a good view of what the scientific community needed over the next six to eight years.

If you add up everything from the different reports, a very large amount of money was asked for, and any government would look at that and say ‘we cannot possibly afford all of that’. But there is, and will be, a period of identifying the budget and working out how best to spend the funds that are made available.

We are in quite a positive phase at the moment because, for the first time in a long time, there is a proper look at the UK’s supercomputing roadmap and general e-infrastructure roadmap.

This is in the hands of UKRI and of what the next, or current, administration wants to do with the spending review. There will be a full spending review every three to four years, so when that happens UKRI will be part of that review and will be asking for money. Part of that will be for scientific infrastructure, of which e-infrastructure is a key part.

Will the UK get a Tier-0 system?

There is definitely a large debate, in which I am deeply involved, around what the UK should do about Tier-0 or exascale computing. We are not part of EuroHPC, so we are not going to have access to the exascale systems that appear in Europe in 2023. Europe will also have some very large systems in 2021, in the region of 150 to 200 Pflops, and we will not have access to those either. That will have a detrimental effect on our scientific and industrial communities' ability to use the largest scale of supercomputing.

That raises the question of whether we can partner with another country, or whether the UK should think about investing in a large Tier-0 system. That is an ongoing debate.

I have worked very closely with Susan Morel, head of research infrastructure at EPSRC, for the best part of the last decade. I have been the advisor to Susan, the UK government's representative, on the PRACE council.

Susan and I have felt for a very long time that we should form a consortium with some other countries and invest in a Tier-0 system jointly. We had a northern European discussion going on, but that stalled due to political events. Now, if you look at EuroHPC, that is actually the model they have followed. The first two pre-exascale systems will both be backed by consortia of countries that have put money in, alongside additional funding from the European Commission.

I have been leading a team over the last six months that has been writing a business case, or options analysis, for what we might do next.

If you look at what is happening in Japan, the US and, to a lesser extent, Europe, the development of these huge systems, their implementation and their operation as services are all coupled with big software programmes. I sit on the Gordon Bell Committee, and what has been fantastic to see with Summit, the 200 Pflop system from the US that came online last year, is the high quality of papers coming from people who have been using the entire system to do science that was impossible before.

But that has only happened because the US has invested in an exascale software programme, while also beginning investments in exascale hardware. If you are going to buy one of these systems you need to couple it with a big software programme or you will not see the benefits.

The challenge with these very large computers is that the codes that we run today on a system like ARCHER do not scale properly on an exascale system.

If you consider what an exascale system might look like, you are probably talking about 6,000,000 cores; on ARCHER today we have around 150,000. On the large systems today you are probably talking about something between 700,000 and 1,000,000 cores.

Getting to these very high core counts is very technically difficult from a computer science perspective. That is what the programmes are focusing on in the US, Japan and Europe.
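One textbook way to see why core counts on this scale are so demanding is Amdahl's law, which bounds the speedup on p cores when a fraction s of the work remains serial: S(p) = 1 / (s + (1 - s)/p). The Python sketch below is purely illustrative: the serial fraction of 0.001% is a hypothetical value, not a measurement of any real code, and the core counts are the approximate figures quoted above. Even with 99.999% of the work parallelised, the speedup can never exceed 1/s = 100,000, so under this model most of a six-million-core machine would be doing little useful work.

```python
def amdahl_speedup(cores: int, serial_fraction: float) -> float:
    """Ideal speedup over a single core under Amdahl's law (strong scaling)."""
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    parallel_fraction = 1.0 - serial_fraction
    # Serial part takes the same time on any core count; parallel part divides by p.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def parallel_efficiency(cores: int, serial_fraction: float) -> float:
    """Fraction of the machine doing useful work: speedup divided by core count."""
    return amdahl_speedup(cores, serial_fraction) / cores


if __name__ == "__main__":
    # Hypothetical: 0.001% of the runtime cannot be parallelised.
    serial_fraction = 1e-5
    # Approximate core counts from the interview.
    systems = [
        ("ARCHER (approx.)", 150_000),
        ("large system today", 1_000_000),
        ("exascale (approx.)", 6_000_000),
    ]
    for label, cores in systems:
        s = amdahl_speedup(cores, serial_fraction)
        e = parallel_efficiency(cores, serial_fraction)
        print(f"{label:>20}: {cores:>9,} cores -> speedup {s:>9,.0f}, efficiency {e:.1%}")
```

Amdahl's law models a fixed-size problem; real codes also hit communication, memory and load-balance limits, while larger problems (weak scaling) can behave more favourably. Treat the output as a rule of thumb for why algorithms need rethinking, not as a prediction for any particular application.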

What is the UK doing to prepare for these larger systems?

Over the last 20 years we have been living in a golden age of simplicity. It didn’t feel like it at the time, but it was much easier. Many codes today are struggling to get beyond 20,000 cores and yet we are going to build computers over the next few years that will have five or six million cores.

There is an enormous amount to be done, not just to make existing algorithms go faster, but to go back to the drawing board. Algorithms that were designed in the 1980s or 1990s are no longer fit for purpose; we need to rethink how we are modelling.

The Advanced Simulation and Modelling of Virtual Systems project is an example of this. Rolls-Royce wants to do the first high-resolution simulation of a jet engine in operation. They want to model the structure, fluid dynamics, combustion, electromagnetics, the whole thing, so that over the next decade they can move to virtual certification.

But in order to do that, we need to rethink how we model different parts of the engine.

The models we have today scale to, in some cases, a few thousand cores or fewer, so we have to rethink the mathematics, the algorithms and what it means to model in the face of very high levels of parallelism. It is a fascinating topic, but it is also extremely hard.

But, if we get it right, it will be the most critical change to modelling and simulation in the last 30 years.