Building the Fugaku supercomputer
Professor Satoshi Matsuoka, director of the RIKEN Center for Computational Science, explains the 10-year process behind the development of the machine.
‘The supercomputer has been in planning for 10 years. Now we have all the hardware installed on-site, but of course, being a very large machine, there are a few things we need to finalise,’ explained Professor Matsuoka.
‘General production, meaning anybody who can submit a proposal can access the system, that will take a bit more time – six months or so. However, we have early users, in particular the COVID-19 teams that are trying to tackle this global crisis. We also have some other teams working on results. Those early-access users who are expected to produce results are already on board.’
‘Starting in October we will be accepting some trial usage submissions, so effectively the machine is doing quite a bit of science at this point – but, officially from the government standpoint, the machine’s general production date will be somewhere around early April 2021.’
The Fugaku supercomputer is housed at RIKEN, a network of research centres across Japan with main campuses in Wako, Tsukuba, Yokohama, Kobe and Harima. RIKEN’s activities can be divided into four main categories: strategic research centres, research infrastructure centres, the Cluster for Pioneering Research, and the Cluster for Science, Technology and Innovation Hub. Fugaku will support researchers across these disciplines, as well as industrial and academic partners across the world.
While the development has been extremely successful, with the Fugaku system claiming the number-one spot on the Top500 and the HPCG benchmark as well as the 9th position on the Green500, there are still some developmental bugs to fix and hardware checks and replacements that must be completed to ensure the system is working effectively.
‘There are some software bugs. Seeing that this is the largest system ever built there are some things we have not anticipated, not just in hardware but also in software. For example, it turns out that some of the software does not scale to this size so we have to go and fix some of the different software pieces,’ commented Matsuoka.
‘It [Fugaku] uses a lot of power so we have to figure out how to best manage the power while giving users the maximum experience. There are a lot of things to be done but this is being done in parallel to these early science activities – both Covid-19 related and some early science applications,’ added Matsuoka. ‘They co-exist on the machine pretty much 24/7 now. They are also helping because they are running real applications. It is a long process and, in some sense, the scale of the machine makes it challenging.’
Professor Matsuoka also commented on the development of the system and how the proliferation of its underlying technologies will benefit the wider scientific community: ‘Our objective was never to build one gigantic machine but also for this technology to proliferate. If we are widely successful then there should be a commercialised machine bigger than Fugaku potentially in the cloud somewhere,’ noted Matsuoka. ‘It remains to be seen if this happens with the same chip or maybe a derivative.
‘To run this as a national flagship machine has its own rules and guidelines so that, in some cases, may make access hard for some people. To have a variety of machines in a variety of places will allow technology proliferation not just for diversity but to push the envelope.’
Industrial and societal impact
A key distinction between this new Japanese flagship supercomputer and previous systems, whether the K computer, the Earth Simulator or the Numerical Wind Tunnel, is that the Fugaku machine has been designed around industrial usage and societal impact rather than a focus on basic science research. It is also designed to be extremely general-purpose, to support a wide variety of users and applications including big data and AI.
‘People have these misconceptions that Japan has had a national program to develop general-purpose supercomputers for a long time, but actually, that is not true. In the past, there were other machines like the Numerical Wind Tunnel and the Earth Simulator but all of these machines were actually purpose-built for a specific set of applications. The Earth Simulator was for climate research principally – of course, it was used for other stuff but principally it was designed and used for climate research,’ stated Matsuoka.
‘K was the first machine to be built as a general-purpose but of course, it couldn’t go that far. It lacked generality because the instruction set architecture was quite different. It was SPARC that was general-purpose but SPARC was already on its road to extinction, unfortunately. It had some industrial applications and objectives but not quite to the extent that the people involved would have liked,’ Matsuoka added. ‘While the K computer paved the way for Fugaku we decided to have a much stronger industrial program and much more extensive industrial adoption because at the end of the day you really want your machine to have societal impact – and that is very difficult if you just focus on basic science.’
Professor Matsuoka noted that Fugaku was developed with an ‘applications first’ philosophy. In order to realise this philosophy, the teams working on Fugaku set up a program to facilitate application co-design that runs in parallel to the development of the hardware.
‘If you look at the topical areas of the applications there are nine of them and only one is basic science,’ said Matsuoka. The remaining eight are all related to industry, whether pharmaceutical, medical, environmental or energy. There are also some topics directly tied to industry use of HPC, such as the development of new materials and engineering systems.
‘From the onset, Fugaku has been designed with a significant industrial focus. Right now I think we are achieving that goal as we have generated significant attention from the industry users who want to access Fugaku not only to incubate the far-off technologies but also to have an immediate impact in terms of developing products for industry,’ stated Matsuoka.
Matsuoka explained that there will be dedicated industry cycles but, in order to reduce turnaround times and not force industrial users to share intellectual property, they will be charged for time on the machine.
‘If you are doing basic research you will be subject to submitting a proposal and stringent reviews and justification. With industry, the turnaround is quicker and you do not have to reveal all your secrets. We feel that this will be more or less the ideal model because industry will try to innovate and because they require a much higher turnaround time they will be willing to pay for cycles,’ added Matsuoka.
Supporting emerging technologies
In addition to this applications-first philosophy and co-design with existing applications, the RIKEN team noticed early on that they would need to accommodate new and emerging technologies such as AI and big data. ‘Fugaku is Mount Fuji as you know. Mount Fuji reflects our ideals in this machine in that not only does it have a very high peak – the highest in Japan – but also it is very broad,’ said Professor Matsuoka. ‘We have to accommodate a new breed of applications such as big data, AI. We need to accommodate not just academic users but also industrial users. We need to support a broad range of apps for commercial ISVs and so forth.
‘The best or easiest way to build a very fast machine would be to make it less general-purpose,’ Matsuoka added. ‘To have it be both fast, large and general-purpose was a very big challenge. We took on that challenge, which was extremely hard, so it became a moonshot-type effort. It was a moonshot and so we orchestrated the project to be like a moonshot. Just like the Apollo project included the entire space and aeronautics industry in the US – to focus on reaching this seemingly unobtainable goal of reaching the moon. It was the same for us, we involved everyone in this national project and we set goals that would be unobtainable by a single company’s R&D,’ stated Matsuoka.
‘It took 10 years to do it but we achieved it so the reflection of this is exhibited in the fact that we swept all the benchmarks. We are not going to make winning the Top500 title our goal, in fact just getting to the number-one position on the Top500 would be detrimental. It would prove the point that the machine we have built is pretty useless as a general-purpose system. We have to excel in all the benchmarks, including those new and emerging applications like AI,’ Matsuoka concluded.