ARM steps up HPC development
ARM has announced a new on-chip interconnect technology designed to increase data throughput for HPC, data centre and cloud applications.
This comes just over a month after ARM and Fujitsu, the company developing the ‘Post K computer’ set to replace RIKEN’s K computer (ranked 5th on the June 2016 TOP500 list), unveiled a scalable vector extension (SVE) to the ARMv8-A architecture – also designed to accelerate HPC workloads.
The new technology comprises the ARM CoreLink CMN-600 Coherent Mesh Network interconnect and the CoreLink DMC-620 Dynamic Memory Controller, which together enable the latest ARM-based SoCs to offer much higher data throughput – not only for HPC but also for data centre and cloud applications. The company claims that the new technology will provide 5x higher throughput than its previous technology, with more than 1TB/s of sustained bandwidth.
‘The demands of cloud-based business models require service providers to pack more efficient computational capability into their infrastructure,’ said Monika Biddulph, general manager, systems and software group, ARM. ‘Our new CoreLink system IP for SoCs, based on the ARMv8-A architecture, delivers the flexibility to seamlessly integrate heterogeneous computing and acceleration to achieve the best balance of compute density and workload optimisation within fixed power and space constraints.’
In addition to this new interconnect, ARM has significantly stepped up its development of HPC technologies in recent years. This has culminated in the selection of ARM processors for the ‘Post K computer’, the first time ARM processors will be used in a flagship supercomputer.
Since the announcement of the ‘Post K computer’ at the International Supercomputing Conference, ISC’16, Fujitsu and ARM have been working to further boost ARM’s performance in HPC. At the Hot Chips conference last month, ARM and Fujitsu announced the scalable vector extension (SVE) to the ARMv8-A architecture.
Nigel Stephens, lead ISA architect and ARM fellow, followed up the announcement with some more technical information in a blog post on the ARM website. Stephens stated: ‘ARM is significantly extending the vector processing capabilities associated with AArch64 (64-bit) execution in the ARM architecture, now and into the future, enabling implementation choices for vector lengths that scale from 128 to 2,048 bits.’
‘High-Performance scientific compute provides an excellent focus for the introduction of this technology and its associated ecosystem development.’
Stephens explained that considerable research had gone into ‘determining how best to extract more data level parallelism from general-purpose programming languages such as C, C++ and Fortran.’ This has resulted in the inclusion of vectorisation features such as gather-load and scatter-store, per-lane predication, and longer vectors.
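To give a sense of what two of these features address, the following is a minimal sketch in plain C rather than SVE code: a loop that reads its input through an index array (the access pattern a gather load serves) and updates only the elements selected by a mask (the pattern per-lane predication serves). The function name `masked_gather_axpy` and its parameters are hypothetical and chosen only for illustration; whether a particular compiler vectorises such a loop depends on the compiler and the target.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only: for each i < n,
 * y[i] += a * x[idx[i]], but only where mask[i] is non-zero.
 * Written as ordinary scalar C; it uses no SVE intrinsics.
 */
void masked_gather_axpy(size_t n, double a, const double *x,
                        const size_t *idx, const unsigned char *mask,
                        double *y)
{
    assert(x && idx && mask && y);

    for (size_t i = 0; i < n; ++i) {
        if (mask[i]) {               /* per-element condition: the role of a lane predicate */
            y[i] += a * x[idx[i]];   /* indexed read: the role of a gather load */
        }
    }
}
```

On hardware without these features, the conditional update and the indirect read are the kind of constructs that can prevent a loop like this from being vectorised efficiently.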
The risk associated with any new technology in HPC normally comes down to two main factors. The first is the risk of investing in a new technology which may fail. In the case of ARM, the company is well established, although not specifically in the HPC market. This gives the company a certain amount of stability in the eyes of potential users.
The second is adoption of the technology by the wider industry. A technology may be very well suited to HPC, but if it does not see widespread adoption it will ultimately fail.
However, ARM may have a solution to this. As Stephens explained in his blog post, a move to optimise code for ARM-based HPC may be relatively straightforward. ‘Scientific workloads, mentioned earlier, have traditionally been carefully written to exploit as much data-level parallelism as possible with careful use of OpenMP pragmas and other source code annotations.’ Stephens explained that this makes it relatively easy for a compiler to vectorise such code and make use of a wider vector unit.
‘Supercomputers are also built with the wide, high-bandwidth memory systems necessary to feed a longer vector unit,’ said Stephens.
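As a rough illustration of the kind of annotated source code Stephens refers to, the sketch below marks a simple loop with the standard OpenMP `simd` directive. The function name `daxpy_simd` is hypothetical, and the example is not taken from ARM's material. The pragma only takes effect when OpenMP support is enabled in the compiler (for example with `-fopenmp` or `-fopenmp-simd` on GCC and Clang); otherwise it is ignored and the loop still runs correctly as scalar code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical kernel, for illustration only: y = a * x + y.
 * 'restrict' promises the compiler that x and y do not overlap,
 * and the OpenMP pragma asks it to vectorise the loop.
 */
void daxpy_simd(size_t n, double a,
                const double *restrict x, double *restrict y)
{
    assert(x && y);

    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Nothing in the source fixes a vector width, so the same loop can in principle be compiled for vector units of different lengths without being rewritten.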
The decision to replace the SPARC processor used in the current K computer with ARM processors has generated a lot of interest in the HPC community. If ARM can continue to adapt its architecture to provide specialised features for HPC users, then we may start to see a transition towards this architecture.