The engineering industry could get a boost to its simulation performance as the Centre for Modelling and Engineering (CFMS) has released the results of its independent product evaluation of IBM’s Power System S824L, which was assessed for readiness and utilisation in technical computing and HPC workloads.
The results from benchmarking and testing the S824L against a defined experimental plan have shown promise, and CFMS has released a report detailing its findings, which is available from its website.
‘The industry appetite for computing power is ever upwards, but simply building larger and larger clusters is already becoming unsustainable, due to rising energy prices. We need to take a step back and re-evaluate our approach if we want to approach a realistic solution for the future,’ said Nathan Harper, Head of IT Systems at CFMS.
The report begins by highlighting current expectations of architecture, but also the challenges associated with this approach, which have led CFMS to explore new paradigms in technical computing and HPC.
The report states: ‘There is an expectation that an affordable route to Exascale computing will involve some disruption of the existing CPU+RAM+Infiniband architecture. Although accelerators like NVIDIA Tesla and Intel Xeon Phi have good positions in the Top500 List, actual industrial uptake is limited.’
The report continues that this path of technology iteration may not be enough to raise performance to the required levels within acceptable cost and power budgets, so engineering, and HPC in general, requires a ‘unified re-architecturing of the software and hardware interaction.’
Harper gave the context to CFMS’ decision to benchmark IBM Power technology. He said: ‘It will be interesting to see the impact of IBM Power systems, combined with technology from NVIDIA and Mellanox, as a catalyst for disruption, and as a basis for the development of new technologies in the future.’
The product evaluation covered a number of industry-standard tests, such as system performance for technical computing and engineering workloads, quality of the toolchain (compilers, environment, etc.), integrated energy efficiency, and scale.
The report found that the Power system was easily reconfigurable and simple to use, which are key parameters when applying varied workloads or in cases where engineers may not be HPC experts themselves.
The IBM processor can be reconfigured while the system is running, which allows multi-threading and other parameters to be changed between jobs without system downtime. This allows the SMT configuration to be optimised for each simulation run, or even for each simulation job step. The report also found that POWER8 provides greater Non-Uniform Memory Access (NUMA) control, allowing the process layout to be optimised for the hardware topology.
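On a Linux-on-POWER system, this kind of runtime reconfiguration is typically driven from the command line. The following is a minimal sketch, assuming the standard `ppc64_cpu` and `numactl` utilities are installed; the NUMA node number and the binary name `solver` are illustrative:

```shell
# Query the current SMT mode (POWER8 supports SMT off, 2, 4 and 8)
ppc64_cpu --smt

# Switch the machine to SMT4 between jobs -- no reboot required
sudo ppc64_cpu --smt=4

# Exploit the finer NUMA control: pin a run to the cores and memory
# of NUMA node 0 so allocations stay local to that socket
numactl --cpunodebind=0 --membind=0 ./solver input.dat
```

Because the SMT mode is a whole-machine setting changed on the fly, a job scheduler prologue can apply the optimal mode per job step, as the report suggests.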
The report states: ‘In addition tools like xCAT (eXtreme Cloud Administration Toolkit) and IBM Spectrum Scale (formerly IBM GPFS) can be used to deploy and run mixed x86 and Power systems (equipped with NVIDIA GPGPUs) to accelerate specific workloads, while providing a consistent experience for end users.’
In addition, the researchers found that porting existing CUDA software to run on the S824L’s K40 GPUs was trivial, primarily due to the common interfaces provided by the CUDA toolkits on x86 and POWER8.
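That portability follows from the CUDA toolkit exposing the same compiler driver on both architectures. As a hedged sketch (the source file name is illustrative; `sm_35` is the K40’s compute capability), the same build line serves on an x86 host and a POWER8 host:

```shell
# Identical on x86_64 and ppc64le hosts: nvcc recompiles the
# host-side code for the local CPU; the device code is unchanged
nvcc -O3 -arch=sm_35 solver.cu -o solver
```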
Benchmark runs were undertaken with one MPI process per NUMA node (two MPI processes per POWER8 processor socket), with processor and memory affinity set; the number of OpenMP threads was varied according to the SMT setting. The NVIDIA K40 benchmark was run with one MPI process per GPU. The results were compared to a dual-socket Intel Xeon E5-2648L v2 @ 1.9GHz system from IBM with hyper-threading switched off, and the software compiled using Intel 15.0 compilers and OpenMPI 6.5.
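A layout of this kind can be expressed through Open MPI’s mapping and binding options. This is a sketch under the assumption of Open MPI’s `--map-by`/`--bind-to` syntax; the binary name and thread count are illustrative, the latter depending on the SMT setting and cores per node:

```shell
# One MPI rank per NUMA node (two per POWER8 socket), each rank
# bound to its node so memory allocations stay local
export OMP_NUM_THREADS=8   # illustrative; varied with the SMT mode
mpirun --map-by ppr:1:numa --bind-to numa ./solver input.dat
```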
The report stated that the dual-socket POWER8 system is 1.2x faster than the Intel Ivy Bridge system when using the gcc 4.9 compiler. The extrapolated results from the limited runs using the IBM XL C/C++ for Linux compiler show a potential speed-up of over 2x, but this needs to be validated when the compilers are released by IBM.
The system meets performance levels for technical computing and is appealing in terms of its flexibility and simplicity. The ease with which the unit was managed demonstrated the commonality of its tools and skills with those already used within technical computing.
However, the report does concede that ‘system performance assessment focused on compute rather than I/O, as typical HPC workloads would exercise processors with storage being implemented as a shared parallel file system.’ Considering that any new architecture would likely involve a much more data-centric approach, this may be an area that needs further investigation, as some applications will be limited by data throughput.