Gemma Church explores applications at the forefront of HPC research
The term ‘best’ could be used to describe the biggest HPC application, the boldest, or the most innovative. This article will highlight those HPC applications where real progress has been made in the past 12 months and where challenges in this sector have been addressed.
These challenges are not necessarily application-specific, according to Pak Lui, co-chair of the HPC|Works Special Interest Group at the HPC Advisory Council and Principal Architect at Huawei Technologies, who said: ‘In my opinion, the HPC community faces generic challenges that are shared by different applications. The performance of typical groups of HPC applications is bounded by the compute, network communication, and storage I/O infrastructure or hardware. The performance of HPC applications depends on hardware development cycles. But the HPC community is always coming up with tools and libraries to make such systems accessible for all applications to use.’
In other words, cross-application collaboration is vital for the HPC sector to progress. Such collaboration is prevalent at the Gauss Centre for Supercomputing (GCS), which combines three major German supercomputing centres: the High Performance Computing Center Stuttgart (HLRS), the Jülich Supercomputing Centre and the Leibniz Supercomputing Centre (LRZ).
All three centres work with researchers across the science and engineering spectrum; however, each has its own specialisations. Jülich is renowned for fundamental research, physics, neuroscience and the environmental sciences; LRZ strongly supports projects in geoscience, life sciences and astrophysics; and HLRS has a very strong focus on scientific engineering and industrial applications.
Eric Gedenk, a science writer for the GCS, said: ‘Currently, we have about 300 research projects that use over 100 different applications. Many of them use community codes, but many of our research projects have in-house codes to suit their particular research needs.’
One of LRZ’s highlights this year is the earthquake and tsunami research by Dr Michael Bader and his team, a finalist for the Best Paper Award at SC17. The paper presents a high-resolution simulation of the 2004 Sumatra-Andaman earthquake, including non-linear frictional failure on a megathrust-splay fault system.
Last autumn, HLRS also helped Ansys scale its Fluent CFD tool to more than 172,000 compute cores on Hazel Hen, the HLRS Cray XC40 supercomputer, making it one of the fastest industrial applications ever run.
Dr Wim Slagter, director of HPC and Cloud Alliances at Ansys, explained: ‘To overcome large-scale simulation challenges, we established a multi-year program with HLRS and worked on improving the performance and scalability of our CFD solvers. Apart from improving the transient LES (Large Eddy Simulation) solver, we focused on a variety of aspects, including the enhancement of our advanced multiphysics load balancing method, the optimisation of file read/write operations and the improvement of fault tolerance for large-scale runs.
‘We started by further optimising the linear CFD solver, predominantly its AMG (Algebraic MultiGrid) portion. We optimised the partitioning and load-balancing capabilities to achieve good balance at very high core counts. To enhance simulation throughput, we developed better reordering algorithms for improved memory usage, and we improved the convergence speed of transient combustion simulations. We also improved the parallel I/O capability and developed better data compression strategies. Because of these very high core counts – up to 172,000 CPU cores – parallel solver robustness was obviously crucial here; we wanted to have a robust solver that can be “fired and forgotten”,’ Slagter added.
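Slagter does not say which reordering algorithms Ansys developed. A classic textbook example of reordering for better memory usage is reverse Cuthill-McKee (RCM), which renumbers mesh cells so that neighbouring cells get nearby indices, shrinking the bandwidth of the sparse solver matrix and improving cache locality. The sketch below is a generic RCM in Python, offered only as an illustration of the idea; it is not Fluent's method, and the function names are hypothetical.

```python
from collections import deque

def bandwidth(adj, order):
    """Largest index distance between connected nodes under a given ordering."""
    pos = {node: i for i, node in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)

def reverse_cuthill_mckee(adj):
    """Return a node ordering that tends to reduce sparse-matrix bandwidth.

    adj maps each node to a list of its neighbours (the graph must be symmetric).
    """
    visited = set()
    order = []
    # Start each connected component from a low-degree node.
    for start in sorted(adj, key=lambda n: len(adj[n])):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        # Breadth-first search, visiting neighbours in order of increasing degree.
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in sorted(adj[node], key=lambda n: len(adj[n])):
                if nbr not in visited:
                    visited.add(nbr)
                    queue.append(nbr)
    order.reverse()  # the "reverse" step of RCM
    return order

# A chain of five cells whose original numbering scatters neighbours apart.
adj = {0: [3], 1: [3, 4], 2: [4], 3: [0, 1], 4: [1, 2]}
new_order = reverse_cuthill_mckee(adj)
```

With the original numbering the widest gap between neighbours is 3; after reordering every neighbour pair sits next to each other, so the bandwidth drops to 1.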
Another major research finding from Jülich in the past year, which came from the Center for Theoretical Chemistry at Ruhr University Bochum, uncovered previously unknown complexities in how the bonds between sulphur atoms break. These sulphur bridges link long molecules together in proteins and rubber. For example, if you stretch a rubber band again and again, the sulphur bridges break and the rubber becomes brittle.
This rubber band example is familiar to most people, but a correct interpretation of the experimental data had been lacking. The research found that the splitting of the bond between two sulphur atoms in a solvent is more complicated than first assumed.
‘Depending on how hard one pulls, the sulphur bridge splits with different reaction mechanisms,’ Dr Dominik Marx, professor and director of the Center for Theoretical Chemistry at the Ruhr University Bochum, explained. In essence, the simulations revealed that more force cannot be translated one to one into a faster reaction. Up to a certain force, the reaction rate increases in proportion to the force. If this threshold is exceeded, greater mechanical forces speed up the reaction to a much lesser extent.
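The behaviour Marx describes can be summarised, very loosely, as a piecewise-linear rate law. This is a toy illustration only: the baseline rate $k_0$, the slopes $\alpha$ and $\beta$ and the threshold force $F_c$ are placeholder symbols, not values or a model reported by the Bochum team.

```latex
k(F) \approx
\begin{cases}
k_0 + \alpha F, & F \le F_c \\[4pt]
k_0 + \alpha F_c + \beta\,(F - F_c), & F > F_c
\end{cases}
\qquad \text{with } 0 < \beta \ll \alpha
```

Below $F_c$ each extra unit of force buys the same increase in rate; above it, the much smaller slope $\beta$ captures the finding that more force no longer translates one to one into a faster reaction.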
Previous simulation and modelling methods drastically simplified the effects of the surrounding solvent in order to reduce the processing power required. Work done at the Cluster of Excellence RESOLV in Bochum had already uncovered the key role the solvent plays in chemical reactions.
[Image: The High Performance Computing Center in Stuttgart, part of the Gauss Centre for Supercomputing]
But correctly incorporating the role of the surrounding solvent requires immense computing effort. This computational power was made available to Marx and his international team through a special ‘Large Scale Project’ granted by the GCS on Juqueen, the Blue Gene/Q system at Jülich. Without it, the detailed simulations needed to interpret the experimental data on sulphur bonds would not have been possible.
There has also been some fascinating extreme-scaling work on the Juqueen supercomputer (based at the Jülich Supercomputing Centre) in the last 12 months through the High-Q Club. Dr Dirk Brömmel, senior scientist from the High-Q Club, explained: ‘The club was set up to showcase codes that scale to the 450,000+ cores or (ideally) 1.8 million threads. This helps to identify where possible bottlenecks are in future systems, and if a solution is found on Juqueen, many of the codes have also found that their scalability has been transferable to other machines.’
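The two figures Brömmel quotes are linked by the Blue Gene/Q design. Assuming Juqueen's final configuration of 28 racks, each with 1,024 compute nodes of 16 application cores, and four hardware threads per core:

```latex
28 \times 1024 \times 16 = 458{,}752 \ \text{cores},
\qquad
458{,}752 \times 4 = 1{,}835{,}008 \ \text{hardware threads}
```

A code that runs one thread per hardware thread therefore has to expose roughly four times as much parallelism as one that uses one process per core, which is why full-thread runs are the ‘ideal’ target.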
For example, the Model for Prediction Across Scales (MPAS) is a collaborative project that develops atmosphere, ocean and other earth-system simulation components for use in climate, regional climate and weather studies. Its atmosphere component, MPAS-Atmosphere (MPAS-A), is a member of the High-Q Club. MPAS-A is used primarily for global numerical weather prediction, with a special interest in tropical cyclone prediction and convection-permitting hazardous weather forecasting, and for regional climate modelling.
The extreme-scale performance of MPAS-A has improved greatly in the last 12 months, as Dominikus Heinzeller, a senior scientist at the Karlsruhe Institute of Technology, explained: ‘Our alternative I/O layer is based on the SIONlib library, developed by Dr Wolfgang Frings and colleagues at Research Centre Jülich, and integrated in the MPAS model framework in a completely transparent way: users can choose at runtime whether to write data in SIONlib format or in the netCDF format that has been traditionally used in MPAS.’
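Heinzeller does not describe how the runtime choice is wired into MPAS. One common way to make a file-format choice transparent to model code is to hide each writer behind a shared interface and pick one from a runtime option. The sketch below assumes that design; every name in it (`OutputBackend`, `io_type`, `write_history` and the two backend classes) is hypothetical, and neither class calls the real SIONlib or netCDF libraries.

```python
from abc import ABC, abstractmethod

class OutputBackend(ABC):
    """Hypothetical interface: one implementation per on-disk format."""

    @abstractmethod
    def write(self, field_name: str, values: list) -> bytes: ...

class NetCDFLikeBackend(OutputBackend):
    # Stand-in for a netCDF writer; real code would call a netCDF library here.
    def write(self, field_name, values):
        return f"netcdf:{field_name}:{len(values)}".encode()

class SIONLikeBackend(OutputBackend):
    # Stand-in for a SIONlib writer; real code would call SIONlib here.
    def write(self, field_name, values):
        return f"sion:{field_name}:{len(values)}".encode()

BACKENDS = {"netcdf": NetCDFLikeBackend, "sionlib": SIONLikeBackend}

def select_backend(io_type: str) -> OutputBackend:
    """Choose the writer from a runtime option; model code stays unchanged."""
    try:
        return BACKENDS[io_type.lower()]()
    except KeyError:
        raise ValueError(f"unknown io_type {io_type!r}; choose from {sorted(BACKENDS)}") from None

def write_history(io_type: str, fields: dict) -> list:
    # The model only talks to the OutputBackend interface, never to a format directly.
    backend = select_backend(io_type)
    return [backend.write(name, vals) for name, vals in fields.items()]
```

Because the model calls only `write_history`, adding or swapping a format means registering one more backend rather than touching the science code.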
‘In numbers, we diagnosed a speedup of a factor of 10 to 60 when reading data in the SIONlib format, and a factor of four to 10 when writing data. Combined with a simplified bootstrapping phase, the model initialisation time, as one example, could be reduced by 90 per cent in large-scale applications on the FZJ Juqueen and LRZ SuperMUC supercomputers. This strategically important development work was supported by Michael Duda, a software engineer at NCAR (National Center for Atmospheric Research), and funded by the Bavarian Competence Network for Technical and Scientific High Performance Computing’, Heinzeller added.
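Speedup factors and percentage reductions are easy to conflate. The small helpers below (hypothetical names) convert between the two: a 90 per cent cut in initialisation time corresponds to a tenfold speedup, and the factor of four to 10 quoted for writes removes 75 to 90 per cent of the write time.

```python
def time_after(time_before: float, speedup: float) -> float:
    """Runtime after a speedup, where speedup = old time / new time."""
    return time_before / speedup

def percent_reduction(speedup: float) -> float:
    """Percentage of the original runtime removed by a given speedup factor."""
    return 100.0 * (1.0 - 1.0 / speedup)

def speedup_from_reduction(percent: float) -> float:
    """Inverse of percent_reduction: the speedup implied by a percentage cut."""
    return 1.0 / (1.0 - percent / 100.0)
```

For instance, a hypothetical 60-minute read would shrink to between 6 minutes (`time_after(60, 10)`) and 1 minute (`time_after(60, 60)`) at the quoted read speedups.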
The Hartree Centre
The Hartree Centre in Daresbury is home to some of the most technically advanced high performance computing, data analytics, machine learning technologies and experts in the UK. Alison Kennedy, director at the Hartree Centre, said: ‘Our goal is to make the transition from pursuits in research that are interesting, to meaningful solutions for UK industry.’
A recent success is its cognitive hospital project with Alder Hey Children’s Hospital, which was awarded ‘Most Innovative Collaboration’ at the North West Coast Research and Innovation Awards 2017. Hospitals produce a huge amount of data, yet it is very difficult for clinicians and patients to use that data to improve the hospital experience. Using IBM Watson, the team designed an app that lets children ask questions about the procedure they are about to undergo at the hospital. Answers are then given through a friendly avatar.
The IBM Watson cognitive computing system can process the huge amounts of data it receives quickly, extracting the most relevant and important parts. It can then transform this mountain of information into useful and personal insights that can be used to improve services or treatments at Alder Hey.
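Neither the Hartree Centre nor IBM details how Watson matches a child's question to an answer, so the sketch below uses a far simpler, generic technique instead: bag-of-words cosine similarity between the question and a list of prepared questions. It only illustrates the idea of retrieving the most relevant stored answer; the FAQ entries are hypothetical and this is not how Watson works internally.

```python
import math
import re
from collections import Counter

def tokens(text: str) -> Counter:
    """Lower-case word counts: a crude stand-in for real language processing."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_answer(question: str, faq: dict) -> str:
    """Return the answer whose stored question is most similar to the query."""
    q = tokens(question)
    best = max(faq, key=lambda stored: cosine(q, tokens(stored)))
    return faq[best]

# Hypothetical prepared questions and answers.
faq = {
    "Will the anaesthetic hurt?": "You might feel a small scratch, then you will fall asleep.",
    "Can my parents stay with me?": "Yes, a parent can stay with you until you fall asleep.",
    "How long will the operation take?": "Your nurse will tell you how long your operation should take.",
}
reply = best_answer("Can my mum and dad stay with me?", faq)
```

Here the query shares five words with the stored question about parents and none with the others, so that answer is returned.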
[Image: The Hartree Centre is home to some of the most advanced high performance computing in the UK]
This information also helps the patient to understand and prepare for a procedure at home. ‘This could help to reduce the number of no shows and will help hospital staff address any questions the patient has before they get to the hospital, which frees their time too,’ Kennedy added.
Such large-scale data analysis has wider implications across the HPC sector, as Daniel Reed, vice president for research and economic development at the University of Iowa and fellow of the Association for Computing Machinery, said: ‘Traditional HPC applications usually start with a question and want an answer. Now, we are starting with a set of equations and we want to compute the implications. Such large scale data analysis has turned that concept on its head.’
The work at Alder Hey is the tip of the iceberg: the advancement of scalable artificial intelligence (AI) and machine learning (ML) applications has really stood out for HPC in the past 12 months.
While the concepts of AI are not new, HPC has enabled progress in this area because three conditions are now in place. The increased volumes of data we now collect allow us to train computers to make decisions based on past examples; faster network speeds enable us to move greater amounts of data around; and better compute elements allow that data to be processed in parallel.
Gilad Shainer, chairman of the HPC Advisory Council, said: ‘As these three conditions are now met, we can actually leverage AI. AI will impact nearly all aspects of our lives – from making better financial decisions, improving our security, developing self-driving vehicles, forecasting health issues and many other areas.’
Lui said: ‘HPC has helped AI/ML by reducing the runtime of workloads that may take months to complete on a single workstation. The people involved in AI/ML development learned to use HPC to deploy a scale-out approach, which makes use of hardware accelerators such as Nvidia Volta GPUs and EDR InfiniBand for a high-speed, low-latency network in an HPC cluster environment, to reduce the runtime of training a deep neural network.
‘The applications that use AI/ML that have really exploded in the last year include algorithms for detecting objects and lanes in self-driving cars, object classification in image recognition, fraud detection in financial transactions and speech recognition in videos,’ Lui added.
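One common form of the scale-out approach Lui describes is synchronous data parallelism: each GPU computes gradients on its own slice of the training batch, an allreduce over the network averages those gradients, and every replica applies the same update. The minimal sketch below shows this on a one-parameter linear model in plain Python; the allreduce is simulated in-process, the function names are hypothetical, and it does not represent any particular framework or library.

```python
def shard_gradient(w, xs, ys):
    """Mean-squared-error gradient for the model y = w * x on one worker's shard."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def allreduce_mean(values):
    """Simulated allreduce: every worker ends up with the average of all inputs."""
    avg = sum(values) / len(values)
    return [avg] * len(values)

def data_parallel_step(w, xs, ys, n_workers, lr):
    """One synchronous data-parallel SGD step with equal-sized shards."""
    assert len(xs) % n_workers == 0, "sketch assumes equal-sized shards"
    size = len(xs) // n_workers
    grads = [
        shard_gradient(w, xs[i * size:(i + 1) * size], ys[i * size:(i + 1) * size])
        for i in range(n_workers)
    ]
    averaged = allreduce_mean(grads)
    # All replicas apply the identical averaged gradient, so weights stay in sync.
    return w - lr * averaged[0]

xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]  # data generated by w = 2
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, xs, ys, n_workers=2, lr=0.01)
```

With equal shards, the averaged gradient equals the full-batch gradient, so adding workers splits the compute without changing the maths; the cost is one allreduce per step, which is why interconnect bandwidth and latency matter so much at scale.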
The accelerated use of HPC for AI has been, in part, facilitated by better hardware resources, as Scot Schultz, senior director of HPC/Artificial Intelligence and Technical Computing at Mellanox Technologies, explained: ‘High performance computing and artificial intelligence share similar hardware requirements and important to both is the ability to move data, exchange messages and computed results from thousands of parallel processes fast enough to keep the compute resources running at peak efficiency.’
For example, IBM Research recently announced what it called unprecedented performance and close to ideal scaling with its new distributed deep learning software, which achieved record-low communication overhead and 95 per cent scaling efficiency on the Caffe deep learning framework, running over Mellanox InfiniBand across 256 Nvidia GPUs in 64 IBM Power systems. Schultz said: ‘With the IBM DDL (Distributed Deep Learning) library, it took just seven hours to train ImageNet-22K using ResNet-101. From 16 days down to just seven hours not only changes the workflow of data scientists, this changes the game entirely.’
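The two figures in this paragraph measure different things. The article does not say what hardware produced the 16-day baseline, so it cannot be turned into an efficiency directly; it is simply a speedup of 384 / 7, or about 55 times. Scaling efficiency, by contrast, compares the speedup with the increase in hardware. The hypothetical helpers below make both definitions explicit.

```python
def speedup(t_before: float, t_after: float) -> float:
    """How many times faster the new run is (both times in the same units)."""
    return t_before / t_after

def scaling_efficiency(t_base: float, n_base: int, t_n: float, n: int) -> float:
    """Strong-scaling efficiency relative to a baseline run on n_base devices.

    1.0 means perfectly linear scaling: n / n_base times the hardware
    delivers n / n_base times the speed.
    """
    return speedup(t_base, t_n) / (n / n_base)

days_to_hours = speedup(16 * 24, 7)  # 16 days versus seven hours
```

At 95 per cent efficiency, a job that takes 64 hours on one device would finish in about 64 / (0.95 × 64), or roughly 1.05 hours, on 64 devices rather than the ideal one hour.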
In May 2017, Nvidia also introduced Volta, which it describes as the world’s most powerful GPU computing architecture, created to drive the next wave of advancement in artificial intelligence. The first Nvidia DGX systems with Volta GPUs were recently shipped to the Center for Clinical Data Science (CCDS). Paresh Kharya, group product marketing manager at Nvidia, said: ‘More specifically, with this technology, CCDS data scientists can develop a host of new training algorithms to help them see medical abnormalities and patterns within medical images.’
According to Nvidia, the Tesla V100 GPU breaks through the 100 Tflops barrier for deep learning performance. Kharya added: ‘Demand for accelerating AI has never been greater across every industry, including healthcare, pharma, financial services, auto, retail, and telecommunications. Developers, data scientists, and researchers increasingly rely on neural networks to power their next advances in fighting cancer, making transportation safer with self-driving vehicles, providing new intelligent customer experiences, and more.’
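The 100 Tflops figure is a theoretical peak for the GPU's Tensor Cores, not a sustained rate. Using Nvidia's published specification for the SXM2 version of the V100 (640 Tensor Cores, a 1,530 MHz boost clock, and 64 mixed-precision fused multiply-adds per Tensor Core per clock, each counted as two floating-point operations), the peak works out as:

```latex
640 \times 64 \times 2 \times 1.53\times10^{9}\ \mathrm{s^{-1}}
\approx 1.25\times10^{14}\ \mathrm{flop/s} = 125\ \text{Tflops}
```

Real training workloads reach only a fraction of this peak, depending on how much of the time is spent in large matrix multiplications.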
AI-based applications have certainly dominated in the last 12 months, and this trend shows no signs of stopping. We can expect AI to become increasingly integrated in the HPC landscape, as Shainer explained: ‘We will see continuous development in this area, from hardware elements to software elements and it will just keep progressing. Several years from now, we will probably be talking less about AI as it becomes mainstream and tightly integrated into more solutions.’