The drive for exascale, and the power envelopes set for these systems, require HPC hardware and software developers to think outside the box and develop more energy-efficient technologies.
LEGaTO (Low Energy Toolset for Heterogeneous Computing) is one such project. It has the lofty aim of developing a programming framework for heterogeneous systems of CPU, GPU and FPGA resources, whose own runtime system can offload specific tasks to different acceleration technologies.
The project’s researchers aim to build a reliable, secure and energy-efficient programming framework for HPC that lets users write a single version of their code for multiple processing technologies.
The system also aims to deliver further energy savings by reducing processor voltage while maintaining application stability.
Osman Unsal, group manager for the department of computer architecture and parallel paradigms at the Barcelona Supercomputing Centre (BSC), explains that the rise in AI and ML applications provides an opportunity to make use of FPGAs, which can run some tasks much more efficiently than other processing technologies.
‘For HPC we have scientific applications to run and we need good floating-point performance. That has generally been the Achilles heel of FPGAs. They would do very well with fixed-point precision, integer or DSP-type applications, which are in a different category from HPC applications.
‘This class of applications could potentially make much better use of FPGAs from an energy point of view. For example, in the inference phase of neural network applications, where you need to recognise objects, these kinds of applications do not need full floating-point precision,’ said Unsal.
While GPUs can handle mixed-precision workloads, Unsal argues that FPGAs can go further from an energy-efficiency standpoint.
‘FPGAs take that further, in the sense that you can go down to the one-bit level. There have been neural networks called “binary neural networks” that work with just one bit. They are quite efficient in their application domain,’ said Unsal.
‘FPGAs have this flexibility where, for neural network applications, you can set the optimal bit width that you need. In this case, the optimum is the most energy-efficient width,’ added Unsal.
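The one-bit arithmetic Unsal describes can be illustrated with the XNOR-and-popcount trick commonly used in binary neural networks. When weights and activations are restricted to +1 and -1, a dot product reduces to counting the positions where the signs agree. The Python sketch below shows only that arithmetic and is not LEGaTO code. The bitmask encoding (bit set for +1) and the function names are illustrative assumptions. On an FPGA, the same XNOR and population count map onto simple logic rather than floating-point multipliers.

```python
def pack_signs(values):
    """Encode a vector of +1/-1 values as an integer bitmask.

    Illustrative encoding (an assumption): bit i is 1 when values[i] == +1.
    """
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
        elif v != -1:
            raise ValueError("binary networks use only +1 and -1")
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {+1, -1} vectors stored as bitmasks.

    XNOR marks the positions where the signs agree. Each agreement adds +1
    and each disagreement adds -1, so dot = matches - (n - matches).
    """
    mask = (1 << n) - 1
    matches = bin(~(a_bits ^ b_bits) & mask).count("1")
    return 2 * matches - n
```

For example, `binary_dot(pack_signs([1, -1, 1, 1]), pack_signs([1, 1, -1, 1]), 4)` gives the same result as the ordinary sum of products, 1 - 1 - 1 + 1 = 0, using only bitwise operations.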
However, it should be noted that the focus here is on the energy efficiency of the application. A rack of GPUs might still run the job faster; the point is that FPGA resources can run it more efficiently. As reducing power consumption is a key objective in meeting the energy requirements of exascale systems, this project could help to solve some of the challenges facing HPC users who want to run AI applications.
The LEGaTO toolchain backend, also called the runtime system, consists of the technologies deployed at runtime to support the programmer’s tasks and to manage the resources of the hardware platform intelligently.
To help meet these goals, the LEGaTO backend is built on a task-based execution model. Tasking, as opposed to a threading model, allows the runtime system to exploit greater parallelism and to perform the advanced scheduling needed to manage heterogeneous platforms effectively.
But the energy-efficiency gains targeted by the LEGaTO project extend beyond the use of FPGA technology. The project is also researching how to reduce the voltage supplied to processors while maintaining the stability of the application.
Advances in features of modern CPUs, such as error reporting and hardware counters, allow the team to reduce voltages by improving communication between the processor, the operating system (OS) and the application.
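The advantage of tasking can be sketched in a few lines. Each task declares which other tasks it waits for, so the runtime sees the whole dependency graph and knows which tasks are ready at any moment. The Python below is a minimal, hypothetical illustration of that idea. The function name and data layout are assumptions, not LEGaTO’s API. It runs ready tasks one at a time, whereas a real runtime would hand tasks from the ready queue to idle CPUs, GPUs or FPGAs in parallel.

```python
from collections import deque


def run_task_graph(tasks, deps):
    """Execute tasks in dependency order (hypothetical helper, not LEGaTO's API).

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of task names it must wait for
    Returns the order in which the tasks ran.
    """
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name, waits_for in remaining.items():
        for d in waits_for:
            dependents[d].append(name)

    # Tasks with no outstanding dependencies can start immediately.
    ready = deque(sorted(n for n, w in remaining.items() if not w))
    order = []
    while ready:
        name = ready.popleft()
        tasks[name]()  # a heterogeneous runtime would pick a device here
        order.append(name)
        for child in dependents[name]:
            remaining[child].discard(name)
            if not remaining[child]:
                ready.append(child)  # all inputs done: child becomes ready

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order
```

In a graph where two tasks both depend on a single loading step, both become ready as soon as loading finishes. That is exactly the parallelism a threading model would leave the programmer to discover by hand.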
Unsal said that previous work done at the BSC looked at reducing voltages while maintaining stability of the HPC applications. ‘One need that we saw during that project was that the hardware and the software need to work together. That is to say the hardware and the software need to catch up with each other,’ said Unsal.
‘That was the crux of the idea for LEGaTO, so we got together with a couple of other partners that were in a different project called M2DC. We proposed that you have all these wonderful features in hardware to help save energy, and you have these frameworks in software to help save energy, but they do not talk to each other.’
He explained that in the past only errors that were detected but not corrected were reported. ‘We detected an error that is not correctable and we sent this signal “somewhere” because the software stack was not equipped to deal with the signal,’ said Unsal.
‘There is an error that was detected but could not be corrected, so what do you do with that on the software side? What we are doing in LEGaTO is propagating this error to the proper place – in this case the application,’ he said.
Previously, these messages were passed to the OS, where the error stopped. ‘In our case, we wanted to continue past the operating system to the application, because the application knows if the error is serious, or if it could somehow be corrected or accounted for on the application side. Also, the application can simply disregard the error if it is not important for the application,’ said Unsal.
Making the application aware of these errors, as well as of errors that are detected and corrected, allows the researchers to manipulate voltages more carefully, without pushing too far and affecting the stability of the application.
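What ‘the application knows’ can be made concrete with a hypothetical policy. The application classifies its memory regions in advance. When an uncorrectable error is forwarded to it, the application looks up the affected address to decide whether to ignore the error, recompute the data or abort. The sketch below assumes the OS delivers the faulting address. The region table, the names and the default action are illustrative, not LEGaTO’s interface.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical responses an application might choose."""
    IGNORE = "ignore"        # the affected data does not matter to the result
    RECOMPUTE = "recompute"  # the application can regenerate the data
    ABORT = "abort"          # the error hits state that cannot be rebuilt


def handle_uncorrectable_error(address, regions):
    """Decide how to respond to an error at `address` (illustrative only).

    regions: list of (start, end, action) tuples the application registered
    in advance; `end` is exclusive. Unregistered addresses default to ABORT,
    the conservative choice.
    """
    for start, end, action in regions:
        if start <= address < end:
            return action
    return Action.ABORT
```

Defaulting to `ABORT` keeps the behaviour safe for memory the application never classified. Only the application can mark data as safe to ignore or cheap to rebuild.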
‘Hardware manufacturers made the change so that these correctable errors are also reported. They are important because to save power, one option is to go below the safe operating voltage limits.
‘We are now able to use this information like a canary in a coal mine. We use it to tell the software that it doesn’t need to lower the voltage any further: there has been enough of an energy saving without compromising reliability,’ he added.
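The canary approach can be sketched as a simple search. Step the voltage down from nominal and stop as soon as corrected errors start to appear, keeping the last setting that was clean. This is a minimal illustration under stated assumptions, not LEGaTO’s controller. The millivolt values and the `corrected_errors_at` callback, which stands in for the hardware’s corrected-error reporting, are hypothetical.

```python
def find_undervolt_setpoint(nominal_mv, floor_mv, step_mv, corrected_errors_at):
    """Lower the supply voltage step by step, using corrected errors as a canary.

    corrected_errors_at: hypothetical callback taking a voltage in millivolts
    and returning how many corrected errors were observed while running at
    that voltage. A real system would obtain this from hardware error reporting.
    floor_mv: a hard lower bound that is never crossed.
    Returns the lowest tested voltage that produced no corrected errors.
    """
    safe_mv = nominal_mv
    voltage = nominal_mv - step_mv
    while voltage >= floor_mv:
        if corrected_errors_at(voltage) > 0:
            break  # the canary has spoken: keep the last clean setting
        safe_mv = voltage
        voltage -= step_mv
    return safe_mv
```

Because corrected errors appear before uncorrectable ones, stopping at the first corrected error leaves a margin before the application is put at risk. A production controller would likely also add a guard band and re-check periodically, since conditions such as temperature change over time.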
As HPC users face the challenges of exascale computing, one of the biggest stumbling blocks is fitting these colossal supercomputers within a power budget that is realistic and sustainable. So far, targets have been set at around 20MW, but even this is seen as ambitious with today’s technology.
The LEGaTO project aims to deliver an order-of-magnitude increase in energy efficiency by combining these hardware and software developments with a programming framework that enables CPU, GPU and FPGA resources to be used together.
Commenting on the 20MW target, Unsal notes that ‘currently we are nowhere close to this target’, and to meet this goal software and hardware manufacturers must work together, as it will require all available optimisations ‘that we can manage to throw at this problem’.
Lowering the voltage without reducing clock speed is one of LEGaTO’s avenues of research. The researchers hope to go beyond what is possible with established methods such as dynamic voltage and frequency scaling (DVFS).
‘DVFS worked quite well for some time, but since we are now operating at voltages close to physical limits, the gains possible from this more conservative approach are reaching their limits.
‘Another thing we want to do is select the most energy-efficient hardware match for the application. Some applications are best run on a CPU, others would run better on a GPU and others still on FPGAs.
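Why lowering voltage at a fixed clock speed is attractive follows from the standard first-order model of dynamic power in CMOS logic. Here α is the switching activity factor, C the switched capacitance, V the supply voltage and f the clock frequency. Power scales with the square of the voltage, so a 10% voltage reduction at the same frequency cuts dynamic power by roughly 19%, and energy per clock cycle falls by the same fraction. This is a textbook approximation that ignores static (leakage) power.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} f,
\qquad
E_{\text{cycle}} = \frac{P_{\text{dyn}}}{f} \approx \alpha \, C \, V^{2},
\qquad
\frac{P_{\text{dyn}}(0.9\,V,\ f)}{P_{\text{dyn}}(V,\ f)} = 0.9^{2} = 0.81
```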
‘It is important to complete a complex optimisation process, where you have those applications and you want to steer them to the most energy-efficient hardware that you have at hand,’ added Unsal.
A place for FPGAs
It has been generally accepted that at least the first generation of exascale systems will make use of GPU technology.
Evidence for this can be found in the pre-exascale systems developed for the US Department of Energy (DOE): of the top 10 systems in the Top500, six currently use GPUs, and the top two positions are held by DOE systems that both use GPU technology.
However, Unsal argues that the rise of AI in HPC means FPGA technology can be used effectively, particularly for the fixed-point, integer and DSP-type applications mentioned earlier.
‘There are emerging neural network applications that require a combination of training together with inference. For these applications it makes sense to run some part of the application – the training part on the GPU and another part on the FPGA,’ said Unsal.
Rather than running these applications entirely on GPUs, offloading the inference portions to FPGAs could help to improve the energy efficiency of the system.
The use of FPGAs in HPC has been evaluated before. The two main criticisms have been about the complexity of programming applications and the lack of floating-point performance, making the chips unsuitable for many traditional HPC workloads.
Most HPC users are not experienced with the hardware description languages used for FPGA development, which makes the technology much harder to use without a high-level programming language such as OpenCL.
However, LEGaTO researchers may have found a solution in the ‘write once, run anywhere’ software paradigm. The runtime system uses the hardware performance counters in modern processors to measure how much energy a given application dissipates.
‘You run the application on the hardware platform, get feedback about how much energy you are dissipating, and then it is the runtime’s responsibility to decide which resources are optimal at the time, based on this closed loop you get from the system. However, using this type of system requires code that can run on any of these hardware platforms without sacrificing performance, which would affect energy efficiency.
‘The problem that we are facing is that if you want to have your application run on a heterogeneous computing platform, you have one version of your application for each hardware technology: CPU, GPU and FPGA,’ said Unsal.
‘From a programmability point of view, that is not the best. We want to provide a programming model that makes it easy to write the application once, with some hints that this application may benefit from a GPU or an FPGA if there is one available. The runtime system checks whether the resources are available, then sends the task to the GPU or FPGA. You do not need to write a special version of your application for these devices,’ added Unsal.
There have been many debates about the use of high-level programming models, especially for FPGAs. There is a belief that programming in hardware description languages such as VHDL or Verilog produces more efficient designs. However, Unsal argues that this mirrors the discussion on the CPU side: that code written in assembly will be more efficient than code written in a high-level language such as Fortran or C.
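The combination of hints, availability checks and closed-loop energy feedback described above can be sketched as a small dispatcher. Everything here is hypothetical: the class and method names, the device labels, the assumption that a CPU version of every task always exists, and the energy figures. A real runtime would read those figures from hardware counters. The policy is deliberately simple. It tries each eligible device once, then sends the task to whichever has the lowest average measured energy.

```python
class EnergyAwareDispatcher:
    """Hypothetical dispatcher: programmer hints + availability + energy feedback."""

    def __init__(self, available_devices):
        self.available = set(available_devices)
        self.history = {}  # (task name, device) -> list of measured joules

    def choose(self, task, hints):
        # Assumption: a CPU version always exists, so "cpu" is always a
        # candidate. Hinted accelerators count only if they are present.
        candidates = ["cpu"] + [
            d for d in hints if d in self.available and d != "cpu"
        ]
        untried = [d for d in candidates if (task, d) not in self.history]
        if untried:
            return untried[0]  # explore: measure every candidate once
        # Exploit: lowest average energy observed so far (the closed loop).
        return min(candidates, key=lambda d: self._mean(task, d))

    def record(self, task, device, joules):
        """Feed back the energy measured for one run of `task` on `device`."""
        self.history.setdefault((task, device), []).append(joules)

    def _mean(self, task, device):
        runs = self.history[(task, device)]
        return sum(runs) / len(runs)
```

The programmer writes the task once and supplies only hints such as `["fpga", "gpu"]`. If no FPGA is installed, the hint is silently skipped. A real runtime would also weigh execution time, data-transfer cost and current device load, and would keep exploring occasionally rather than settling permanently on an early winner.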
While Unsal notes that it is important for application developers to ‘develop with their application and algorithm in mind’, he argues that it should not require the kind of coding used in the past.
‘It is reminiscent of the early days of the general-purpose GPU (GPGPU) discussion, when it was very difficult to run things on the GPU because you needed to write explicitly for the GPU. And there was no support for that until CUDA and the others came along.’
‘We are at a similar inflection point for FPGAs. Maybe it is more difficult for FPGAs because they are much closer to hardware than the GPUs would be, but we are going through a similar discussion. I would say there are good software development frameworks for FPGAs. This is one of the aims of the project,’ said Unsal. ‘How can we combine an efficient programming framework for FPGAs and have energy efficiency and also reliability and security?’
The LEGaTO project will finish its activities in December this year, but the team has already developed the framework and is now demonstrating use cases for the runtime system and presenting its work to the HPC community.
‘It is always difficult to influence the general programming community in, let’s say, a new programming paradigm. But we want to show through our research that perhaps it is time now for hardware and software to work together, to ensure one level of energy efficiency gain for exascale and for data-centre efficiency.
‘We want to tell them we have made it much easier for you to run on these technologies. Why not give it a spin? Without having to change your application a lot, you can get much better energy savings,’ concluded Unsal.