Porting virtual screening applications
Whether it’s to determine the fate of the universe, model catastrophic weather conditions or find a new drug to save millions of lives, parallel computing using graphics processing units, or GPUs, is becoming more important and commonplace. GPUs have the potential to deliver higher performance at lower cost than traditional CPUs in a range of sectors, such as medicine, national security, natural resources and emergency services.
The process of drug discovery in particular is long and laborious. A chemical lead must first be found, often by computationally screening millions of chemical compounds against a target. Hundreds or even thousands of compounds must then be synthesised around this lead to identify one with the correct properties to become a drug. This process typically takes about five years and, if a new drug is discovered, the US FDA approval process then requires another five years of clinical trials. The parallel computing challenge is to speed up the screening and lead-optimisation process, thus finding new drugs faster and increasing the chance of saving lives.
There are many molecular simulation and modelling applications that scientists can use to make the drug-discovery process more efficient. Recent improvements have taken these applications a step further by allowing scientists to simulate protein-ligand interactions to a high degree of accuracy, or to computationally screen tens of millions of molecules. The drawback of these applications is that they require a very large amount of computing power – in fact, very few can do anything useful on a single multi-core CPU machine.
A supercomputer consisting of many interconnected multi-core CPU machines is a solution to this problem, as it can provide enough computational performance to produce simulation results on a useful time scale. However, traditional supercomputers can be expensive to buy and operate, requiring continual hardware and software maintenance. With 10 to 20 million researchers and scientists who could benefit from the use of supercomputers, there is significant competition for the shared resources that are available – to the point where, in some cases, researchers have to book time on supercomputers a year in advance.
Highly parallel computing with GPUs is increasingly recognised as a solution to the limited availability and high cost of traditional supercomputers. The many-core architecture of GPUs makes them well suited to parallel processing and capable of delivering a very high number of floating-point operations per second. Also, their origin in consumer gaming means that GPUs deliver high performance at a very attractive price point. Originally designed for rendering three-dimensional images and outputting them to a display, the shader pipeline of the GPU is now used for general-purpose computing, rather than being hardwired solely to do graphical operations. Powerful software development frameworks have emerged to support GPU-based computing.
The emerging standard is OpenCL, which is designed for portable, parallel programming of heterogeneous systems combining CPUs and GPUs, and has the potential to unlock the performance benefits of many-core hardware for scientific applications. The key advantage of OpenCL is that it allows you to utilise all of the resources available, maximising compute performance and improving the cost-to-performance ratio of your computational resources. OpenCL is also the only truly portable and vendor-neutral programming model for GPU and multi-core CPU programming.
The process of efficiently porting serial code to any parallel programming language can be challenging, and OpenCL is no different. It requires a programmer with excellent knowledge of the underlying hardware architectures and optimisation techniques for parallel computing. GPU computing is fairly new and device drivers can also be unstable. However, the support from vendors and developers is shaping these frameworks for use in commercial applications.
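To give a flavour of what such a port involves, here is a minimal sketch in Python rather than OpenCL. It assumes that each compound in a library can be scored independently of the others, which is what makes screening a natural fit for many-core hardware. The `score` function is a toy stand-in invented for illustration and is not Cresset's algorithm; the names `screen_serial` and `screen_parallel` are likewise hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def score(reference, candidate):
    """Toy similarity (hypothetical): negative sum of squared differences.

    Higher is more similar; an identical candidate scores zero.
    """
    return -sum((r - c) ** 2 for r, c in zip(reference, candidate))


def screen_serial(reference, library):
    # Serial version: one compound after another in a single loop.
    return [score(reference, mol) for mol in library]


def screen_parallel(reference, library, workers=4):
    # Parallel version: the same per-compound work, mapped over a pool.
    # No compound depends on another, so the loop body becomes the "kernel".
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda mol: score(reference, mol), library))


# Made-up field samples, purely for illustration.
reference = [0.1, -0.4, 0.7]
library = [[0.1, -0.4, 0.7], [0.0, 0.0, 0.0], [0.2, -0.3, 0.6]]

serial = screen_serial(reference, library)
parallel = screen_parallel(reference, library)
best = max(range(len(library)), key=lambda i: parallel[i])
```

Python threads only illustrate the structure here and will not speed up this arithmetic. In an OpenCL port the body of `score` would roughly correspond to a kernel run by many work-items at once, and much of the real effort goes into memory layout and hardware-specific tuning, which this sketch ignores.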
At Cresset, we have ported our virtual screening application BlazeV10 to OpenCL. BlazeV10 uses a molecular similarity algorithm to compare the 3D electrostatic fields of molecules, allowing the user to screen millions of possibilities to locate active molecules accurately and efficiently. The algorithms are computationally intensive, so BlazeV10 has always required a medium-sized computing cluster to be used effectively. The port to OpenCL has provided a real-world, 10-fold speed-up when run on a latest-generation, off-the-shelf GPU compared with a recent quad-core CPU. The shift to parallel computing has allowed us to use a single workstation containing four consumer GPUs as a replacement for a 40-node quad-core CPU cluster.
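The arithmetic behind that replacement can be checked in a few lines. This is only a back-of-the-envelope sketch: it assumes that throughput scales linearly across the four GPUs and that each cluster node holds one quad-core CPU, neither of which is stated above. The function name is made up for illustration.

```python
def equivalent_cpu_nodes(speedup_per_gpu, gpus):
    """Number of quad-core CPU nodes matched by a multi-GPU workstation.

    Assumes linear scaling across GPUs and one quad-core CPU per node.
    """
    return speedup_per_gpu * gpus


# One GPU is about 10 times faster than one quad-core CPU; the workstation has four GPUs.
nodes = equivalent_cpu_nodes(speedup_per_gpu=10, gpus=4)
print(nodes)  # 40
```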
Accelerating the pace of drug discovery is vital for humanity and for the pharmaceutical industry, and GPU computing has an important role to play. As GPU devices continue to increase in performance and ubiquity, they will become an essential tool in the future of sustainable scientific research.
Simon Krige is a software developer at Cresset BioMolecular Discovery in Welwyn, UK. He has an MEng in Computer Science from the University of Bristol and has been a high-performance computing enthusiast for more than two years.