Increasing the flow
Researchers at Harvard Medical School (HMS) have released an open source drug discovery platform, VirtualFlow, that harnesses supercomputing power to screen potentially billions of small organic compound structures in parallel, in the hunt for promising new drug molecules.
The VirtualFlow team, including Christoph Gorgulla, who originally developed the software as part of his doctorate program at Freie Universität Berlin, and Haribabu Arthanari, at the Blavatnik Institute at HMS, is already working with Google to harness the ‘unprecedented computational power’ of the cloud to hunt for potential candidates against multiple SARS-CoV-2 coronavirus targets.
‘We are now in the early part of a collaboration with Google, to screen more than a billion compounds for potential hits against SARS-CoV-2 targets, primarily viral proteins, but also the angiotensin-converting enzyme 2 (ACE2) protein on human lung cells to which the virus attaches on the human host cells,’ Gorgulla said. ‘The screen will initially search for binders to 16 or 17 targets, but we expect this to increase as scientists learn more about this new coronavirus.’
Drug and vaccine development can cost upwards of $2-3 billion and take 10 years or more from early R&D through to regulatory approval and market release, ‘… with no guarantee that an initially promising candidate will make it through the labyrinth of laboratory, animal and human testing,’ noted Arthanari. ‘Traditional drug development has focused on small chemical molecules, although there is an ever-increasing raft of protein-based biologic drugs, such as antibodies or peptides, as well as a new generation of nucleic acid-based therapeutic approaches and strategies for genetic manipulation,’ he explained.
In contrast with drugs that are developed to treat the symptoms or mechanisms of disease, vaccines are designed to prepare the body to mount a rapid and effective immune response to a pathogen, such as a virus or bacterium, as soon as it infects the body.
‘Viruses are particularly tricky entities against which to develop drugs and vaccines, because some, like flu viruses, readily mutate, so the target against which a molecule is developed may change,’ Arthanari continued. ‘In addition, viruses are composed of nucleic acid, surrounded by an outer layer, and use the host cells’ own replication machinery to multiply themselves, so any drug must be able to stop the virus replicating without having unwanted effects on the host’s own cells.’
Finding the key
Most drug development is still centred on organic chemical molecules. A small molecule drug can be thought of as a key – or ligand – that structurally fits a specific lock – the drug target – and so deactivates it. The target might be, for example, an overactive enzyme or another protein that is involved in the disease process. Finding drug molecules that are specific to the desired target can be carried out through traditional experimentation in the lab, or through virtual structure-based screening approaches.
‘One of the drawbacks of traditional, wet-lab approaches is that this type of work is expensive and time consuming, and you can’t test that many compounds quickly – perhaps a few hundred thousand in a high-throughput screen, but not much more,’ stated Gorgulla, who is now a postdoctoral fellow in the Wagner Lab at Harvard Medical School, and also affiliated with the Department of Physics at Harvard University. ‘In fact, this number is relatively tiny when you consider that the total chemical space of small organic molecules suitable for drug discovery may encompass some 10^60 structures.’
In contrast with experimental approaches to finding drug candidates, virtual screening approaches, which use computational power that is now commonly available to labs, are becoming increasingly routine. ‘The binding of a molecule to a particular protein is driven by energetics, and standard equations of thermodynamics, so we can approach this computationally,’ Arthanari noted. ‘If we have a high-resolution structure of a protein generated, say, by X-ray crystallography, NMR or cryo-electron microscopy techniques, we can use computational methods to screen huge libraries of virtual compounds for which structural data is available, to more accurately calculate if a drug candidate will bind in a particular pocket of the target protein.’
Until now, these computational platforms have been capable of screening libraries containing perhaps 10^6 or 10^7 molecules for which the structure is known, but this is still a relatively small number given the size of overall chemical space, Gorgulla noted. As Arthanari pointed out, ‘The more molecules you can screen, the more likely you are to find the ideal compound … It’s like throwing darts at a dartboard. You may not be a good shot, but the more darts you have, the more likely you will be to hit the bullseye.’
Democratising virtual screening
The VirtualFlow software developed by Gorgulla allows researchers or companies with access to sufficient computing power, of the kind typically available to universities or even relatively small pharma companies via the cloud, to improve virtual screening throughput by orders of magnitude. ‘It pretty much democratises virtual screening at a previously impossible scale,’ Arthanari suggested. Putting it into context, he continued, ‘… if one CPU takes about 15 seconds to screen a single molecule docking with the target protein, then it would take about 475 years to do a billion molecules. With VirtualFlow it’s now possible to screen billions of compounds in just days, by harnessing potentially hundreds of thousands of CPUs in parallel.’ The researchers published a paper describing the VirtualFlow platform in a March 2020 issue of Nature.
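To make the arithmetic in Arthanari’s example concrete, the short Python sketch below reproduces the estimate. It assumes the quoted 15 seconds per docking and ideal linear scaling across CPUs; the function name and the 100,000-CPU figure are illustrative choices and are not part of VirtualFlow.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def screening_time_seconds(n_ligands, seconds_per_ligand, n_cpus=1):
    """Wall-clock time for a screen, assuming perfectly linear scaling.

    Linear scaling is an idealisation: real runs also pay for
    scheduling, file I/O and uneven docking times.
    """
    return n_ligands * seconds_per_ligand / n_cpus


# Figures quoted in the article: one billion ligands at 15 s each.
one_cpu_years = screening_time_seconds(1_000_000_000, 15) / SECONDS_PER_YEAR

# Hypothetical cluster size, chosen only to illustrate "hundreds of thousands".
many_cpu_days = (
    screening_time_seconds(1_000_000_000, 15, n_cpus=100_000) / SECONDS_PER_DAY
)

print(f"1 CPU: about {one_cpu_years:.0f} years")
print(f"100,000 CPUs: about {many_cpu_days:.1f} days")
```

Under these assumptions the single-CPU figure comes out at roughly 475 years, matching the quote, while 100,000 CPUs would bring the same screen down to under two days.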
‘In my dissertation I demonstrated mathematically that scaling up really does improve the hit rate,’ Gorgulla stressed. ‘What’s particularly nice about this is that the solution scales linearly, so if you double the number of processors, you double the screening power.’ Importantly, VirtualFlow has been developed to optimise functions, such as writing files to file systems, which might otherwise represent bottlenecks for screening on such a massive scale.
The VirtualFlow software can carry out screens in a series of stages. The first screening stage identifies compounds that bind to the designated target, and sequential screens can hone the list of initial hits by imposing increasingly stringent binding attributes, through calculations of exactly how each molecule will fit and bind to the target under different conditions. The program also assesses each potential 3D conformation of each drug molecule, which may change under different cellular conditions. The top few hundred hits can then be tested experimentally, which dramatically saves on lab time, and also improves the likely hit rate.
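The staged approach described above can be sketched as a simple funnel: a cheap score ranks the whole library, and only the best-ranked compounds go on to a more expensive, more stringent score. The Python below is a toy illustration only. The function `staged_screen`, the two scoring functions and the integer ‘ligands’ are all hypothetical stand-ins, and none of this reflects VirtualFlow’s actual interface or docking programs.

```python
def staged_screen(ligands, stages):
    """Run a multi-stage screen.

    `stages` is a list of (score_fn, keep) pairs. A lower score means
    better predicted binding; after each stage only the `keep`
    best-scoring ligands survive to the next one.
    """
    survivors = list(ligands)
    for score_fn, keep in stages:
        survivors = sorted(survivors, key=score_fn)[:keep]
    return survivors


# Toy stand-ins for docking scores (hypothetical, for illustration only).
def coarse_score(ligand):
    # Fast, approximate estimate applied to the whole library.
    return abs(ligand - 50)


def fine_score(ligand):
    # Slower, more stringent estimate applied only to the survivors.
    return abs(ligand - 52.3)


library = range(100)  # pretend each integer is a ligand
hits = staged_screen(library, [(coarse_score, 10), (fine_score, 3)])
print(hits)
```

The design point is that the expensive score is evaluated for only 10 of the 100 toy ligands, which is how a funnel keeps the cost of later, stricter stages manageable.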
The team’s published paper in Nature describes development of the VirtualFlow platform and the mathematical model that demonstrates this ability to improve hit rate. The paper also outlines a use case in which the software was used to screen a ready-to-dock ligand library – also prepared by Gorgulla – to identify and experimentally validate compounds that would inhibit a target enzyme. The curated library contains more than 1.4 billion commercially available molecules for which the 3D structures have been calculated.
‘We validated the software on the Harvard computing cluster, which includes more than 30,000 processing cores, and it took around two weeks to complete the screen on this library of 1.4 billion compounds using around 8,000 of these cores,’ Arthanari said.
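As a rough sanity check on these figures, the implied average compute cost per compound can be worked out directly. The sketch below takes the quoted numbers at face value (1.4 billion compounds, 8,000 cores, 14 days) and assumes every core was busy for the whole run. The quote itself says ‘around’, so the result is only an order-of-magnitude estimate.

```python
# Figures as quoted; all are approximate in the source.
n_ligands = 1_400_000_000
n_cores = 8_000
wall_clock_days = 14

# Total compute spent, assuming every core was busy for the whole run.
core_seconds = n_cores * wall_clock_days * 24 * 60 * 60

core_seconds_per_ligand = core_seconds / n_ligands
print(f"about {core_seconds_per_ligand:.1f} core-seconds per compound")
```

That works out at roughly 7 core-seconds per compound, the same order of magnitude as the 15 seconds per docking used in Arthanari’s earlier illustration.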