Drug discovery with data
In recent years the pharmaceutical industry has relied more and more on computational modelling to aid drug discovery and development.
Computer models are used in a number of ways to tackle more challenging problems, explained Mark Mackey, CSO of Cresset. ‘For pharmaceutical and biotechnology companies, drug discovery is getting harder and harder: the easy targets are mined out; the regulatory and safety environment is increasingly tough; and the more biology we know, the more things can kill a compound before it even gets to the clinic,’ he said.
‘In this environment, efficiency in getting compounds into the drug discovery pipeline is critical. If you can make compounds “smarter”, then you need to make fewer to get one to the clinic.’
Such needs have driven developments in modelling algorithms. ‘Understanding why compounds have the biological activities that they do is difficult, and for too long the prevailing view on activity was fundamentally 2D,’ continued Mackey. ‘Chemists viewed compounds in terms of how they were put together, rather than what their electrostatic potentials, shapes, and other 3D properties were.’
He said that Cresset’s tools help medicinal and computational chemists see what molecules look like from the point of view of their intended protein targets. ‘The extra insights that this view brings can be invaluable when looking at a series of compounds and trying to decide what to make next,’ he explained. ‘We’re improving our core molecular alignment and scoring algorithms, which lets us determine more accurately when two molecules have similar 3D properties and hence are likely to have similar biological activities.’
The company’s recently released Activity Miner software enables modellers to analyse a dataset of molecules and their biological activities, to find pairs of molecules that are very similar to each other yet have quite different biological activities. ‘These pairs have a high information content; they give information about what a protein likes or doesn’t like, when it binds to these molecules,’ he explained.
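The idea of hunting for similar pairs with dissimilar activities can be illustrated with a minimal sketch. This is not Cresset's algorithm, which the article does not describe: the Tanimoto score on simple feature sets is only a stand-in for a 3D similarity measure, and the molecule names, features, activity values and the `disparity_pairs` helper are all hypothetical.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def disparity_pairs(molecules, min_similarity=0.5):
    """Rank similar pairs by activity difference per unit of dissimilarity.

    molecules maps a name to (feature_set, activity), where activity is
    assumed to be on a log scale such as pIC50.
    """
    results = []
    for (n1, (f1, a1)), (n2, (f2, a2)) in combinations(molecules.items(), 2):
        sim = tanimoto(f1, f2)
        if sim < min_similarity:
            continue  # only pairs that look alike are informative here
        # The floor of 0.01 avoids dividing by zero for identical feature sets.
        disparity = abs(a1 - a2) / max(1.0 - sim, 0.01)
        results.append((n1, n2, sim, disparity))
    return sorted(results, key=lambda r: r[3], reverse=True)


# Hypothetical toy dataset: feature labels and activities are invented.
molecules = {
    "mol_A": (frozenset({"f1", "f2", "f3", "f4"}), 8.5),
    "mol_B": (frozenset({"f1", "f2", "f3", "f5"}), 5.0),
    "mol_C": (frozenset({"f1", "f2", "f3", "f4", "f6"}), 8.3),
    "mol_D": (frozenset({"f7", "f8"}), 6.0),
}

ranked = disparity_pairs(molecules)
for n1, n2, sim, disparity in ranked:
    print(f"{n1} / {n2}: similarity {sim:.2f}, disparity {disparity:.2f}")
```

In this toy set the top-ranked pair differs by a single feature yet shows a large gap in activity, which is the kind of pair Mackey describes as having a high information content.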
Cresset is also working on an extension to enable analysis of more than one set of biological activities simultaneously. In addition, the company is looking at extending its similarity algorithms from ligand to proteins. ‘If this works, we will be able to compare and cluster the available human protein structures on their 3D electrostatic and shape properties, which would be a huge boost to industrial drug discovery,’ explained Mackey. ‘It would allow computational chemists to jump in at the very start of a drug discovery project and say “OK, we want a compound that’s active against protein X, but protein Y is very similar to protein X so we need to keep an eye on that.”’
He also sees a shift from ligand-based drug design (LBDD), where modellers look at the small molecules, analyse them, compare them, and extract as much information as possible out of them to guide a medicinal chemistry programme, towards structure-based drug design (SBDD).
‘Five or 10 years ago, vast swathes of drug discovery (ion channels, GPCRs, etc) were exclusively the province of LBDD. The increasing availability of high-quality X-ray data for many of these targets has given us all new insights into how these protein families work and how our compounds are behaving when they bind to them, and we need to be able to fold that data into our existing ligand-based expertise,’ he explained.
Adrian Stevens, senior manager, predictive sciences marketing (life sciences) for Accelrys, has also observed changes in the approaches and applications of computational modelling for more complicated drug discovery. One such trend is the increased interest in polypharmacology, the treatment of diseases by modulating more than one target.
He also said that as the same compounds are screened again and again over time, it is possible to build up more understanding about drug viability. ‘You can start designing a problem out before it gets to clinical trials,’ he explained, adding that modelling also enables researchers to try new ways to repurpose research and to identify new patent opportunities.
Timing is also important, he observed. He noted how the length of time to run simulations was a bottleneck for pharmaceutical companies.
In the past, he explained, ‘modelling required someone with deep statistical knowledge and also deep understanding of the product.’ He noted that two statisticians might have taken two weeks to do a model for each project but that a big pharmaceutical company could be running 150 projects at once globally, creating a huge demand for the statisticians’ skills and time. Meanwhile chemical developments in the lab mean there are always new compounds to screen. ‘The industry reached a point where the statisticians couldn’t keep up with all the products,’ he observed.
Accelrys therefore worked with GlaxoSmithKline to develop faster tools for modelling. He described the result as user-friendly and powerful. The aim is ‘helping the expert modeller to be more productive in their work,’ he continued.
‘If new data comes in, you could just save the steps you have done and rerun with the new data. It has not taken the statistician out of the equation but allowed them to work more meaningfully with their colleagues,’ he explained.
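The notion of saving the steps and rerunning them on new data can be sketched as an ordered list of functions applied to a dataset. This is an illustration only, not the Accelrys tooling: the step names, record fields and values below are hypothetical.

```python
from statistics import mean


def drop_missing(rows):
    """Remove records that have no measured activity."""
    return [r for r in rows if r["activity"] is not None]


def keep_assay(assay):
    """Build a step that keeps only records from one assay type."""
    def step(rows):
        return [r for r in rows if r["assay"] == assay]
    return step


def mean_activity(rows):
    """Final step: summarise the remaining records."""
    return mean(r["activity"] for r in rows)


def run_workflow(steps, rows):
    """Apply each saved step in order, feeding each result to the next."""
    result = rows
    for step in steps:
        result = step(result)
    return result


# The saved workflow is just the ordered list of steps.
workflow = [drop_missing, keep_assay("binding"), mean_activity]

# Hypothetical screening records.
first_batch = [
    {"compound": "cpd-1", "assay": "binding", "activity": 6.0},
    {"compound": "cpd-2", "assay": "binding", "activity": None},
    {"compound": "cpd-3", "assay": "functional", "activity": 7.5},
]
new_batch = first_batch + [
    {"compound": "cpd-4", "assay": "binding", "activity": 8.0},
]

print(run_workflow(workflow, first_batch))  # original data
print(run_workflow(workflow, new_batch))    # same steps, new data
```

Because the workflow is stored separately from the data, a modeller's choices are captured once and can be reapplied by colleagues whenever a new batch arrives.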
‘They can work with local data and get more precise models and more sophisticated visualisation of ligand docking,’ he said. ‘Fifteen years ago, people wrote on paper and listened to modellers in meetings. Now modellers are integrated in the teams.’
Relying on data
Such developments in modelling approaches and applications all rely on another important factor: the availability and quality of chemical data.
‘Computational chemists today wouldn’t dream of working on a project before looking at the available data,’ noted Stevens, who added that ‘if there are different standards of doing screening it is hard to know if things are equivalent.’
Good data is something that Meeuwis van Arkel is focused on. He is VP for product development at Elsevier Information Systems GmbH. ‘Modelling is an increasingly important tool and the modeller wants a huge quantity of data in terms of depth and breadth, including patents,’ he observed.
Elsevier’s Reaxys database indexes relevant organic, inorganic and organometallic data from across the industry and, according to van Arkel, captures on average 400 fields of information for each compound, including properties such as melting point, spectral data, biological activity and literature citations.
He said that the database currently contains 24 million different compounds and is focused on life sciences and drug discovery.
Such ‘real-world’ data feeds into pharmaceutical research models. ‘The data on which they model needs to be very highly-structured, uniform and high quality. All the information in our solution is from real experiments,’ he said.
‘A reasonably common use case is the predicted biological activity of an unknown compound,’ said van Arkel. ‘Customers also look at drug-to-drug interactions.’ This is of interest when, for example, a patient is already taking a drug for a condition and a new drug is being developed for people with the same condition: modelling can predict how the two drugs would interact.
‘The ultimate outcome really helps the life-sciences industry in general; where necessary a drug can fail quickly and fail early, which saves huge money on clinical trials,’ he explained. ‘Computational modelling is also becoming increasingly important in situations that in real life would be hard to replicate.’
Reaxys data is incorporated into computational models in two ways, he said. The company provides customers with either an API or a structure flat file. Customers can then integrate in their own data. ‘When we provide our data we provide a detailed guide to describe how our data is structured,’ he said, explaining that this is important because every dataset is differently designed.
Van Arkel has noticed a related trend happening too. ‘Because of our expertise in this area, we often see the life-sciences industry asking us to help them organise their proprietary data. We then also make sure this data is normalised to ours.’
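Normalising proprietary records to a reference structure typically means renaming fields and converting units so that both datasets can be queried together. The sketch below assumes a made-up target schema; it does not reflect the actual Reaxys data structure or API, and every field name, identifier and value is hypothetical.

```python
# Hypothetical mapping from in-house field names to the reference schema.
FIELD_MAP = {"id": "compound_id", "name": "compound_name"}


def to_celsius(value: float, unit: str) -> float:
    """Convert a temperature to degrees Celsius."""
    unit = unit.upper()
    if unit == "C":
        return value
    if unit == "K":
        return value - 273.15
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"unknown temperature unit: {unit}")


def normalise(record: dict) -> dict:
    """Rename fields and express the melting point in Celsius."""
    out = {FIELD_MAP[key]: record[key] for key in FIELD_MAP}
    out["melting_point_c"] = round(to_celsius(record["mp"], record["mp_unit"]), 2)
    return out


def merge(reference: list, in_house: list) -> dict:
    """Index both sources by compound_id; reference entries win on a clash."""
    merged = {r["compound_id"]: r for r in map(normalise, in_house)}
    merged.update({r["compound_id"]: r for r in reference})
    return merged


# Invented example records.
reference_data = [
    {"compound_id": "REF-1", "compound_name": "example A", "melting_point_c": 80.0},
]
in_house_data = [
    {"id": "LAB-7", "name": "example B", "mp": 212.0, "mp_unit": "F"},
    {"id": "LAB-8", "name": "example C", "mp": 373.15, "mp_unit": "K"},
]

combined = merge(reference_data, in_house_data)
for compound_id, record in sorted(combined.items()):
    print(compound_id, record["melting_point_c"])
```

Rejecting unknown units with an error, rather than guessing, keeps inconsistent records from slipping silently into the combined dataset.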
There are some challenges in making data available for modelling, he observed. ‘It is not that difficult to make data available but to make it truly useful for modelling is more of a challenge,’ he said. ‘You want a huge quantity of data but you also need it provided in a highly structured way. There are not many data providers who have both lots of data and are highly-structured.’
There are some industry efforts to help standardise chemical data, which will help modellers and others (see box: Standardising data). Another issue for modellers handling big datasets is computing power. ‘I still see problems of modellers running into huge processing challenges,’ said van Arkel.
In fact, Elsevier has a solution for researchers in this situation. The scientific publisher has its own supercomputer in the USA that is configured to process huge datasets. The company uses this internally for text and data mining but is increasingly approached by customers to offer this as a managed service to researchers.
The most common use case, according to van Arkel, is that ‘we provide our data to researchers; they develop algorithms; and then they run into processing power problems, so we run their algorithms for them.’ He said that these are typically large projects, and that the company runs 10 to 20 of them per year in the life sciences.
Despite such developments, however, computer models on datasets do not give the whole picture. As van Arkel noted: ‘I’ve not seen a situation where modelling has taken over the whole approach to drug discovery. Ultimately the last component is still going back into the wet lab, to confirm that the model can be replicated in real life.’
Mackey of Cresset agreed: ‘Computational chemistry went through a big hype cycle in the last decade or so where all sorts of wild claims were made about the ability of computers to revolutionise drug discovery. All of that has calmed down now, and there has been a lot of effort in the last five years or so to reassess computational chemistry techniques: when do they work, when don’t they, and under what circumstances?
‘Some of this has been disheartening. It turns out that some techniques just aren’t as accurate or useful as we all thought they were, but I think that chemists are now much better served by their computational chemistry colleagues. The computational chemist’s toolbox is now much better understood, and a good computational chemist should now have a better feel for which modelling techniques to use under what circumstances, and what to expect in the way of accuracy.’
And the importance of modelling as part of the pharmaceutical story was reinforced with the award of the Nobel Prize in Chemistry 2013 to Martin Karplus, Michael Levitt and Arieh Warshel, ‘for the development of multiscale models for complex chemical systems’. As the press release for the award noted: ‘Today, the computer is just as important a tool for chemists as the test tube.’
Standardising data
High-quality data is an important requirement for chemical modelling. But it is not just the accuracy of the data that is important. To be able to draw meaningful conclusions from a range of datasets there is a need for good structure and appropriate integration.
‘Integration of different data sources is not a solved problem. We still have data sitting in different repositories but what’s better understood is how to use those data sources better,’ observed Adrian Stevens of Accelrys. ‘There is always a danger of disconnect if you don’t know the data source. You should always treat data as discrete groups but look for general trends. Until we get to the stage where everyone works to the same standards, we are going to have this problem.’
There are a number of initiatives to attempt to bring standardisation to the way that data is organised. One such initiative is OpenPHACTS, a cross-industry project to deliver an online platform with a set of integrated, publicly available, pharmacological data. The initiative promises that: ‘Throughout the project, a series of recommendations will be developed in conjunction with the community, building on open standards, to ensure wide applicability of the approaches used for integration of data.’
‘OpenPHACTS is trying to curate in a standard way,’ said Stevens. ‘With OpenPHACTS, the potential for impact is still being learnt. It includes data from a wide range of sources and gives you the chance to ask questions such as: “what other compounds might I hit similar to my target?” and “how do I get to druggable targets?”’
‘OpenPHACTS is helping us, as much as we’re helping them,’ noted Meeuwis van Arkel of Elsevier. ‘It is defining standards to make the sharing of data more easy. We always tend to participate in these initiatives; it makes our data more accessible.’
Modelling for manufacture
Computational techniques are not just used at the drug discovery stage of the pharmaceutical industry, as Kristian Debus, life sciences sector manager of CD-adapco, observed. CD-adapco provides tools for computational fluid dynamics and discrete element modelling, which, he said, are used during various stages of drug development and production.
‘In the pharmaceutical industry, we see a lot of usage in process design and optimisation,’ he explained. ‘With the trend to develop new devices and tools to move towards continuous manufacturing, modelling becomes an essential tool to bring these technologies to market fast, yet with the required, extremely high design quality. Other research areas would be crystallisation modelling, fermentation, or coupling of 1-D process with 3-D CFD or DEM models. In these areas we see a transition happening from academic research to commercially available tools.’
He continued: ‘The classic applications are scale up from lab, to prototype and production for batch processes, mostly mixing. Today, experts are looking at more complex mixing problems, but also into new application areas. Today’s engineers are required to look into new physics and processes, like particle, powder, or liquid transport, filling processes, particle break up, spray analysis and so on.
‘A key factor here is also the improved understanding of the physics and processes as you look at a problem from a different angle. Analytical analysis, modelling, and experiment should always go hand-in-hand to produce the highest quality product. A world without modelling is hard to imagine today. Bench top experiments will always be essential, but with the growing use of modelling and user expertise, these experiments will become more directed, more effective and less costly.’