AI accelerates drug development pipelines
Bayer recently launched LifeHub UK, the seventh of the company’s global LifeHub centres, which are pioneering collaborative research to accelerate the development of new solutions to global health and nutrition challenges.
The new UK centre, sited at Reading’s Green Park, joins existing LifeHub facilities at what Bayer terms innovative hotspots in Berlin, Boston, California, Lyon, Singapore and Tokyo/Osaka.
One of the first companies to move into the new UK LifeHub is UK clinical AI firm Sensyne Health, which is working with Bayer to develop AI-enabled automated image analysis solutions for disease diagnosis. The project will combine Bayer’s existing expertise in the radiology field with Sensyne’s access to millions of anonymised UK National Health Service (NHS) records and imaging data, accessed through Sensyne’s partnership with NHS Trusts.
‘The ultimate aim is not only to use AI-enabled technologies to speed disease detection, but also to help provide informed insight that will help direct treatment decision making,’ commented Abel Archundia-Pineda, head of digital transformation and IT at Bayer Pharmaceuticals.
‘We are going to see an unprecedented positive technological revolution in healthcare in the next 10 years, and a large part of that will be driven by clinical artificial intelligence – the application of machine learning methods involving clinical expertise from researchers,’ added Lord Paul Drayson, Sensyne Health CEO. ‘The current medical discovery system involves interpreting enormous volumes of diverse information. By using clinical AI, we are able to analyse vast volumes of data in a short period of time.
This data-driven approach to discovery can help transform the future of drug development resulting in more coherent diagnoses for sufferers of poorly understood chronic diseases, identifying comorbidities and speeding up the pathway to treatment. It could also lead to personalised treatment plans with a much higher likelihood of success, based on analysis of, for example, the link between genetics and responsiveness to different treatments. Data can also be used to help find novel therapeutic treatments and uncover new uses for existing drugs, all while saving time searching for a diagnosis, reducing cost, decreasing the pressures on the UK NHS workforce, and ultimately improving patient outcomes.’
Data drives new approaches to medicine
The collaboration between Bayer and Sensyne builds on an initial agreement, announced in July 2019, focused on leveraging the Sensyne platform to develop new treatments for cardiovascular disease. Both partnerships embody the global drive to harness machine learning and AI to develop more accurate, insightful diagnostics, as well as to accelerate drug discovery and development, reduce pipeline attrition rates, and make the development of patient-focused precision medicines a tangible goal for many diseases.
While one-size-fits-all treatments have traditionally had to suffice for many diseases, the ability to use computing power to analyse data sets collated from different disciplines means it should be possible to develop more effective, safer personalised medicines. ‘The wealth of data now available through our digital lives – think smartphone apps and wearables – in combination with genetic and molecular data, and traditional blood and tissue-based laboratory tests and health record information, can feasibly be used to help more accurately understand the interplay between genetics, environment and lifestyle on health and disease,’ Archundia-Pineda suggested.
It can take about 12 to 15 years for a potential drug candidate to make it to market, and less than one per cent of projects successfully negotiate early research, preclinical, clinical and regulatory milestones. ‘Developments in experimental techniques and high throughput and high content tests at the cellular and molecular level mean that we now have greater amounts of high-quality data, which, together with advanced analytics and computer algorithms, will help us to identify new targets for innovative medicines much faster, more accurately and more efficiently than ever before. This could dramatically cut the development timeline and attrition rate.’
Bayer has recognised these opportunities, and is working with global technology partners to develop AI-enabled solutions for drug development. ‘Together with Budapest-based startup Turbine, for example, we have built an AI platform that models cancer at the molecular level, and tests millions of potential drugs in silico.’
The development of such models does rely on the quality, reliability and breadth of data, and this brings us back to the potential to exploit real-time, real-world information that may originate from the patients themselves. Today’s activity trackers contain biometric sensors that monitor exercise, heart rate and sleep, for example. ‘What’s important is that these devices are collecting all of this data moment-to-moment, on an online basis, and they are all connected. So all of this data that is being generated and stored is unique to each one of us. The big opportunity is to harness machine learning and AI to understand how it impacts on health and disease, so that we can identify patterns and relationships and uncover precision opportunities for maintaining and sustaining a long, healthy life.’
Data gives us a real opportunity now to drill down into the biology of disease, to start to understand why, for example, one person who has smoked heavily all their lives doesn’t develop lung disorders, but another does, Archundia-Pineda suggested. ‘And then we can harness machine learning algorithms to better predict who will develop diseases, use imaging and other existing techniques to better diagnose disease. Ultimately, this will allow us to develop better drugs for treating, managing or preventing diseases from developing.’
It doesn’t take a great leap of imagination to envisage a smartwatch that will be able to devise the best exercise and diet programme for the wearer based on daily real-time metrics, or that tells the patient what the optimum dose of a particular drug will be on that day, based on their exercise, diet and other measurable physiological and biological parameters. ‘We can even start to think about a drug-delivery patch that dispenses the optimum amount of drug, automatically, based on a collection of day-to-day measurements.’
Upstream of the clinical impact of using machine learning and AI to maintain and even improve health at the level of the individual, such technologies are massively impacting on drug discovery and development, Archundia-Pineda explained. ‘Ultimately, we will link all the experiments that we design in the laboratory today to enable the faster, more informed development of precision medicines.’
In effect, leveraging huge datasets that are now being collected from patients – for example, through initiatives such as Sensyne’s partnership with the NHS trusts – and from clinical trials and preclinical research, will all feed back into the discovery and development engine to inform R&D at an early stage. ‘This is going to be revolutionary with respect to the way physicians can treat their patients.’
The massive computing power available today makes it possible to take all this data, and use it to model not just how, when and why diseases develop, but to rapidly identify molecular structures against optimum targets. ‘Given the right quality as well as quantity of data, computer algorithms can be trained to understand and simulate the way that a disease progresses, pick out the best molecular structures and properties for specific disease targets, and how best to deliver a molecule to specific cell types.’
We should also blur the demarcation lines that traditionally separate internal R&D from clinical partners, said Archundia-Pineda. ‘Scientists working on synthetic molecule discovery and optimisation might be steps removed from the clinical application of these molecules in clinical trials, or their ultimate prescription for patients, so it’s important that through our collaborations, the supporting organisations can work with key scientists, clinical practitioners and investigators together.’
‘We can plug huge datasets into these algorithms to test a model’s response, but that data has to be in a usable format, and complete and reliable. Our partner, Sensyne Health, has come up with a really interesting business model, which leverages large anonymised datasets that belong to the UK people, through the NHS.’
Lord Paul Drayson added, ‘It is important to recognise that AI is only as good as the data it analyses. Bigger datasets mean more robust findings, particularly when looking at rare diseases or analysing complex conditions. The UK has some of the richest health data sets in the world, collected across an extremely large national healthcare network. Sensyne Health believes NHS data is a sovereign asset that will help to make dramatic improvements in healthcare for patients in the UK and abroad, and sustain our NHS for future generations. We have developed a unique business model partnering with NHS Trusts where they receive equity and a financial return from the data we analyse.’ Sensyne Health currently partners with five NHS Trusts. ‘Our NHS partners remain the data controllers and all requests we submit to analyse the anonymised datasets are subject to the approval of an independent ethics committee. We strongly believe that patients have the right to expect that their data will be ethically sourced and responsibly analysed.’
Sensyne Health effectively acts as an enabler for the analysis of patient data on behalf of its commercial partners, Archundia-Pineda commented. While ownership of data remains with the Trusts, through Sensyne it can now be used to power and train the machine learning algorithms and AI technologies that are helping to develop the treatments and diagnostics that will hopefully ultimately be used to treat those patients. ‘Through its partnership with Bayer, Sensyne also provides a platform that allows researchers to assemble, leverage and track the data, so that the insights continue to improve over time. Ultimately it will help us at Bayer develop better drugs, faster.’
It might even be possible to carry out clinical trials in silico, to predict treatment effects on patients according to their disease specifics, and suggest optimum dosage. ‘We are working on an entirely new design of clinical trial, which is based on computer-aided steps, with the expertise of principal investigators,’ Archundia-Pineda noted. ‘The internet of things and connected devices will enable decentralised clinical trials, where patients can stay at home instead of going to clinical research sites. While AI-based patient stratification can help us to identify the right patients for those trials more easily, decentralised clinical trials will bring faster and more efficient results and even potentially allow us to predict real-life patient outcomes more precisely. If successful, this could allow the industry to cut development timelines by 30 per cent or more.’
Ultimately, the aim is to use AI and machine learning to improve diagnosis and also enable broader adoption of precision medicine. One example of this is Bayer’s work to explore the possibility of developing an AI algorithm to support identification of cancer patients whose tumours express an NTRK gene fusion, which results in production of an altered TRK protein that leads to cancer growth. Although rare overall, this alteration can occur at varying frequencies across tumour types, in both children and adults. The AI algorithm aims to help physicians identify all patients who are likely to have TRK fusion cancer by analysing routine tumour pathology slides.
‘We are training an algorithm with the goal of reaching a high degree of precision in correctly identifying NTRK gene fusions, based on basic pathology images,’ Archundia-Pineda explained. A positive result can be confirmed by existing genomic testing, which is not always used routinely.
‘Ultimately, the AI algorithm could help support consistent and widespread testing for TRK fusion cancer across the different tumour types, to help identify all appropriate patients who may benefit from a precision oncology treatment that is used to treat solid tumours caused by an NTRK gene fusion. The algorithm has been trained on an initial dataset. What we are doing now is looking to expand the dataset to further train and refine the algorithm and validate the test at a broader scale,’ Archundia-Pineda added.
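The slide-screening idea described above can be sketched as a patch-scoring step followed by slide-level aggregation, a common pattern for whole-slide image analysis. This is a minimal illustration, not Bayer’s actual algorithm: `toy_scorer` stands in for a trained image model, and the threshold is an arbitrary assumption.

```python
# Hypothetical sketch: slide-level screening via patch scoring.
# A real system would score patches with a trained deep learning model;
# here a toy intensity-based scorer stands in for that model.

def flag_slide(slide, scorer, threshold=0.9):
    """Flag a slide for confirmatory genomic testing when any patch
    scores above `threshold` (max-pooling aggregation over patches)."""
    return max(scorer(patch) for patch in slide) >= threshold

def toy_scorer(patch):
    """Toy stand-in scorer: mean patch intensity as a proxy for a
    model's predicted probability of an NTRK-fusion-like pattern."""
    return sum(patch) / len(patch)

# Example: two 4-pixel 'patches' per slide, intensities in [0, 1].
negative = [[0.1, 0.2, 0.1, 0.2], [0.3, 0.2, 0.1, 0.1]]
positive = [[0.2, 0.1, 0.2, 0.1], [0.95, 0.9, 0.95, 0.9]]

print(flag_slide(negative, toy_scorer))  # False
print(flag_slide(positive, toy_scorer))  # True
```

Max-pooling mirrors the clinical logic described in the text: a single strongly suspicious region is enough to route the slide to confirmatory genomic testing.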
Other initiatives in Bayer’s AI and machine learning pipeline include the development, in partnership with Merck and Co’s MSD, of deep learning-aided software that can help radiologists identify signs of chronic thromboembolic pulmonary hypertension (CTEPH). The software received FDA Breakthrough Device Designation in 2018.
Separately, Bayer is working with Broad Institute researchers to develop an AI algorithm that can identify patients at high risk of cardiovascular disease, based on complex individual profiles made up of a unique combination of characteristics: demographics, clinical risk factors such as diabetes, and genotypes.
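A risk model of the kind described, combining demographics, clinical risk factors and genotype information, can be sketched as a simple logistic model. The feature names and weights below are purely illustrative assumptions, not taken from the Bayer/Broad Institute work.

```python
import math

# Hypothetical sketch: logistic risk model over mixed features.
# Weights and bias are illustrative, not fitted to any real cohort.
WEIGHTS = {"age_decades": 0.4, "diabetes": 0.9, "genotype_risk": 0.7}
BIAS = -4.0

def cvd_risk(profile):
    """Return a probability-like cardiovascular risk score via a
    logistic (sigmoid) combination of the weighted features."""
    z = BIAS + sum(WEIGHTS[k] * profile.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

low  = {"age_decades": 3, "diabetes": 0, "genotype_risk": 0.2}
high = {"age_decades": 7, "diabetes": 1, "genotype_risk": 1.5}

print(round(cvd_risk(low), 3))   # low-risk profile
print(round(cvd_risk(high), 3))  # high-risk profile
```

In practice a model like this would be trained on large cohorts and would carry many more features; the sketch only shows how heterogeneous characteristics can be folded into a single risk score.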
‘Ultimately data science and digital technology will help us to transform patients’ health by diagnosing diseases earlier and better, developing new medicine faster and tailoring treatments to their individual needs,’ said Archundia-Pineda. ‘These digital capabilities are transforming the way the pharmaceutical industry brings innovation to patients and illustrate the convergence path of technology, science and medical practice.’
Redefining drug development
In late 2019, Argonne National Laboratory announced that it had joined the ATOM (Accelerating Therapeutics for Opportunities in Medicine) consortium. ATOM is a public-private partnership between national laboratories, academic organisations and industry, which aims to transform cancer drug discovery through the combined use of high performance computing (HPC), biological data and emerging technologies. Argonne is initially bringing into the ATOM environment machine learning algorithms for designing and optimising drug candidates, and for predicting their efficacy, ADMET (absorption, distribution, metabolism, excretion and toxicity) profiles and other key properties.
The ultimate goal is to significantly streamline, hone and speed the drug discovery and development workflow, explained Rick Stevens, associate laboratory director for computing, environment and life sciences at Argonne National Laboratory. ‘It’s a high-level vision, but the basic idea is to turn the drug development pipeline upside down and try to compress dramatically – perhaps from six years down to just 12 months – the time it takes from identifying a new compound hit or lead, to starting a clinical trial. To do that we envision replacing most of the experimental steps in the pipeline process with advanced AI-enabled platforms and machine learning algorithms.’
Strip down this AI-enabled drug discovery workflow and we find it’s basically a two-phase process, Stevens continued. ‘In the first phase you’re using an AI generative network to create molecules. And then you take a molecule and put it into relevant models, so that you can start to predict properties, toxicology, and how different cancers will respond to that molecule. These in silico models are trained on existing datasets and then further refined using additional, independent datasets. These may be acquired from partners or collaborators, or generated de novo from continued laboratory experimentation.’
The predictive power of a machine learning algorithm will depend on the quality of the databases and data sources. ‘The trick is to keep track of accuracy,’ Stevens added. ‘Accuracy will be greater the closer the model remains to known data.’ Ask the model to move too far away from the training data and confidence will decrease, to the point that you then need more data to continue the training. ‘This cycle then continues, so you end up with a series of nested loops, where the core work is being done by machine learning, and you then use other techniques, perhaps classical simulation or high-throughput experiments, to derive more data on which to further train and improve the models.’
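The nested training loop Stevens describes, in which a model answers confidently near its training data and falls back to expensive simulation or experiment elsewhere, can be sketched as follows. The nearest-neighbour ‘model’, the distance-based uncertainty measure and the `oracle` function are all toy stand-ins for illustration.

```python
# Sketch of an uncertainty-gated active learning loop: a cheap surrogate
# predicts properties near known data; far from the data it defers to an
# expensive 'oracle' (simulation or experiment) and keeps the result,
# so the surrogate's coverage grows over time.

def oracle(x):
    """Expensive ground truth (e.g. a simulation or assay); toy y = x*x."""
    return x * x

class Surrogate:
    def __init__(self, max_dist=1.0):
        self.data = {}           # measured points: x -> y
        self.max_dist = max_dist # beyond this distance, confidence is lost

    def predict(self, x):
        """Answer from nearby data when confident; otherwise query the
        oracle and fold the new measurement into the training set."""
        if self.data:
            nearest = min(self.data, key=lambda t: abs(t - x))
            if abs(nearest - x) <= self.max_dist:
                return self.data[nearest]   # confident: reuse known data
        y = oracle(x)                       # uncertain: run the experiment
        self.data[x] = y                    # 'retrain' on the new point
        return y

model = Surrogate(max_dist=1.0)
print(model.predict(2.0))   # no data yet -> oracle call, returns 4.0
print(model.predict(2.5))   # close to x=2.0 -> surrogate answers 4.0
print(model.predict(10.0))  # far from data -> new oracle call, 100.0
```

Real pipelines use probabilistic models (e.g. ensembles or Gaussian processes) for the uncertainty estimate, but the control flow is the same: confidence decays with distance from the training data, and low confidence triggers new data collection.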
Argonne is bringing to ATOM molecular generators and molecular property predictors, particularly the efficacy models, for cancer. ‘We have additional technologies that are not yet in the hopper for our work with ATOM, including new approaches to accelerating simulations, and we are working on automated high-throughput experiments.’
One of the nearer term goals is to build a common, model-based infrastructure with tools as pluggable components, Stevens said. The ATOM founders, including GSK, Lawrence Livermore National Laboratory, Frederick National Laboratory for Cancer Research, and the University of California, San Francisco, as well as the National Cancer Institute and startups, are all contributing their expertise.
‘What ATOM allows us to do is to integrate everything into something that is greater than the sum of the parts. It’s trying to build a working infrastructure that organisations can plug into, for the benefit of all members. This will result in a scale of endpoint that is well above what most groups can hope to achieve on their own.’
Scientists aren’t short of computing power, but one of the main bottlenecks is having enough of the right sort of data to generate true AI-enabling technologies for drug discovery.
‘In fact, data is just as important as money when you work with AI in drug research. For some areas of drug development money is probably easier to acquire than data, and we are developing our algorithms and AI-enabled technologies faster than we can access enough data on which to train the resulting models. For example, some types of tumour haven’t been screened against enough drug types to generate sufficient data on which to build thorough efficacy models,’ said Stevens.
There’s plenty of cell line data, Stevens said, but cell lines don’t behave exactly like tumours, and while there are increasing amounts of data from human cancers transplanted in animal models, this is also not necessarily easily accessible, non-proprietary data.
‘So, creating the right databases is still a key objective, and this will require experimentation. We can use simulations to predict some properties, but for things like measuring toxicity, or evaluating binding affinity on real targets, we do need to have more open or semi-open databases,’ Stevens added.
The situation is different in the clinical arena, Stevens pointed out. In countries where there are nationalised systems, such as in the UK, some very large datasets are already available. ‘It’s a little bit easier to integrate data across the enterprise, although there still won’t be molecular and genetic data for every patient. We don’t have genetic data for every patient or their tumours, for example, and this is partly a cost issue.’
‘There does need to be a step change if we are going to be able to harness data from national healthcare databanks. And that will involve making molecular tests cheaper and faster, so that they can become a part of routine patient care. We have to rationalise healthcare data exchange models, while still protecting privacy.’
Making big, high-quality databases available springboards possibilities for machine learning and AI, Stevens said. He cites the UK Biobank as one example. ‘The Biobank has accelerated a lot of machine learning-based, clinically relevant research because it’s a large enough, high-quality dataset to work with. It’s not tied to a company’s drug development programs, it has been generated purely out of the clinical space. The U.S. Department of Veterans Affairs has a similar mindset and is collecting a large dataset across the entire population. These kinds of datasets aren’t just relevant to clinical research, but to drug discovery and development as well.’
So what will likely emerge as the first tangible benefits and game-changers for drug development? Stevens suggests that work by Argonne National Laboratory, ATOM and other pioneers will generate solid proof-of-principle evidence that it is possible to optimise drug leads using existing machine learning technologies.
‘We need to run through multiple cycles of generating leads, and working through simulation and/or experiments, which will enable you to further refine those compounds. It’s still a relatively slow process, but that’s because we need to work through the mathematics of uncertainty in these models, and how to optimise simulations and experiments,’ added Stevens.
Optimise these cycles and you are then presented with a raft of compounds that will need prioritising. ‘You will always have more targets than you can do experiments on, or that you can do simulations on, so another opportunity is to use optimal experimental design mathematics to prioritise not just the compounds from the biology standpoint or from the drug-like standpoint, but from the practicality of actually doing high throughput measurements on them. And that is what we hope to accomplish during this next year: to get that entire end-to-end system working.’
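One way to picture the prioritisation step Stevens describes is a score that weighs a compound’s predicted biological value against the practical cost of measuring it. The field names, weights and cost figures below are illustrative assumptions, not ATOM’s actual design mathematics.

```python
# Hypothetical sketch: rank candidate compounds by expected value per
# unit of assay cost, so cheap-to-measure, promising compounds rise to
# the top of the experimental queue.

def priority(compound, w_eff=0.6, w_drug=0.4):
    """Weighted biological value (predicted efficacy + drug-likeness)
    divided by the cost of actually measuring the compound."""
    value = (w_eff * compound["pred_efficacy"]
             + w_drug * compound["drug_likeness"])
    return value / compound["assay_cost"]

candidates = [
    {"name": "cpd-A", "pred_efficacy": 0.9, "drug_likeness": 0.5, "assay_cost": 3.0},
    {"name": "cpd-B", "pred_efficacy": 0.7, "drug_likeness": 0.8, "assay_cost": 1.0},
    {"name": "cpd-C", "pred_efficacy": 0.4, "drug_likeness": 0.9, "assay_cost": 0.5},
]

ranked = sorted(candidates, key=priority, reverse=True)
print([c["name"] for c in ranked])  # ['cpd-C', 'cpd-B', 'cpd-A']
```

Note how the cheapest compound wins despite the lowest predicted efficacy: folding measurement practicality into the score changes the order that pure biology-based ranking would give.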
Stevens says the ultimate goal is to dramatically cut the drug development timeline, and costs – and to generate candidates that are more likely to show improved efficacy and safety. Importantly, AI-enabled refinement of the process could have an impact not just on the development of drugs for major diseases such as cancer or cardiovascular disorders, but on drug development and clinical research for orphan diseases with smaller patient populations.
Orphan diseases have traditionally not been a major focus for mainstream pharma due to much more limited potential revenues. ‘If we can reduce development timelines (and so potentially patent-protected time on the market) and also development costs, then developing drugs for orphan diseases, as well as neglected tropical diseases, become more attractive propositions commercially, as well as from a discovery and development perspective,’ concluded Stevens.