The future of discovery informatics platforms
The drug discovery process produces data at an astonishing rate, and the ability to organise this information across organisations has until recently been, at best, limited. An overarching, commercially-available informatics platform for drug discovery exists today only in partial implementations – why? Generally, installations consist of vendor or in-house applications specific to only one or a few functions of the drug discovery process, and are typically based on vendors’ internal expertise. Today’s informatics (cheminformatics, bioinformatics, etc.) organisations face the challenge of providing a platform that can efficiently integrate disparate data sources and various applications to effectively manage discovery projects and workflows. Is this feasible considering the massive diversity of the input data sources, from simple to complex and multidimensional?
In order to achieve this, informatics applications must not only assist in aggregating data, but must also provide the ability to help assess whether ‘critical attributes’ are being met. In the discovery process, this list of attributes can be very long, and for informatics, the ability to evaluate and make decisions in real-time, and in relation to each other, is key. For example, upon registration, every compound is exposed to a variety of assays, each generating a point of data. These results generally come from different groups across the drug discovery organisation, and in many cases, from different geographical locations or from contract research organisations. An effective informatics platform is responsible for delivering not only a ‘balanced’ view of this information to project teams, but also an assessment of the lead compound’s performance relative to other compounds tested within the same project. To enable this, the appropriate analytics and visualisation applications must be in place to facilitate:
- Data aggregation – automatically accepting raw data from networked instrument acquisition consoles, and copying it to the appropriate network appliances for processing;
- Data processing and analysis – automatically applying robust processing and analysis routines, then storing the results in a relational, indexed database; and
- Visualisation – displaying aggregate results, and visually assessing assay performance.
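The three steps above can be sketched in a few lines. This is a minimal illustration only, not a description of any vendor's product: the compound identifiers, assay names, and values are invented, an in-memory SQLite database stands in for the 'relational, indexed database', and the helper names (`build_store`, `rank_within_project`, `compound_profile`) are hypothetical.

```python
import sqlite3

# Hypothetical raw results as (compound_id, assay, value); all values are invented.
RAW_RESULTS = [
    ("CPD-1", "potency_pIC50", 7.2),
    ("CPD-1", "solubility_uM", 45.0),
    ("CPD-2", "potency_pIC50", 6.1),
    ("CPD-2", "solubility_uM", 120.0),
    ("CPD-3", "potency_pIC50", 8.0),
    ("CPD-3", "solubility_uM", 15.0),
]


def build_store(rows):
    """Aggregate raw results into one indexed relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE result (compound_id TEXT, assay TEXT, value REAL)")
    conn.execute("CREATE INDEX idx_result_assay ON result (assay, compound_id)")
    conn.executemany("INSERT INTO result VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def rank_within_project(conn, assay):
    """Return compound ids ordered best-first for one assay.

    Assumes a higher value is better, which will not hold for every assay.
    """
    cursor = conn.execute(
        "SELECT compound_id FROM result WHERE assay = ? "
        "ORDER BY value DESC, compound_id",
        (assay,),
    )
    return [row[0] for row in cursor]


def compound_profile(conn, compound_id):
    """Return all assay results for one compound as a dict."""
    cursor = conn.execute(
        "SELECT assay, value FROM result WHERE compound_id = ?", (compound_id,)
    )
    return dict(cursor.fetchall())


conn = build_store(RAW_RESULTS)
print(rank_within_project(conn, "potency_pIC50"))
print(compound_profile(conn, "CPD-1"))
```

The ranking query gives the relative view of a compound against others in the same project, while the profile query gives the per-compound view across assays; a visualisation layer would sit on top of queries of this kind.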
There is also a need to aggregate other project-related activities, such as screening/assay experiments (pharmacology, toxicology, DMPK), building in silico predictive models, and performing both in vitro and in vivo model studies. These activities are typically requested and tracked through workflow management systems, whose critical capabilities include:
- Submission requests – with connections to orthogonal data sources;
- Submission tracking – the ability for both project teams and support groups to track the progress of individual requests; and
- Result dissemination – ensuring that results generated by the service organisation are integrated into data warehousing and visualisation applications.
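As a rough illustration of these three capabilities, the sketch below models a request that is submitted, tracked through a small set of states, and finally disseminated. The class, field, and status names are all hypothetical, and a real system would add persistence, permissions, and links to other data sources.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    SUBMITTED = "submitted"
    IN_PROGRESS = "in progress"
    COMPLETED = "completed"


@dataclass
class Request:
    """One hypothetical assay request raised by a project team."""
    request_id: str
    compound_id: str
    assay: str
    status: Status = Status.SUBMITTED
    history: List[Status] = field(default_factory=list)
    result: Optional[float] = None

    def advance(self, new_status, result=None):
        """Record the previous status, then move to the new one."""
        self.history.append(self.status)
        self.status = new_status
        if new_status is Status.COMPLETED:
            self.result = result


def disseminate(requests):
    """Collect completed results as (compound, assay, value) rows,
    the shape a warehouse loader might accept."""
    return [
        (r.compound_id, r.assay, r.result)
        for r in requests
        if r.status is Status.COMPLETED
    ]


first = Request("REQ-1", "CPD-1", "microsomal_stability")
second = Request("REQ-2", "CPD-2", "microsomal_stability")
first.advance(Status.IN_PROGRESS)
first.advance(Status.COMPLETED, result=0.62)  # invented value
print(disseminate([first, second]))
```

Keeping the status history on the request itself lets both the project team and the support group see how a submission has progressed without consulting a separate log.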
External data sources – collected for a variety of purposes including literature information on specific reaction protocols, biological assessment of compound classes, and patent assessment for freedom to operate – must also be considered.
As with internal data sources, external data sources exist in many different formats, so integration into a single informatics platform has historically been difficult. However, today’s web architecture provides some effective integration opportunities:
- RSS feeds – applying business rules to filter ‘off-the-presses’ literature publications for project teams, so that reaction information can be incorporated on demand into synthesis planning workflows;
- Web services – integrating external databases via web services protocols allows internal and external data to be aggregated, mined, and disseminated in real time;
- Subscription-based content access – informatics organisations can license access to relevant data, and integrate external content into project team decision-making; and
- Open access initiatives – databases of public-access data, including structure and biological assay results, can be integrated into internal decision-making capabilities.
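To make the idea of 'business rules' applied to a literature feed concrete, here is a minimal sketch. It assumes the feed has already been fetched and parsed into simple dictionaries; the entries, project names, and keyword rules are invented for illustration, and real rules would likely be richer than keyword matching.

```python
# Hypothetical feed entries, as they might look after a feed has been parsed.
ENTRIES = [
    {"title": "Suzuki coupling of heteroaryl chlorides",
     "summary": "A palladium-catalysed protocol."},
    {"title": "Editorial: the year in review",
     "summary": "Highlights."},
    {"title": "Kinase inhibitor selectivity profiling",
     "summary": "Panel screening of a compound class."},
]

# Hypothetical business rules: keywords of interest for each project team.
PROJECT_KEYWORDS = {
    "synthesis-team": ["suzuki", "coupling"],
    "kinase-project": ["kinase"],
}


def route_entries(entries, rules):
    """Return, for each project, the titles of entries matching its keywords."""
    routed = {project: [] for project in rules}
    for entry in entries:
        text = (entry["title"] + " " + entry["summary"]).lower()
        for project, keywords in rules.items():
            if any(keyword in text for keyword in keywords):
                routed[project].append(entry["title"])
    return routed


for project, titles in route_entries(ENTRIES, PROJECT_KEYWORDS).items():
    print(project, titles)
```

The same routing step could be applied to content arriving through web services, subscriptions, or open-access databases, provided each source is first mapped to a common entry shape.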
Yet another source of disparate information, the product of systems biology, must also be considered when discussing an informatics solution for the future. Systems biology poses an incredible challenge due to the nature of the data involved. For example, in classical lead optimisation, data collected exhibit a one-to-many relationship (e.g., one compound with the results of many assays). For a systems biology approach, results exhibit a many-to-many relationship – for an individual compound, not only are there many assays to perform, but there may be many specific biomarkers to account for in results. How can informatics organisations effectively identify the meaningful result elements from such diverse interacting data pools?
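The difference between the two shapes of data can be shown with a small sketch. All identifiers and values are invented, and `index_by` is a hypothetical helper; the point is only that a nested mapping suits the one-to-many case, whereas the many-to-many case is easier to handle as flat rows that can be indexed from either side.

```python
from collections import defaultdict

# Classical lead optimisation: one compound maps to many assay results.
LEAD_OPTIMISATION = {
    "CPD-1": {"assay_A": 7.2, "assay_B": 45.0},
}

# Systems biology: each compound/assay pair yields several biomarker readouts,
# and each biomarker recurs across compounds and assays.
# Rows are (compound_id, assay, biomarker, value).
READOUTS = [
    ("CPD-1", "assay_A", "biomarker_X", 1.8),
    ("CPD-1", "assay_A", "biomarker_Y", 0.4),
    ("CPD-1", "assay_B", "biomarker_X", 2.1),
    ("CPD-2", "assay_A", "biomarker_X", 0.9),
    ("CPD-2", "assay_B", "biomarker_Y", 1.2),
]


def index_by(readouts, position):
    """Group rows by the field at the given tuple position."""
    index = defaultdict(list)
    for row in readouts:
        index[row[position]].append(row)
    return dict(index)


by_compound = index_by(READOUTS, 0)   # compound -> all of its readouts
by_biomarker = index_by(READOUTS, 2)  # biomarker -> readouts across compounds

print(len(by_compound["CPD-1"]))
print(sorted({row[0] for row in by_biomarker["biomarker_X"]}))
```

Because the same rows can be grouped by compound, assay, or biomarker, the number of possible views grows quickly, which is one way to see why picking out the meaningful elements becomes harder.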
Simply adding more information to existing data streams will result in inefficiency and slower decision-making. It also raises the question of how critical elements can be defined for acceptance criteria. Systems biology therefore adds a whole new dimension to the manner in which project information is managed. An effective solution is not yet available, but existing capabilities can be enhanced to accommodate this new dimension.
The challenge facing informatics organisations in the near future is to be able to tie together all of this information with further external information, and the products of systems biology, resulting in a complete system for data management and effective decision-making. The capability of promising new informatics systems to achieve this harmonisation will help to promote even more effective decision-making in the future.
The author would like to thank Andrew Anderson, a former colleague at ACD/Labs, for his insights and help in the authoring of this article.