ChemAnalytics offers access to chemical structures and analytic data
For almost two decades, chemical and pharmaceutical organisations have had access to software systems that enable them to manipulate and manage their chemical structure collections. All chemical matter, whether real or theoretical, can be represented at a molecular level of detail via a chemical structure, and drawing that structure is typically the first step before any chemical matter is actually made. However, merely drawing a chemical structure is not proof of its existence, and instrumental analysis techniques are invaluable for confirming the presence or identity of chemical matter.
Historically, organisations have had to separate the structural entity from all its defining data, simply because appropriate software systems were unavailable to manage the diversity of data that could be associated with a compound. Although the problem has been acknowledged, the general solution has been to reduce the rich analytical data to meta-data that can be easily stored, searched, and retrieved. For example, an LCMS dataset could be reduced to a series of components identified by mass, retention time, and integrated peak area. Chemists could have their data stored in notebooks, filing cabinets, or scattered across hard drives on the network. These legacy data are often not stored in a way that would permit easy access by anyone in the organisation who might need to review them. Access to data on demand is a long-term need that has remained unfulfilled. While a chemist can use a simple web browser to search for information across the World Wide Web, they have only limited access to data, information, and extracted knowledge within their own organisation.
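The reduction of an LCMS dataset to meta-data described above can be illustrated with a minimal Python sketch. The `Component` record, the `reduce_peak` helper and the trapezoidal integration are assumptions made for illustration only; they do not represent any vendor's file format or processing algorithm, and the numbers are invented rather than measured.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical meta-data record for one LCMS component."""
    mass: float            # m/z of the ion assigned to the peak
    retention_time: float  # time at the peak apex (minutes)
    peak_area: float       # integrated intensity under the peak


def trapezoid_area(times, intensities):
    """Integrate a peak with the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (intensities[i] + intensities[i - 1]) / 2.0
    return area


def reduce_peak(mass, times, intensities):
    """Collapse the raw trace of one peak into a single Component."""
    apex = max(range(len(times)), key=lambda i: intensities[i])
    return Component(mass, times[apex], trapezoid_area(times, intensities))


# Illustrative values only, not real measurements.
component = reduce_peak(195.1, [1.0, 2.0, 3.0], [0.0, 10.0, 0.0])
print(component)
```

The sketch also shows what is lost: once only the three numbers are kept, the shape of the peak and everything else in the raw trace can no longer be recovered from the record.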
New chemical entities are characterised using a combination of analytical techniques. For structural verification, such techniques will typically include chromatography (simply for separation or as part of a hyphenated system), mass spectrometry, NMR, infrared, UV-visible spectroscopy, and others. The aggregate value of such instrumentation, which may be in the millions of dollars, pales by comparison to the overall value of the data that are generated. For each species characterised using the integrated analysis of one or more of these technologies, the interrogation of the data to produce a single chemical structure can consume many hours for a skilled scientist. The resulting combination of characteristic identifiers, such as retention times obtained under different chromatographic conditions, representative molecular fragments from mass spectrometry, characteristic chemical shifts and coupling constants from NMR, vibrational frequencies from specific functional groups and λmax values from optical spectroscopy, all help to characterise the molecular framework and electronic nature of the atoms and bonds defining the structure.
- ACD/Web Librarian is a web-based program that helps chemists and analysts gain unified access to structures, spectra, and other analytical data collected throughout the enterprise
When one considers the broad array of analytical technologies that are typically employed for structure elucidation, in conjunction with the time spent by scientists interpreting the resulting data, there should be little surprise that thousands of dollars, if not tens of thousands, could be invested in a single compound. Although this cost is dwarfed by the final development costs of a product, the attrition rates from candidates to commercial entities are enormous. Billions of dollars per year are spent in generating analytical data that characterise the candidates. The resulting chemical structures enter a chemical registration system for all to search, yet the fundamental data used for identification commonly remain disconnected and distributed, despite the recent availability of knowledge management and archival systems. Although such systems can store virtually any type of binary data, they are not structurally enabled, and merely storing binary analytical data without a structural context limits the usefulness of such data. These systems are, at least, addressing the need to gather the data into a single container for text-based interrogation.
Organisations have come to recognise that analytical data are more than just a collection of objects containing peaks, bumps and curves to be stored indiscriminately. These data capture the many and varied characteristics of the molecules in greater detail than a simple connection table can. With the ability to mine these data using various forms of searching algorithms, it becomes possible to identify unknowns more easily and to identify chemical classes, functional groups, and molecular properties more quickly. With a continued push for faster turnaround for structure characterisation, the foundation of legacy data generated previously in an organisation can become catalytic in improving the throughput of an analytical laboratory. This effect becomes most beneficial when analytical data are associated directly with the resulting chemical data using an approach recently termed ChemAnalytics¹.
Software is now available that allows the integration of various forms of analytical data. However, hurdles remain for any laboratory wishing to bring together data from different analytical instruments. File format heterogeneity prevails, based simply on the needs of instrument vendors to protect their market and the proprietary details of their techniques. Even though there has been partial success with generic file formats such as JCAMP², these are not lossless conversions. Efforts in this area continue today, with the intention of delivering an XML-based homogenising format known as AnIML³ (Analytical Information Markup Language). Homogenisation of file formats is a step towards resolving the issue, but a single homogeneous platform for processing, viewing and storing disparate data, and specifically for integrating these data with their associated structures, is surely the ultimate solution.
These systems allow the processing of binary file formats directly imported from the instruments without the need for generic translations. The resulting processed data can be stored in a database for future reference. Most importantly, these data are directly associated with one or more chemical structures, with each technique linked to the structural detail extracted from it. Every retention time, chemical shift, vibrational band, and molecular fragment is associated with the molecule directly, indexed into the database, and available for searching. With this warehouse of ChemAnalytical knowledge extracted from the data and enabled through the platform, the value of an integrated system is obvious. Chemists search for associated analytical data through structure-based enquiries. Analytical scientists can use spectroscopic and chromatographic features to search databases to aid in the process of structure verification and elucidation. Classical text-based searches via registration codes, LIMS identifiers and other meta-data allow integration with the sample management systems presently deployed in many companies.
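The idea of indexing every analytical feature against its structure, and then searching in either direction, can be sketched in a few lines of Python. This is an illustrative in-memory model only: the `FeatureIndex` class, its method names, the technique labels and the numeric values are all hypothetical and do not describe how any commercial product stores or searches its data. Structures are represented here by SMILES strings purely for convenience.

```python
from collections import defaultdict


class FeatureIndex:
    """Toy index linking analytical features to chemical structures."""

    def __init__(self):
        # technique -> list of (value, structure)
        self._by_technique = defaultdict(list)
        # structure -> list of (technique, value)
        self._by_structure = defaultdict(list)

    def add(self, structure, technique, value):
        """Register one feature (e.g. a chemical shift) for a structure."""
        self._by_technique[technique].append((value, structure))
        self._by_structure[structure].append((technique, value))

    def search_by_feature(self, technique, value, tolerance):
        """Analyst's route: find structures with a feature near `value`."""
        hits = {
            structure
            for stored, structure in self._by_technique.get(technique, [])
            if abs(stored - value) <= tolerance
        }
        return sorted(hits)

    def features_for(self, structure):
        """Chemist's route: list all features recorded for a structure."""
        return list(self._by_structure.get(structure, []))


# Illustrative entries only; the values are not real measurements.
index = FeatureIndex()
index.add("CC(=O)O", "ir_band_cm1", 1710.0)
index.add("CC(=O)O", "retention_time_min", 2.4)
index.add("CCO", "ir_band_cm1", 3350.0)

print(index.search_by_feature("ir_band_cm1", 1705.0, 10.0))
print(index.features_for("CCO"))
```

A real system would add substructure searching, persistent storage and links to registration codes or LIMS identifiers; the sketch only shows why keeping the feature and the structure in the same record makes both kinds of query possible.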
Integrated access to data
The challenges outlined above have focused specifically on the characterisation of chemical structures. However, it is commonplace in the pharmaceutical industry simply to assume the integrity of a chemical entity while screening against specific targets. Terabytes of analytical data can be generated in screening compound libraries or traditional singletons, and access to these data on demand is of similar value to the investigating scientists.
The obstacles preventing integrated organisational access to chemical entities and their associated analytical data are no longer technological in nature. Great efforts have already been expended to automate the generation of analytical data, and laboratories now run 24/7. However, both immediate and long-term access to these data has generally not been considered part of the strategy. The primary issue in this regard is recognition of the value of providing access to analytical data throughout the life-cycle of a chemical entity. The associated capital investments required to deploy an integrated system should include optimisation of the business processes and workflows. To catalyse a shift in information and knowledge management across an enterprise, an organisational champion who recognises the value of ChemAnalytics needs to drive the process.
Antony Williams is VP Scientific Development and Marketing at ACD/Labs