Informatics is just the tonic

Greg Blackman looks at some of the trends taking place in the pharmaceutical industry and how data management software is being used

The pharmaceutical industry has always been a competitive and fast-changing field. Drug discovery, which traditionally was based around small molecule chemistry (a classic drug like aspirin is composed of 21 atoms), now has to deal with the much larger proteins produced by biotechnology. Monoclonal antibodies, which fall under the classification of biologics, can be in the region of 20,000 atoms, three orders of magnitude larger. Laboratory practices are changing, with the move from paper to electronic records. There is an increase in laboratory automation, with robotic systems churning out terabytes of data. There’s also a cultural change, with more work outsourced, largely for economic reasons, rather than conducting everything in-house.

All of the top pharma companies work to some extent with electronic systems and all have evaluated, or are in the process of evaluating, how effectively these systems are deployed. Being able to share information within the company is one of the big benefits of recording data electronically and the systems should be integrated in such a way as to facilitate data sharing as much as possible. Trish Meek, director of product strategy for life sciences and informatics at Thermo Fisher Scientific, points out that it’s important to connect up the various electronic platforms. She comments: ‘If information is siloed in any one of these solutions, whether it is the ELN, LIMS, or an SDMS, then the greatest benefit of using electronic systems has not been realised.’

This also applies at a company level in that having silos of data at different sites is less effective than pooling that data and making it accessible throughout the enterprise. The drug discovery group of GlaxoSmithKline (GSK) has developed its Global Analytical Data Repository (GADR) using Waters’ NuGenesis Scientific Data Management System (SDMS), the main function of which is to store and retrieve all analytical data from the discovery sites worldwide and enable that data to be viewed throughout the company globally.

‘To have all the discovery data in a single location is critical,’ states Gregory Murphy, director, global informatics business development at Waters. ‘Prior to this, analysts were e-mailing or faxing information backwards and forwards between different sites and it was very difficult to exchange information or ideas about different analyses. Now, anybody in the company has access to the analysis of a given compound, and all the research for the entire GSK discovery organisation is contained in a single repository.’ Murphy adds that one of the extra benefits was that, once scientists knew that their data would be accessible to everyone within the company, the quality of the data increased.


As mentioned at the beginning of this article, traditional pharma is based around organic chemistry, which is a much older and more mature science than biology – there are naming conventions for compounds, companies have compound registries, etc. Biotech, on the other hand, is a less structured scientific discipline; it operates via different processes and presents different challenges for an informatics platform. Dr Hans Peter Fischer, head of the Phylosopher business unit at informatics solutions provider Genedata, says that comparing small organic molecules with large proteins such as monoclonal antibodies is ‘a bit like comparing a bicycle with a Boeing 747’ in terms of complexity and the number of building blocks and connections within the molecule.

Genedata has developed a Biologics Data Platform (BDP) to support biologics data management and analysis at Bayer Schering Pharma R&D sites worldwide. Biologics drugs include monoclonal antibodies and recombinant proteins, and lead generation for these large molecules is very different to that of small molecules, explains Fischer. This is due, in part, to the process: ‘There are different screening technologies, molecule registration is different, the assays are different, the development and lead optimisation is different, process documentation differs, even toxicology and safety assessments are different. Expanding existing small molecule informatics solutions to make a full biologics system through incrementally modifying functions is, therefore, difficult to achieve.’

There were three major issues that Genedata tried to address with the BDP: data volume, process complexity inherent to the R&D processes, and integration with existing systems. The platform is based on a scalable software architecture, capable of handling huge volumes of data. Business logic was built into the system to address process complexity, and system integration was addressed by Application Programming Interfaces (APIs), which allowed easy integration with existing infrastructure.

‘Pharmaceutical companies are trying to industrialise their discovery pipelines and there is a strict division of labour into specialised groups,’ Fischer says. However, there are many interfaces between these groups: one group, for instance, produces a DNA vector, which needs to be handed over to the expression group. At the same time, this group receives specialised cell lines from another department to produce recombinant cell lines and these are passed on for further downstream processing. ‘This level of process complexity cannot be handled without a sophisticated supporting IT infrastructure,’ states Fischer. Process complexity is not only internal, but also reaches outside the boundaries of the company, as many steps in drug discovery are outsourced for economic and efficiency reasons.

Most data is produced by instruments and, therefore, scalability is another dimension. ‘It’s no longer a biologist noting down results in a notebook, but machines writing terabytes of data to disk from which the company needs to make sense,’ Fischer says.

Another aspect was integrating the system with the existing IT infrastructure. Bayer Schering Pharma already had a significant amount of IT infrastructure in place, along with specialised pieces of software that had been developed in-house – all of which had to be integrated with the BDP system. ‘Many companies are very traditional in terms of IT, even big pharma, and many of the biology and biologics groups are still using Excel-based solutions,’ comments Fischer. This leads to highly inconsistent nomenclature for experimental results, and to data that is neither automatically updated nor centrally stored.
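The nomenclature problem Fischer describes can be illustrated with a minimal Python sketch. It is not how the BDP works; the vocabulary, the spreadsheet labels and the `normalise` helper are all hypothetical, chosen only to show how differently spelled column headings from separate spreadsheets might be mapped onto one agreed term.

```python
import re

# Hypothetical controlled vocabulary: canonical term -> variants seen in spreadsheets.
CANONICAL = {
    "IC50 (nM)": ["ic50", "ic-50", "ic 50 nm", "ic50 [nm]"],
    "Expression yield (mg/L)": ["yield", "expr. yield", "expression yield mg/l"],
}


def _key(label):
    """Lower-case the label and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


# Build one lookup table covering every known variant and the canonical terms themselves.
LOOKUP = {_key(v): canon for canon, variants in CANONICAL.items() for v in variants}
LOOKUP.update({_key(canon): canon for canon in CANONICAL})


def normalise(label):
    """Return the canonical term for a label, or None if the label is unknown."""
    return LOOKUP.get(_key(label))


for heading in ["IC-50", "Expr. Yield", "Kd"]:
    print(heading, "->", normalise(heading))
```

Unknown headings return `None` rather than a guess, so they can be flagged for a person to review instead of being silently mislabelled.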

The BDP supports the full drug discovery process on the biologics lead-finding side, from compound library generation to screening to protein production. Genedata plans to release a biologics platform in late 2010/early 2011, using the knowledge gained from a consortium of companies involved in biologics.

Elsewhere, a major US biotech company, which prefers to remain anonymous, is in the process of implementing an ELN from IDBS for its bioprocess division. The production version of the software went live on 15 March 2010 and the company currently has around 150 users, with the aim to roll out the ELN to more than 700 users in its bioprocess division by the end of the year. The bioprocess group develops the clinical material and carries out any characterisation of the protein, so aspects such as cell culture, purification, drug formulation and analytics are all part of the division.

IDBS’s E-WorkBook suite will first act as a direct replacement for paper laboratory notebooks, with analysts using the software to record experimental data. IDBS’s E-WorkBook for Biology will also be used in areas where large amounts of data are generated, driven in part by the increase in laboratory automation. A spokesman for the company said: ‘There are a lot of robots involved in purification or developing formulations, etc, and that automation generates large amounts of data, which requires more standardised processing methods. That’s an appeal of the E-WorkBook for Biology component and is one of the distinguishing features of IDBS’s ELN compared to other ELNs.

‘The traditional paper lab notebook is akin to a personal diary in the way it’s used and the idea that experiments might be designed with collaboration in mind from the beginning is a relatively new concept for our company,’ he continues.

The ELN will improve how accessible the data is and make it searchable, which should ease communication between the company’s two sites, as well as make handoffs easier between early-stage (phase II clinical trials) and late-stage pharmaceutical development (phase III clinical trials), and handoffs to manufacturing once the drug is approved.

In some instances, large pharma companies can be running too many software platforms over their various sites and departments. Laboratory automation and data management solutions provider Xyntek is currently midway through a three-year implementation programme of an ELN for a major biotech company, which wanted to reduce the total number of software applications operating. Elliot Abreu, vice president at Xyntek, says: ‘Some of these large pharma companies are running up to 20 different laboratory informatics platforms within the enterprise, support and maintenance of which can be very costly. In addition, integrating numerous platforms can be complex.’

One of the objectives of the installation was to find one solution that would fit most of the operations. However, the recommendation was to install two different products, one covering the discovery/clinical pharma area and the other covering the method execution and analytical testing areas of pharmaceutical development. ‘It’s one global ELN project, but with two technology platforms that are set up to communicate with each other,’ he says.

Manufacturing processes

GlaxoSmithKline’s manufacturing group also wanted to retire some of the LIMS servers housed at each of its eight or 10 manufacturing locations and implement a centralised data management architecture. The LIFT (Laboratory Information for Tomorrow) project set out to put this in place, with LabVantage’s Sapphire LIMS as the central platform installed at GSK House in the UK.

To aggregate the data and harmonise it in a common format, GSK employed Waters’ NuGenesis SDMS. The SDMS parses the data and uploads it into the LIMS, which ultimately produces certificates of analysis that are sent back to the manufacturing site to release the product. Because every step is carried out automatically, the LIFT architecture makes a transcription error virtually impossible. ‘GSK wanted to avoid risk in the manufacturing process and one of the ways of doing this is to reduce transcription errors through automating the process,’ comments Murphy, of Waters.
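The flow described here, in which instrument output is parsed, loaded and turned into a certificate of analysis without anyone retyping a value, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the NuGenesis or Sapphire interface: the export format, the column names, the specification limits and both functions are hypothetical.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical instrument export for a single batch; column names are assumptions.
RAW_EXPORT = """sample_id,test,value,unit
BATCH-001,assay,99.2,%
BATCH-001,water_content,0.4,%
"""

# Hypothetical specification limits (low, high) for each test.
SPEC_LIMITS = {"assay": (98.0, 102.0), "water_content": (0.0, 0.5)}


@dataclass(frozen=True)
class Result:
    sample_id: str
    test: str
    value: float
    unit: str


def parse_export(text):
    """Parse the raw export into structured results (standing in for the SDMS step)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Result(row["sample_id"], row["test"], float(row["value"]), row["unit"])
        for row in reader
    ]


def certificate_of_analysis(results):
    """Check each result against its limits (standing in for the LIMS step).

    Assumes all results belong to the same batch.
    """
    lines = []
    release = True
    for r in results:
        low, high = SPEC_LIMITS[r.test]
        within_spec = low <= r.value <= high
        release = release and within_spec
        lines.append(
            {"test": r.test, "value": r.value, "unit": r.unit, "within_spec": within_spec}
        )
    return {"sample_id": results[0].sample_id, "results": lines, "release": release}


results = parse_export(RAW_EXPORT)
coa = certificate_of_analysis(results)
print(coa["sample_id"], "release:", coa["release"])
```

The values on the certificate are the same objects that were parsed from the export, which is the point Murphy makes: with no manual copying step, there is nowhere for a transcription error to creep in.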

Manufacturing is a highly regulated environment and one of the major challenges facing the São Paulo manufacturing plant of Global Pharmaceutical Supply Group (GPSG), in Brazil, was ensuring compliance with standards. GPSG Brazil is part of Janssen-Cilag Farmacêutica and a member of the Johnson & Johnson family of companies. It processes more than 10,000 analyses per month to ensure the quality of nearly 2,000 samples of raw materials, packaging materials, semi-finished and finished products.

The laboratories at the manufacturing plant of Global Pharmaceutical Supply Group (GPSG) Brazil in São Paulo (Image courtesy of GPSG Brazil and Thermo Fisher Scientific)

The company installed Thermo Fisher Scientific’s Atlas chromatography data system (CDS), validated to meet standards imposed by regulatory bodies. The CDS is integrated with 17 high-performance liquid chromatography (HPLC) instruments in the chemical laboratory, enabling all the chromatographic data to be accessed via one central server.

The CDS solution was implemented alongside Thermo Scientific’s LIMS, which is integrated both with a corporate enterprise resource planning (ERP) package, SAP R/3, and with a Janssen-Cilag proprietary documentation system. Ronaldo Galvao, quality operations director at GPSG Brazil, comments: ‘Between the production plant and the laboratory that analyses data from production, there is a need for regular exchange of information about quality and analysis values. By interfacing the LIMS with its ERP, GPSG Brazil can expedite the data flow between the lab and the manufacturing functions, streamline data handling and integrate data collection and reports.’

Cultural changes

According to Meek, of Thermo Fisher Scientific, one of the biggest changes for pharma companies is their dependence on external organisations. ‘The burden 15 years ago was to ensure the flow of information within the company and with external groups, but the majority of the work was done internally,’ she says. ‘Today, research organisations are also confronted with the additional challenge that much of that information originates at other facilities that are outside of their control, whether it is a contract research organisation, the clinic, or with an academic partner. It is not uncommon for companies to have 20-plus partnerships within a single therapeutic area.’ The key is connecting all of the organisations involved, and Thermo Scientific’s Connects product provides a framework for information sharing and collaboration.

The importance of electronic records for pharma organisations lies in preserving a company’s knowledge base. One comment from the contact interviewed from the US biotech company was: ‘If we’ve spent money developing a product or technique and it’s buried in somebody’s lab notebook, then it isn’t fulfilling its value to the company.’

These systems also need to be connected and integrated within a more centralised informatics architecture, so that departments or sites are not operating in isolation. Web technology has facilitated these centralised architectures, as has the speed of wide area networks (WANs). ‘Historically, the challenge of managing huge amounts of data from multiple sources in a centralised way really wasn’t possible,’ says Murphy, of Waters. ‘Now, having a single database that everyone can access is a powerful solution compared to the situation five years ago, in which bandwidth and the software applications available limited large pharma companies’ ability to set up centralised software architecture.’

