Software to promote knowledge transfer

John P. Helfrich, from NuGenesis Technologies, believes that a new class of IT product can capture and manage scientific data to improve drug discovery

The biopharmaceutical industry faces the challenge of maintaining an annual double-digit growth rate. Yet its pipelines of new products are plagued with too much attrition at too late a stage, due to concerns over efficacy, safety and business viability. In addition, 42 of the top 52 blockbuster drugs will come off-patent by the year 2007, producing a huge drain on the innovator companies as generics enter the market. For the innovator companies, all these issues mean that they have to double the number of new leads entering clinical trial phases, and to reduce the time from discovery to market from the traditional 12 to 15 years to less than 10 years.

'Fail fast and cheap' has become key to optimising R&D, and indeed it is in this pre-IND (investigational new drug) area that approximately 40 per cent of the sector's total R&D budgets are spent. The drug discovery arena is being streamlined into parallel processes using high-throughput techniques, yet this generates huge amounts of data that must be analysed for critical-path decisions on potential drug candidates.

The advent of modern proteomics-based target initiatives, in combination with more efficient combinatorial chemistry programmes, increases the target pool for potential new drugs from a few hundred proteins to more than 2,000. This is a true data explosion. The goal is to double the number of high-quality new chemical entities (NCEs) going into clinical trials, while reducing the time to IND submission by half. To accomplish this, the control and management of the scientific data used to make decisions must be dramatically improved.

High-throughput discovery
For decades, research has been a sequential operation where, after many months of target validation, the process would lead to assay development, followed by high-throughput library screens for hits, and then on to lead optimisation. Today, these processes are co-mingled within a framework that encompasses early clinical development and ADMET profiling in a high-throughput manner. The intent is to establish a sound 'therapeutic relevance' as early in the process as possible, to confirm the lead compound as a high-quality NCE and to drive it into clinical trials within a timeframe usually allotted for lead identification only. This reduced cycle time meets the 'fail fast and cheap' mantra. The output is early attrition of low-quality leads.

The flip side is the exponential increase in the number of data sets that require interpretation for decision support. The range of instruments associated with this research includes micro-arrays, 1D and 2D gels, HPLC/MS, MALDI-TOF MS, high-throughput screening plate readers, and a host of spectroscopic methodologies. Efficient, comprehensive, automated data collection, archival, and retrieval is a requirement if the modern discovery and development process is to produce the real knowledge needed to make informed decisions. These processes traditionally have been a cut-and-paste paper-trail of research notebooks and periodic supervisor-witnessing of work. The current trend is to minimise paper output and change to an electronic process that will assure the accurate and timely capture, approval, and secure archiving of all the data. Along with this, ways of communicating the data efficiently are being developed, so as to facilitate collaboration on a global basis.

The modern biopharmaceutical IT landscape consists of several, if not dozens of, unique purpose-built platforms: Laboratory Information Management Systems (LIMS), Electronic Document Management Systems (EDMS), enterprise resource management systems, departmental-level data systems for specific instruments, and various desktop products. Because each of these products was built for its own purpose, together they require an enormous investment in interoperability and in ongoing maintenance and support. Key to this patchwork quilt of software is the interconnectivity needed to communicate data for further processing.

A platform for scientific data management
The off-the-shelf NuGenesis Scientific Data Management System (SDMS) provides a base-layer infrastructure with direct connections for raw data capture and export to other common systems and software programs. The application-independent SDMS platform provides a foundation for a high-performance IT environment, and eliminates the arduous task of point-to-point integration with each data source within the enterprise. Data of all types and formats is captured directly from the source, catalogued and securely stored in a central repository. SDMS provides tools for finding, viewing, communicating and utilising all this disparate data on a global basis. It provides a secure method for automatically archiving and retrieving raw binary data files within minutes of creation or change. A database catalogue is created and captured automatically, without any intervention by the analyst. The raw data is stored on safe, secure media (e.g. optical disks, RAID, CD jukeboxes, SAN etc.) and can be retrieved and restored with just a few mouse clicks at the scientist's desk.
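The idea of cataloguing files automatically on creation or change can be illustrated with a minimal sketch. This is not the NuGenesis implementation: the function names `open_catalogue` and `catalogue_new_files`, the table layout, and the use of SQLite and SHA-256 checksums are all assumptions chosen so the example is self-contained.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_catalogue(db_path=":memory:"):
    """Open a small catalogue table (SQLite is a stand-in, not the product's database)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS catalogue ("
        "path TEXT PRIMARY KEY, sha256 TEXT, size_bytes INTEGER, modified REAL)"
    )
    return conn


def catalogue_new_files(conn, source_dir):
    """Record every new or changed file under source_dir; return the paths recorded."""
    recorded = []
    for path in sorted(Path(source_dir).rglob("*")):
        if not path.is_file():
            continue
        # A content checksum detects changes even if the timestamp is unreliable.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        row = conn.execute(
            "SELECT sha256 FROM catalogue WHERE path = ?", (str(path),)
        ).fetchone()
        if row is not None and row[0] == digest:
            continue  # unchanged since the last pass
        stat = path.stat()
        conn.execute(
            "INSERT OR REPLACE INTO catalogue VALUES (?, ?, ?, ?)",
            (str(path), digest, stat.st_size, stat.st_mtime),
        )
        recorded.append(str(path))
    conn.commit()
    return recorded
```

Run periodically against an instrument's output folder, a routine of this kind would pick up files shortly after they are written, with no action needed from the analyst. Copying the file itself to secure storage is left out of the sketch.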

Data pieces can be extracted and sent to other software programs. The scientist can easily assemble a summary document in Microsoft Excel through a cut-and-paste routine. Each data piece contains a hyperlink to the original data captured in the database.

Most analytical instruments and PC office software programs have the capability to print a report to a local printer. In fact, most scientists actually work with the printed reports generated by the highly-automated and sophisticated instruments in the lab, and rarely with the actual files. It is primarily the printed reports and documents that are used to define success, failure, and future experiments within the lab.

One of the SDMS data capture agents is a NuGenesis print driver that is placed on the networked data source workstation or PC. This allows the SDMS to capture the actual content of the reports (not just an image) and place the data into an Oracle relational database with all pertinent metadata captured for subsequent data mining (Figure 1 above). Both vendor-specific instrument-generated reports (e.g. LC/MS printout) and scientist-generated reports (spreadsheets, word documents and presentations that can contain parts of the machine-generated report) can be archived and stored securely.

Sophisticated mining and extraction tools browse the database by sorting and filtering human-readable report data, using tags assembled in an Oracle database. Added power is provided through the ability to find text, even embedded within graphics. Data can be selected and copied to electronic lab notebooks or presentation documents that can be assembled in minutes, then saved and communicated to team members throughout the organisation, anywhere in the world. Each data set contains a hyperlink back to the original data file for fast and efficient restoration if needed.
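As a rough illustration of browsing captured reports by tag and by text, the sketch below filters and sorts a small table and returns each hit with a link back to its raw file. The table layout, the `find_reports` function, the sample rows and the archive paths are all hypothetical, and SQLite stands in for the Oracle database mentioned in the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reports (id INTEGER PRIMARY KEY, instrument TEXT, "
    "project TEXT, body TEXT, raw_file TEXT)"
)
# Invented sample rows: instrument and project act as tags, body is the report text.
conn.executemany(
    "INSERT INTO reports (instrument, project, body, raw_file) VALUES (?, ?, ?, ?)",
    [
        ("LC/MS", "NCE-001", "Peak at 4.2 min, purity 98.1 per cent", "/archive/0001.raw"),
        ("MALDI-TOF MS", "NCE-001", "Mass 1532.7 Da, purity not assessed", "/archive/0002.raw"),
        ("LC/MS", "NCE-002", "Peak at 3.9 min, purity 91.4 per cent", "/archive/0003.raw"),
    ],
)


def find_reports(conn, project, text):
    """Filter by project tag and report text, sorted by instrument then id."""
    rows = conn.execute(
        "SELECT id, instrument, raw_file FROM reports "
        "WHERE project = ? AND body LIKE ? ORDER BY instrument, id",
        (project, f"%{text}%"),
    )
    return rows.fetchall()


for report_id, instrument, raw_file in find_reports(conn, "NCE-001", "purity"):
    print(report_id, instrument, raw_file)
```

The `raw_file` column plays the role of the hyperlink back to the original data: the search works on human-readable report text, while the raw file stays retrievable from the archive.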

Relations with LIMS
A LIMS installation is a purpose-built instrument-interfacing solution that manages the flow of very specific data elements (not all data) between instruments, instrument data systems, the LIMS, and data management systems. A typical LIMS provides an interface that can collect instrument data from many sources (RS232, analogue, ODBC, etc.), and then split and parse that data into meaningful components for display, further calculations and reporting. The interface between NuGenesis' SDMS and LIMS can work bi-directionally, acquiring work lists, creating sequence files and transferring the sequence files to instrument systems. The LIMS-configurable display consists of: a raw data display presenting the data as initially received; a single sample spreadsheet containing the detailed information for each individual sample; and a worksheet display providing an information summary of all the samples as they are processed.
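The 'split and parse' step, and the three displays that follow from it, can be sketched in a few lines. The semicolon-delimited record format and the functions `parse_record` and `worksheet_summary` are invented for illustration; real instrument output and LIMS interfaces vary by vendor.

```python
def parse_record(line):
    """Split one delimited record into a sample ID and named numeric results."""
    sample_id, *fields = line.strip().split(";")
    results = {}
    for field in fields:
        name, _, value = field.partition("=")
        results[name.strip()] = float(value)
    return {"sample_id": sample_id.strip(), "results": results}


def worksheet_summary(records):
    """Average each named result across all parsed samples."""
    collected = {}
    for record in records:
        for name, value in record["results"].items():
            collected.setdefault(name, []).append(value)
    return {name: sum(values) / len(values) for name, values in collected.items()}


# Raw data display: the lines exactly as received (hypothetical format).
raw_lines = [
    "S-0042;assay=98.0;impurity=0.5",
    "S-0043;assay=99.0;impurity=0.25",
]
# Single-sample view: one parsed record per sample.
records = [parse_record(line) for line in raw_lines]
# Worksheet view: a summary across all samples processed so far.
summary = worksheet_summary(records)
print(records[0])
print(summary)
```

Keeping the raw lines alongside the parsed records mirrors the distinction the article draws between the data as initially received and the components derived from it.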

Until now, labs that implemented LIMS still needed to capture certain information using paper worksheets and notebooks: standard and reagent preparations; instrument calibrations; review and revalidation details; sample-batch environmental and preparation summaries; observations pertaining to the testing; calculation worksheets; and miscellaneous checklists required by good laboratory or manufacturing practice and ISO. The LIMS worksheet mimics paper forms and is embedded within the LIMS application. Analysts document instrumental and manual results directly at their workstations, rather than filling out paper worksheets and transcribing the data. A LIMS worksheet can now access the content of an SDMS report - with all text, tables and graphics - to provide support for results coming from the LIMS. NuGenesis provides a Software Development Kit (SDK) and template technology that utilise an open architecture for automated import/export. Serving up this information through the LIMS worksheet enhances its value and makes the user experience easy, convenient and productive. Access to the entire data report in the SDMS is provided by a hyperlink that is present on the LIMS user interface (see Figure 3 below - LIMS GUI with preview of data in SDMS). The combination of these two products represents a major step on the road to the paperless laboratory.

Through hyperlink technology, a user button on the LIMS GUI allows access to the full report in the NuGenesis SDMS.

The SDMS can serve as a centralised, common format repository for all computer-based analytical and summary data generated in the laboratory. It connects directly to data sources and captures both raw file data and instrumental data reports that might be printed to a lab printer. Its architecture provides a secure, paperless system that meets 21 CFR Part 11 and intellectual property requirements.

The system also addresses the need for seamless integration between instruments, LIMS, and the higher-order IT infrastructure already in place in many organisations. It streamlines and enhances the communication and efficient flow of data throughout the global enterprise.

The interoperability of the NuGenesis SDMS platform is invaluable in providing access to all the relevant data on a particular NCE programme to the people who need to make the critical decisions. When weighing the advantages of moving a candidate forward, in the light of the further investment that would then be required, against killing the programme and diverting the money to more productive candidates, the more information available upon which to base the decision, the more likely it is to pay off.

John P. Helfrich is Director, Research Programmes, at NuGenesis Technologies Corporation, Westborough, MA, USA.