
Information overload

Trevor De Silva and Geoff Parker, from the consultancy Scimcon, discuss the causes of the data explosion in the laboratory, and suggest ways to tame it

Across the science, biotech and pharmaceutical industries, managers and workers are finding themselves wading through what is rapidly becoming a torrent of information. Competition, scientific advances, regulation and computerisation have all played a part.

While Information Systems (IS) make it easier to manage the volume of data spawned as a result of regulatory and competitive pressures, they also make copious amounts of data available in quantities that many laboratories find difficult to manage.

The laboratory has felt the growth in data acutely, particularly as the pharmaceutical and biotechnology industries strive to bring a greater number of drugs to market more quickly. Consequently, laboratories are under pressure to develop procedures that facilitate the screening of an increasing number of prospective drug candidates at the beginning of a new product development cycle.

Because of this competitive pressure, the number of new chemical entities entering the later stages of development each year is significantly increasing. Each new development attracts a huge amount of data, particularly as techniques that acquire 3D data, or ascertain genetic data, push volumes up even further.

An extra dimension to new product development is brought by the need to extend patents through the use of new delivery mechanisms. With many of the major blockbuster brands approaching the end of their normal patent life, manufacturers are looking for new ways to deliver the same products. This activity puts further stress upon laboratory systems and the development process as a whole.

Looking specifically at the development part of R&D, organisations need to make complicated decisions about what to do with data describing entities that don't make it to the next stage. Increasingly, they may want to look at alternatives to successful products to see whether they would succeed using a new variant or a new delivery mechanism.

This goes against the traditional approach of identifying a successful candidate and taking it forward, then discarding the rest. It means that many organisations won't have decision-making processes in place to enable them to pin down which candidates (and associated data) should be stored, for how long, and who should be involved in making those decisions. The cost of data collection, storage and archiving always needs to be balanced against the potential value of that data to the company.

In addition to competition, regulatory pressures also have a significant effect that further fuels concern about the data explosion. The move from paper to electronic record-keeping raises issues of its own: it not only implies the need for better ways of proving the authenticity of data, but also introduces a series of decisions about how data will be stored and read in the future. Will the software applications used to read and store data today be the same ones in use in five years' time, when a product is released onto the market?

One reason for the introduction of 21 CFR Part 11 by the US Food and Drug Administration (FDA) was to find ways to document and prove audit trails on electronic data. While a paper document can be analysed to see when details have been changed, it is far more difficult to look at original raw data and prove exactly when it was captured, stored and altered.
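The tamper-evidence problem described here can be illustrated with a minimal hash-chained audit log, in which each entry embeds a cryptographic hash of the previous one, so that any retrospective change breaks the chain. This is a sketch of the general technique only, not a reference to any specific format mandated by the FDA or implemented in a particular LIMS; the field names and functions are illustrative.

```python
import hashlib
import json
from datetime import datetime, timezone

def append_entry(log, user, action, record_id):
    """Append a tamper-evident entry. Each entry stores the hash of the
    previous entry, chaining the whole log together."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "record_id": record_id,
        "prev_hash": prev_hash,
    }
    # Hash the entry body (no "hash" key yet) with stable key ordering.
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry

def verify_chain(log):
    """Recompute every hash; returns True only if no entry was altered
    and no entry was removed or reordered."""
    prev_hash = "0" * 64
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        body = {k: v for k, v in entry.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, "analyst1", "create", "sample-042")
append_entry(log, "analyst2", "amend", "sample-042")
assert verify_chain(log)

log[0]["action"] = "delete"   # retrospective tampering...
assert not verify_chain(log)  # ...is detected
```

The point of the design is that, unlike a paper record, raw electronic data carries no physical trace of alteration; chaining the hashes restores an equivalent of that trace, because changing any historical entry invalidates every hash that follows it.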

IT and IS directors and laboratory managers are all developing different approaches to this challenge, and these may involve electronic signatures, audit trails and hybrid approaches to paper/electronic documents. Whichever approach wins out, it will almost certainly have an effect on the way organisations manage their growing data volumes.

Technology itself is also playing a part in the size and scope of the sea of information that scientific computing must conquer. The implementation of technology such as automated testing systems has made laboratories significantly more productive over the past five years. But ironically, these improvements in productivity are now contributing to the data explosion, which will only multiply in complexity and volume over the next few years. So, bearing all of those factors in mind, where should organisations start when developing a strategy to conquer the mountain of data they produce and must store? In our view, they must begin with the short-, medium- and long-term business plan for the laboratory, and for the organisation as a whole.

There is no point in developing an approach that enables laboratories to provide data, information and knowledge, only to find that it is not in line with the business plan and therefore is not required. If the business aim is to develop a specific number of new products and derivatives in a set period, then the data strategy should be designed to deliver against that aim. It is not just about making life easier for people who work in the laboratory, though this is clearly important.

The next stage is to analyse what technology is in place already: what systems are in place; what is working well; what is not working well; and, where do gaps exist? The investments required in information systems to plug those gaps need to be prioritised in line with business aims and objectives: what investment will enable the business to move forward most quickly? The costs of implementing a new data system for a laboratory can run into hundreds of thousands of pounds. It is important to ensure that this investment is channelled into the most appropriate direction for the business, not just for the laboratory.

Implementation of those new systems comes next. Without introducing bureaucracy, a steering group of high-level executives from the business and the laboratory environment is needed to make decisions about when different projects within the overall IS strategy should take place.

Seniority is required at this stage, not just so that funding can be secured, but also so that appropriate resources can be allocated to each project. These project teams need to represent all of the stakeholders involved, ensuring buy-in and understanding. The steering group must also monitor progress of each project against original targets and business objectives. Navigating these requirements, and ensuring that all stakeholders' interests are represented, is not always easy. There are some historical and cultural barriers that need to be addressed, particularly when it comes to making decisions about keeping data on unsuccessful candidates, for example.

For this reason, it makes sense to work with external experts who have experience and know-how in building IS strategies that deliver business objectives. Such partners have first-hand experience with many different organisations, and can help companies recognise the need for an inclusive approach. They can take a fresh view of a company and its existing systems, and help set priorities for investment that deliver value to the business as well as laboratory managers, directors and their staff. Scimcon's core expertise, for example, includes LIMS consultancy, regulatory compliance, and IS strategy. Since its establishment in 1987, Scimcon has worked with pharmaceutical, biotechnology, petrochemical and utility groups all over the world.

A case in point
Bayer Pharmaceutical's biotechnology division is a good example of a company rising to the data challenge. The biotechnology site analyses thousands of protein samples each year in its quest to develop new drugs. To harness the wealth of information now available for drug discovery, Bayer Biotech has embarked upon a five-year plan to redesign its information management to support the research and development processes more effectively.

Scimcon joined the effort as a strategic partner and became involved in a comprehensive IS strategy review, including the installation of a new candidate tracking system. This has enabled Bayer to streamline and simplify its data management processes, making these far easier for users to handle and understand.

Ken Kupfer, Head of Biotechnology Scientific Informatics at Bayer Corporation, says: 'Two years ago, our concept of information management was bioinformatics. But now we see the value of an integrated approach to information management that supports our entire R&D process. There has been an immediate business improvement, in that vital information supporting the drug discovery process is now stored in one central, automated system, which has replaced the mishmash of Word and Excel documents which we've since been able to rip up and discard.'

The amount of data that any laboratory has to deal with is only going to grow over the next few years, and anything but incrementally. The cost of technology will continue to drop, which will enable more processes to be automated and more data to be generated at an ever-increasing level of granularity. Competitive pressures spurring data production will increase, and regulatory requirements are unlikely to subside.

This means that organisations in the scientific arena will need to make data management part of their ongoing strategy. Companies will be required to invest more resources in laboratory information systems to enable them to store increasingly complex and voluminous data.

IS strategies will continue to absorb an ever-increasing share of corporate budgets. As this happens, it is critical that expenditure directly addresses the real needs of the business. This cannot be simply a question of throwing money at a problem: a pragmatic approach must be adopted, based on a real understanding of the business, its goals and its priorities. If it is not, companies run the risk of being crushed by the data mountain, and failing where more nimble, efficient, data-competent competitors succeed.