DATA MANAGEMENT: BIOBANKING
Paul Schreier examines which functions are being added to LIMS to make them suitable for biobanking and points to several reference projects.
‘Medical researchers generally believe that for their studies to be credible they need a primary group of nearly 4,000 patients, and for validation they must replicate the work with roughly 20,000 patients. Where else but with a biobank can they find this number of samples?’ This summary of the need for biobanks comes from Professor Joyce Carlson, laboratory manager for clinical chemistry and pharmacology, University Hospital in Lund and a member of the planning committee for the Swedish LifeGene national biobanking project. The need to help researchers later locate the most useful samples and get data from them has emerged as a major challenge, one that developers of LIMS software have taken on.
What is a biobank?
A biobank is a repository for biosamples, whether taken at hospitals or during research projects run by government-sponsored programmes or by commercial companies. The samples in a biobank are useful, for example, in helping identify a potential population base for a specific trial or in discovering new biomarkers. The cost of storing tissue and fluid samples has dropped tremendously, and the number of biosamples being taken has exploded. Literally thousands of biobanks of all sizes – from a few thousand samples for a specialised project to many millions for international projects – are being set up to store these samples for later use.
While the value of high-quality biospecimens with well-annotated data is tremendous, sometimes initial data alone is sufficient to facilitate or advance a research project. Lisa Miranda, technical director of the Tumor Tissue and Biospecimen Bank at the Hospital of the University of Pennsylvania, explains that a researcher might require retrospective or downstream data as the basis for another study. Biobanks can thus serve a supporting role as data warehouses. For instance, researchers might want data about the types of specimens collected along with associated demographic data; some might want data from downstream applications such as tissue microarray or RNA/DNA analysis. This type of data could help provide the prerequisite scientific rationale for biomedical research studies.
Keeping track of biosamples goes far beyond knowing which refrigerator, drawer or vial holds them. Considerable other data is often collected with each sample. For instance, what were the donor’s age, gender, place of residence and state of health, and were there any specific diseases? Some biobanks even collect data concerning lifestyle (how much exercise? smoker? alcohol?) and socioeconomic background. This extra information raises thorny ethical and legal issues around identifying donors by name and obtaining their consent to use their samples in various studies.
Where are the biobanks?
With the large number of biobanks popping up everywhere, how does a researcher know if there is a suitable biobank that could provide samples? Unfortunately, comments Miranda, there is no central global biobank registry. Further, some biobanks are proprietary, particularly those in the pharmaceuticals world, and information/samples are shared only with selected partners. Some online material locators and groups exist where researchers can note what they are looking for and be referred to different biobank networks. Another method is word of mouth, and there is an emerging business in third-party ‘specimen brokers’ who look for collections on behalf of another party.
Even today, biobanks generally operate independently and there are no de facto standards that indicate which data to collect or how to collect it. And while some have websites that list the types of information available, comparing data among them can still be difficult. How do you find a universal method of quantifying the amount of physical exercise a donor gets? How do you describe nutritional habits that vary widely among countries or continents?
Karolinska Institutet is embarking on one of the most ambitious biobanking projects ever in the hope of creating a federated database involving multiple projects.
There’s clearly a need to organise these vast amounts of data. LIMS vendors indicate that they can handle biobanking tasks, but what are the key differences between LIMS and biobanking software? In short, a LIMS manages the execution of tests on samples and tracks them through the testing process; biobanking software shows where a biosample has come from, where it’s been, how it is stored, and how to get at it later. Users need a front end that allows them to search through a biobank database to see which samples are available based on flexible search criteria.
Sometimes biorepository functions focus solely on logistics and legal issues and are separated from research. Increasingly, though, data from sample analysis is being put back into the biobank, and that calls for merging the capabilities of biobanking software and a LIMS. That, together with the strict regulatory requirements that now apply to biobanking data, makes it difficult to develop homebrew biobanking software, says Gonzalez, technical director for LabWare Europe.
What, specifically, are some of the things being added to a LIMS package to make it suitable for biobanking applications? Suppliers have lists of requirements that differ somewhat, but most generally agree that biobanking software must be able to:
- Handle storage locations throughout the workflow, including volume tracking and the chain of custody with electronic signatures;
- Work with various storage containers so as to manage temporary and transit storage locations and report a complete history of transfers, custodian and location changes;
- Manage a complete genealogy for each sample by tracking aliquots, derivatives and pooled samples;
- Manage complex biographical information about each sample’s donor, which can be of a highly sensitive nature;
- Handle consent management. The software must comply with regulatory and legal requirements because the use of samples can be governed by narrow consent and contractual restrictions; samples might be available for one type of study but not for others; and
- Have an electronic lab notebook that handles data capture from many sources and various documents that might be in an unstructured format.
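To make several of these requirements concrete, here is a minimal Python sketch of how a sample record might carry volume, location, parentage and consent together. It is not taken from any vendor's product; the names `Sample`, `Biobank`, `aliquot`, `genealogy` and `available_for` are hypothetical, and pooled samples (which have several parents) are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sample:
    """Hypothetical sample record; field names are illustrative only."""
    sample_id: str
    donor_id: str
    volume_ul: float
    location: str
    parent_id: Optional[str] = None          # None for a primary sample
    consented_uses: frozenset = frozenset()  # study types the donor agreed to
    custody_log: list = field(default_factory=list)


class Biobank:
    def __init__(self):
        self.samples = {}

    def register(self, sample):
        # Record the sample and the first entry in its location history.
        self.samples[sample.sample_id] = sample
        sample.custody_log.append(("registered", sample.location))
        return sample

    def aliquot(self, parent_id, new_id, volume_ul, location):
        # Volume tracking: an aliquot cannot exceed what the parent holds.
        parent = self.samples[parent_id]
        if volume_ul > parent.volume_ul:
            raise ValueError("not enough volume left in parent sample")
        parent.volume_ul -= volume_ul
        child = Sample(new_id, parent.donor_id, volume_ul, location,
                       parent_id=parent_id,
                       consented_uses=parent.consented_uses)
        return self.register(child)

    def genealogy(self, sample_id):
        # Walk parent links from the given sample back to the primary sample.
        chain = []
        current = self.samples[sample_id]
        while current is not None:
            chain.append(current.sample_id)
            current = (self.samples.get(current.parent_id)
                       if current.parent_id else None)
        return chain

    def available_for(self, study_type):
        # Consent management: only list samples cleared for this study type.
        return sorted(s.sample_id for s in self.samples.values()
                      if study_type in s.consented_uses and s.volume_ul > 0)
```

In this sketch an aliquot inherits the consent restrictions of its parent, which reflects the point that a sample may be available for one type of study but not for others. A real system would add electronic signatures and persistent storage.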
Researchers from many sites, even around the world, might want to explore which biosamples are available in a biobank. Thus, software architectures are moving away from systems run only locally towards web-based architectures, which are becoming standard. Early web applications were limited and customers found them lacking in functionality, but today’s web technologies let users harness the full power of a desktop LIMS application via a comprehensive web solution, says Thermo Scientific’s Doug Holbrook, product manager of the company’s Nautilus LIMS.
In such a scheme, the actual database containing all the biobank information runs on an SQL Server or Oracle server. The biobanking software itself, which implements a customisable interface that allows users to interact with the database, generally runs on a web server so that any user with access rights can then view and interact with the biobanking screens from popular web browsers. In fact, adds LabWare’s Gonzalez, it is possible to use low-end devices like a PDA or iPhone to interact with some software because there is no biobanking code running on the local computer or device.
Meanwhile, most major LIMS software supports biobanking functions, whether with a standalone product, an add-on module or with functions built into the core product. In fact, some companies with impressive reference projects don’t even mention the term biobanking on their websites or in their literature. However, the major suppliers do cite reference projects to illustrate their software’s appeal in that market.
David Sanders, LIMS manager at the UK Biobank, sitting in front of a user-interface screen for the Nautilus LIMS from Thermo Fisher Scientific.
One of the most famous biobanking projects in the world is being set up at Karolinska Institutet (KI) and is the result of Swedish biobank legislation from 2002, which states that samples collected for research must be traceable and managed with quality. The implementation at KI not only involves products from IBM, LabWare and LabVantage, but is also making some of the first efforts towards a federated database of information from multiple biobanks.
The relationship among these three companies is explained by Dr Jan-Eric Litton, director of informatics, and Dr Anna Beskow, project manager for LIMS. Roughly five years ago, KI entered into a strategic relationship with IBM to set up a BIMS (biobank information management system). The BIMS doesn’t import any data, but makes it possible to search several sources at the same time. For instance, with Sapphire from LabVantage, users can look for samples in the core facility and also the Swedish Twin Registry and the MIR database. The core facility uses Sapphire to help researchers with sample collection, approvals, logistics, referrals, receiving and storing samples, DNA extraction, withdrawal of samples for external analysis, plus study donor management to track information about donors and their study IDs; this database, however, includes no research data for the samples. Note that BIMS users provide their own data sources and users can only search for data to which the owner permits access.
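The federated idea described here, searching several sources at once without importing their data and returning only what each owner permits, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how the BIMS is implemented; `Source`, `federated_search` and the record fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical data source that stays under its owner's control."""
    name: str
    records: list        # list of dicts; never copied into a central store
    allowed_users: set   # users the owner has granted access

    def search(self, user, **criteria):
        # The owner decides who may search; others get nothing back.
        if user not in self.allowed_users:
            return []
        return [r for r in self.records
                if all(r.get(k) == v for k, v in criteria.items())]


def federated_search(sources, user, **criteria):
    # Query every source in turn and label each hit with its origin.
    hits = []
    for source in sources:
        for record in source.search(user, **criteria):
            hits.append({"source": source.name, **record})
    return hits
```

Because each source answers the query itself, access rules are enforced where the data lives, and the search layer only merges the results.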
Most researchers in Stockholm don’t have a LIMS today, adds Dr Beskow; they handle biobank data with paper or Excel spreadsheets and don’t have access to BIMS as a research tool. For such groups, KI and Stockholm County Council selected the LabWare LIMS for sample management, storage-location management, and other required operations. The project has just begun; two groups are up and running, and 30 more groups/departments have signed up. In the following years, the project hopes to include most research groups in Stockholm, which represent approximately 5,000 potential users, and groups in other Swedish cities have expressed interest. Users can search their own data within LabWare LIMS, and when that LIMS is fully integrated into the BIMS, researchers will be able to search through their own data directly within the BIMS in combination with data from other sources.
The Storage Location Manager is an integral part of the biobanking facilities within the LabWare LIMS, and here you can see just some of the data associated with a given sample.
Example of integrated analytics by accessing InforSense from within LabVantage in the Clinical Data Analysis view.
A further KI-coordinated pilot programme is LifeGene, which plans to include 500,000 Swedes in families and follow them for decades using both questionnaires and in-person testing (IPT). If the big rollout of this gigantic project gets financed, a new biobank – perhaps managed by KI Biobank – will be built to handle the samples and will use LabWare LIMS for this purpose.
At an even higher level, the Swedish government is working on NAT-RBR (NATional-Regional Biobank Registry) where all LIMS in Sweden will report on the biobank samples they have. In this case, both LabWare LIMS and Sapphire will report to it using XML format with information about sample donor, personal number, name, samples, sample types, sample data and consent.
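As an illustration of what such an XML report might contain, the following Python sketch builds a small document with the standard library's `xml.etree.ElementTree`. The element and attribute names (`biobankReport`, `personalNumber` and so on) are invented for this example, and the function `build_registry_report` is hypothetical; the actual registry schema is not described here.

```python
import xml.etree.ElementTree as ET


def build_registry_report(donor, samples):
    """Build a report string; the layout is an assumption, not the real schema."""
    root = ET.Element("biobankReport")

    # Donor block: personal number, name and consent status.
    donor_el = ET.SubElement(root, "donor")
    ET.SubElement(donor_el, "personalNumber").text = donor["personal_number"]
    ET.SubElement(donor_el, "name").text = donor["name"]
    ET.SubElement(donor_el, "consent").text = "yes" if donor["consent"] else "no"

    # One element per sample, with its type as an attribute and data as text.
    samples_el = ET.SubElement(root, "samples")
    for s in samples:
        el = ET.SubElement(samples_el, "sample", id=s["id"], type=s["type"])
        el.text = s.get("data", "")

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values only; no real donor data.
    print(build_registry_report(
        {"personal_number": "000000-0000", "name": "Test Donor", "consent": True},
        [{"id": "S1", "type": "plasma", "data": "0.5 ml"}],
    ))
```

A shared format of this kind is what lets two different LIMS products report to the same registry.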
Another world-famous project is the UK Biobank, which is gathering samples from 500,000 people along with extensive medical and family histories. That project will follow their medical and other health-related records over the next 30 years. UK Biobank expects 15 million aliquots resulting from the fractionation of blood and urine, and to handle the complex management of these samples UK Biobank selected Thermo Scientific Nautilus LIMS.
Since 2007, more than 350,000 blood samples have been fractionated and split into more than 1,000,000 aliquots for storage. An automated blood fractionation system from RTS Life Science automates this process, allowing the reliable tracking of samples from source container to storage cryovial. For the UK Biobank this data is then transmitted to the Nautilus LIMS.
The UK Biobank has standardised on the Nautilus LIMS across all of its sites allowing for biological analysis results to be automatically entered into and processed by the central repository. That LIMS serves as a comprehensive inventory for the researchers who need to use the results. Moreover, the UK Biobank participates in a LIMS user group with two other biobank organisations using Nautilus LIMS, the Hunt Biobank in Norway and the Singapore Tissue Network. ‘The three organisations are doing similar work using many of the same tools and intend to collaborate on methods involving Nautilus LIMS,’ explains David Sanders, LIMS manager at UK Biobank.
Thermo Scientific’s Holbrook explains that as the biobanking field started to emerge, the company’s LIMS software capabilities, which already included sample-tracking, were extended to include the unique requirements of biobanking organisations. Thermo Fisher has a full roadmap to continue to address biobanking needs with Nautilus as the market develops. Holbrook adds that these features are built into Nautilus rather than in a separate product or add-on module.
As another reference project, LabVantage points out its involvement with the Multiple Myeloma Research Consortium (MMRC), which currently comprises 11 academic institutions and has at its heart the MMRC Tissue Bank, with ongoing collection of tissue samples at various member institutions.
The ‘dashboard’ features of Starlims give an overview of work in progress.
As noted earlier, researchers are now using a LIMS/biobank software environment for more than just tracking sample storage. To give users even more flexibility, LabVantage implements in its Sapphire software what it terms integrative analytics. This draws on data from multiple sources, such as Excel spreadsheets or other databases, and coordinates the invocation of analytical tools from various suppliers, all within the Sapphire environment. The company creates wrappers that allow a direct connection to analysis programs such as popular statistics packages or mathematical programs such as Matlab. Most recently, the firm entered into an exclusive agreement to resell the InforSense workflow platform with the Sapphire LIMS. Keith O’Leary, director of product marketing, explains that if a scientist wants to run a microarray data analysis using data from Sapphire as well as other clinical data, that person executes the InforSense workflow and, using integrated analysis tools such as the R stats package or Affymetrix Power Tools, can visualise and analyse the data. The end results of the analysis are published back into Sapphire for further experimentation.
The Rutgers University Cell and DNA Repository (RUCDR), the largest university-based cell and DNA repository in the world, has selected the Starlims Version 10 software. The web-based system will enable hundreds of research sites to obtain biobanking data in real time using standard web browsers; at the moment RUCDR serves more than 500 research sites worldwide. Starlims provides biobanking functionality through its optional Clinical Solutions ‘dictionary’, which includes a biorepository module. Often details associated with a sample are not entered into a database in a structured form, but rather exist as separate documents created during sample collection and analysis. One interesting feature that addresses this is the software’s Scientific Data Management System (SDMS), whose parsing and recognition technology transforms documents or files in a variety of formats into queryable structured information. Another feature is the ‘dashboard’ view, which gives a quick overview of work in progress.
Ocimum Biosolutions has successfully deployed its Biotracker LIMS to support and automate the biorepository at one of the world's largest pharmaceutical companies for its three geographically dispersed biorepository locations on two continents. Biotracker at this customer supports the global molecular histology and pathology departments by providing the management of biological samples such as tissues, tissue sections on slides, bio-fluids, tissue and cell microarrays, etc. Information tracked includes clinical, donor, pathology, complete genealogy, location, and availability.