Biobanks have particular data-management requirements and, with growing volumes of molecular data now being generated from banked samples, sophisticated informatics solutions are essential, as Greg Blackman finds out
The increased focus on translational research has placed greater emphasis on biobanking as a source of samples for researchers to tap into. The value of these repositories lies not only in the samples themselves but also in the metadata associated with each specimen. Researchers studying diseases such as cancer can select samples using criteria recorded in the metadata, such as the patient's age and health, and the type and stage of the stored tumour tissue, among other information.
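To make metadata-driven selection concrete, here is a minimal Python sketch of filtering a sample inventory on patient and tumour attributes. The record fields (`patient_age`, `tumour_type`, `tumour_stage`), the `find_samples` helper and the sample data are all hypothetical; they are not taken from any of the LIMS products discussed in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """A banked specimen plus a few hypothetical metadata fields."""
    sample_id: str
    patient_age: int
    tumour_type: str
    tumour_stage: str


def find_samples(samples, tumour_type, stages, min_age=0, max_age=200):
    """Return the samples whose metadata match every criterion."""
    return [
        s for s in samples
        if s.tumour_type == tumour_type
        and s.tumour_stage in stages
        and min_age <= s.patient_age <= max_age
    ]


# Made-up inventory for illustration only.
bank = [
    Sample("S001", 64, "adenocarcinoma", "II"),
    Sample("S002", 71, "squamous cell carcinoma", "III"),
    Sample("S003", 58, "adenocarcinoma", "III"),
    Sample("S004", 49, "adenocarcinoma", "I"),
]

matches = find_samples(bank, "adenocarcinoma", {"II", "III"}, min_age=50)
print([s.sample_id for s in matches])  # ['S001', 'S003']
```

A real biobank would run such a query against a database rather than an in-memory list, but the principle is the same: the metadata, not the physical sample, is what makes the collection searchable.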
Facilities use Laboratory Information Management Systems (LIMS) first and foremost to book samples in and out of the bank, according to Nial Hodge, IT support at the Roy Castle Lung Cancer Foundation, University of Liverpool Cancer Research Centre. He says that LIMS improved the centre’s booking-in process, and that no proper booking-out system existed before it was introduced. The foundation conducts lung cancer research and uses Autoscribe’s Matrix Gemini LIMS to manage its tissue bank samples.
Tissue samples will often be sourced with the cooperation of surgeons and hospital staff, as Lorrie Perpetua, coordinator for the University of Connecticut Health Center Research Biorepository, explains: ‘There are several surgeons that participate and identify appropriate candidates (for tissue). The consent of the patient is obtained and through the cooperation of the surgeon, the operating room staff and the pathology staff will coordinate with the centre’s [University of Connecticut] biorepository for the collection of tissue.’ Any tissue left over after the pathological diagnosis is stored at the biobank.
The University of Connecticut’s biobank currently stores around 2,600 tissue and blood samples, as well as patient data obtained from consented individuals, as a source of material for researchers within the University of Connecticut.
The biorepository uses LabWare LIMS as an inventory program, to track sample movement and to attach the clinical and laboratory information that researchers might require, such as age, tumour type and cross-sectional images of tissue. ‘A really important aspect for the researchers is the clinical information provided with the sample and the ability to store this information is something that standard inventory programs typically don’t have,’ says Perpetua.
Trish Meek, director of product strategy, life sciences, informatics at Thermo Fisher Scientific, specifies that a biobanking LIMS ‘needs to be able to simplify inventory management and ensure that associated sample data, like chain of custody and patient consent, are available at every level of the hierarchy’. She also notes that LIMS should integrate easily with instrumentation and automation systems.
Beyond storage, scientists need to be able to search for samples in order to request them. ‘Creating a request is only the start of the process,’ says Meek. ‘[Using LIMS] customers of biobanks will be able to track the progress of their request from creation to fulfilment.’
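Tracking a request ‘from creation to fulfilment’ is essentially a small state machine. The Python sketch below assumes a hypothetical set of stages (created, approved, picked, shipped, fulfilled, rejected) and refuses any move that skips a step; real products define their own workflows, and none of these names come from Nautilus or any other LIMS.

```python
from enum import Enum


class RequestStatus(Enum):
    # Hypothetical lifecycle stages for a sample request.
    CREATED = "created"
    APPROVED = "approved"
    PICKED = "picked"
    SHIPPED = "shipped"
    FULFILLED = "fulfilled"
    REJECTED = "rejected"


# Allowed transitions; any other move is refused.
TRANSITIONS = {
    RequestStatus.CREATED: {RequestStatus.APPROVED, RequestStatus.REJECTED},
    RequestStatus.APPROVED: {RequestStatus.PICKED},
    RequestStatus.PICKED: {RequestStatus.SHIPPED},
    RequestStatus.SHIPPED: {RequestStatus.FULFILLED},
    RequestStatus.FULFILLED: set(),
    RequestStatus.REJECTED: set(),
}


class SampleRequest:
    def __init__(self, request_id, sample_ids):
        self.request_id = request_id
        self.sample_ids = list(sample_ids)
        self.status = RequestStatus.CREATED
        self.history = [RequestStatus.CREATED]  # lets the requester see progress

    def advance(self, new_status):
        """Move to the next stage, rejecting transitions that are not allowed."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"cannot move from {self.status.value} to {new_status.value}"
            )
        self.status = new_status
        self.history.append(new_status)


req = SampleRequest("R-17", ["S001", "S003"])
for step in (RequestStatus.APPROVED, RequestStatus.PICKED,
             RequestStatus.SHIPPED, RequestStatus.FULFILLED):
    req.advance(step)
print([s.value for s in req.history])
# ['created', 'approved', 'picked', 'shipped', 'fulfilled']
```

Keeping the history alongside the current status is what allows a requester, or an auditor, to see every stage the request passed through rather than just where it ended up.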
Thermo Scientific’s Nautilus LIMS not only supports data capture and specification management but also addresses chain-of-custody and handling-assurance requirements, which improves operational efficiency.
One of the requirements that Tom Kent, president and CEO of LIMS provider Sciformatix, identifies for a biobanking LIMS platform is maintaining a chain of custody that records who handled the sample, what procedures were carried out and when processing occurred. Sciformatix’s SciLIMS is a software-as-a-service platform that lets organisations tailor the storage configuration to their needs. FreezerPro software from US-based Ruro also allows repositories to track samples and retrieve associated metadata. The software supports high-throughput 2D barcode scanners, RFID readers and vial robots.
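A chain of custody of the kind Kent describes (who, what, when) can be modelled as an append-only event log. The Python sketch below is a minimal illustration under stated assumptions: the class and field names are hypothetical, and the SHA-256 hash chain, which makes later edits to an entry detectable, is a generic tamper-evidence technique chosen for this example rather than a feature claimed for SciLIMS, FreezerPro or any other product.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class CustodyEvent:
    sample_id: str
    handled_by: str   # who handled the sample
    procedure: str    # what was done
    timestamp: str    # when it happened (ISO 8601, UTC)
    prev_hash: str    # hash of the previous event in the log
    event_hash: str   # hash of this event's contents plus prev_hash


def _digest(sample_id, handled_by, procedure, timestamp, prev_hash):
    payload = json.dumps([sample_id, handled_by, procedure, timestamp, prev_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CustodyLog:
    """Append-only log; each entry's hash covers the entry before it."""

    def __init__(self):
        self._events = []

    def record(self, sample_id, handled_by, procedure, when=None):
        when = when or datetime.now(timezone.utc).isoformat()
        prev = self._events[-1].event_hash if self._events else GENESIS
        digest = _digest(sample_id, handled_by, procedure, when, prev)
        event = CustodyEvent(sample_id, handled_by, procedure, when, prev, digest)
        self._events.append(event)
        return event

    def history(self, sample_id):
        """Every recorded step for one sample, in the order it happened."""
        return [e for e in self._events if e.sample_id == sample_id]

    def verify(self):
        """Recompute the chain; False means some entry was altered."""
        prev = GENESIS
        for e in self._events:
            expected = _digest(e.sample_id, e.handled_by, e.procedure,
                               e.timestamp, prev)
            if e.prev_hash != prev or e.event_hash != expected:
                return False
            prev = e.event_hash
        return True


log = CustodyLog()
log.record("S001", "j.smith", "booked in", "2011-03-01T09:00:00+00:00")
log.record("S002", "j.smith", "booked in", "2011-03-01T09:05:00+00:00")
log.record("S001", "a.jones", "sectioned", "2011-03-02T14:30:00+00:00")
print([e.procedure for e in log.history("S001")])  # ['booked in', 'sectioned']
print(log.verify())  # True
```

The `history` call is also the kind of query an inspector needs when reconstructing where a single sample was, and what was done to it, at every point in its lifecycle.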
Bill Greenhalf, operational director of the Liverpool ECMC-GCLP facility at the Royal Liverpool University Hospital, states that the LIMS software installed at the facility has two main purposes: ‘It is there firstly for quality control, but also to leave a trail to show quality control procedures have been followed.’
The Royal Liverpool University Hospital houses three repositories: the Liverpool tissue bank, the CLL (Chronic Lymphocytic Leukaemia) biobank and the ECMC-GCLP (Experimental Cancer Medicine Centre-Good Clinical Laboratory Practice) biobank, the latter of which stores samples from clinical trials. Each of the biorepositories uses Autoscribe’s Matrix Gemini LIMS to manage the data associated with storage of biological samples.
Greenhalf notes that the LIMS forces users to leave a trail, and that this trail defines every step of the process. He says: ‘Other users would potentially install LIMS to manage a high throughput of samples. This is certainly not the case for us. LIMS isn’t used to increase throughput, but rather to increase traceability for audits and quality control.’
There are two bodies in the UK that regulate the processes involved in sample storage: the Human Tissue Authority, which ensures that the regulations laid down by the Human Tissue Act 2004 are followed, and the Medicines and Healthcare products Regulatory Agency (MHRA), which ensures that clinical trials are conducted properly. ‘Both bodies carry out inspections; the MHRA in particular can visit the site at any time to conduct a longitudinal study of a sample, i.e. determining where the sample was at all points throughout its lifecycle and what was done with it,’ Greenhalf states.
[Image: The laboratories at the Roy Castle Lung Cancer Foundation, University of Liverpool Cancer Research Centre]
The Royal Liverpool University Hospital has one of the largest clinical trials units in the UK, which administers a large number of trials. ‘There are two conflicting responsibilities,’ explains Greenhalf. ‘One is to the patient, and inspectors need to know that the participant fully consented to the study, but there is also a need to show that researchers are not influenced one way or another in showing that a particular agent or procedure is working. Therefore, it is very important that the analysts working in the labs have no idea to which participant the sample belongs.’ Biobank administrators try to keep consent forms, and the evidence for that consent, strictly separate from the research data, Greenhalf says. Tissue is stored and used by the ECMC-GCLP facility, but the patients’ details are kept in the Liverpool Cancer Trials Unit or on-site in the hospital.
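The separation Greenhalf describes is a form of pseudonymisation: analysts see only a participant code, while the key linking codes to patients stays with a different organisation. The Python sketch below shows one generic way to do this, with random codes and a key store held apart from the lab data. The class names, the `P-` code format and the identifiers are hypothetical, and the article does not say how the Liverpool facilities actually implement it.

```python
import secrets


class LinkageKeyStore:
    """Held by the trials unit only: maps participant codes back to patients."""

    def __init__(self):
        self._by_code = {}
        self._by_patient = {}

    def code_for(self, patient_id):
        # Reuse the same code for repeat samples from one participant.
        if patient_id not in self._by_patient:
            code = "P-" + secrets.token_hex(6)
            while code in self._by_code:  # guard against a (very unlikely) clash
                code = "P-" + secrets.token_hex(6)
            self._by_patient[patient_id] = code
            self._by_code[code] = patient_id
        return self._by_patient[patient_id]

    def reidentify(self, code):
        """Only the key holder can map a code back to a patient."""
        return self._by_code[code]


def to_lab_record(key_store, patient_id, sample_id, sample_type):
    """Build the record analysts see: no patient identifier, only a code."""
    return {
        "sample_id": sample_id,
        "participant_code": key_store.code_for(patient_id),
        "sample_type": sample_type,
    }


keys = LinkageKeyStore()  # lives with the trials unit, not the lab
rec1 = to_lab_record(keys, "HOSP-123", "S001", "plasma")
rec2 = to_lab_record(keys, "HOSP-123", "S002", "tissue")
print(rec1["participant_code"] == rec2["participant_code"])  # True
print("HOSP-123" in rec1.values())  # False
```

Random codes are used here rather than a hash of the patient identifier because hashes of short, structured identifiers can often be reversed by brute force; with random codes, re-identification is possible only through the separately held key store.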
Biorepositories provide a resource for researchers seeking to understand complex diseases. Projects such as the Lung Genomics Research Consortium (LGRC), a two-year effort launched in October 2009, are going a step further than standard biobanking practice by characterising the molecular make-up of the samples. The molecular data can then be mined alongside the clinical data.
Led by National Jewish Health and funded by the National Heart, Lung and Blood Institute, part of the National Institutes of Health (NIH), the LGRC project brings together five institutions, including Dana-Farber Cancer Institute. Collaborators in the project work with samples banked at the Lung Tissue Research Consortium (LTRC), which houses tissue and blood samples from patients with lung disease, primarily chronic obstructive pulmonary disease (COPD), along with a rich set of clinical data.
The LGRC will take a subset of the LTRC’s samples (around 500-1,000) and characterise their DNA, RNA and methylation profiles. All the sites involved in the project will pass the molecular data they generate to Dana-Farber, which acts as the data coordinating centre: it combines the clinical data captured by the LTRC with the molecular profiles and makes the result available within the consortium and to the wider scientific community.
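At its simplest, coordinating data from several sites means joining each incoming molecular profile to the clinical record for the same sample, and flagging profiles that cannot be matched. The Python sketch below assumes a shared sample identifier and hypothetical record shapes; the LGRC’s real data model is not described in this article and is certainly richer.

```python
def integrate(clinical, molecular):
    """Attach molecular profiles to the clinical record for the same sample.

    clinical:  {sample_id: {clinical field: value}}
    molecular: list of {"sample_id", "assay", "site", "data"} records
    Returns (combined, unmatched).
    """
    combined = {
        sid: {"clinical": dict(fields), "assays": {}}
        for sid, fields in clinical.items()
    }
    unmatched = []
    for rec in molecular:
        entry = combined.get(rec["sample_id"])
        if entry is None:
            unmatched.append(rec)  # profile with no clinical record: flag it
            continue
        entry["assays"][rec["assay"]] = {"site": rec["site"], "data": rec["data"]}
    return combined, unmatched


# Made-up records for illustration only.
clinical = {
    "L001": {"diagnosis": "COPD", "age": 67},
    "L002": {"diagnosis": "COPD", "age": 59},
}
molecular = [
    {"sample_id": "L001", "assay": "expression", "site": "site A", "data": [2.1, 0.4]},
    {"sample_id": "L001", "assay": "methylation", "site": "site B", "data": [0.8]},
    {"sample_id": "L999", "assay": "expression", "site": "site A", "data": [1.0]},
]

combined, unmatched = integrate(clinical, molecular)
print(sorted(combined["L001"]["assays"]))  # ['expression', 'methylation']
print([r["sample_id"] for r in unmatched])  # ['L999']
```

Reporting unmatched profiles rather than silently dropping them matters when several sites submit data independently: an unmatched identifier usually points to a labelling or transfer error that should be resolved at source.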
‘One of the most valuable resources researchers have is well annotated tissue samples and in cancer research, scientists have been able to develop much more fine-grained characterisations of disease based on molecular profiles,’ remarks Mick Correll, associate director of the Center for Cancer Computational Biology at Dana-Farber. ‘We’re trying to do the same thing for lung diseases.’
IDBS’s ClinicalSense, a web-based clinical cohort selection tool, was used to create an online, searchable catalogue of all the clinical information in the LTRC, initially to help with experimental design. Using the software, the thousands of clinical attributes were built into an ontology, which was made accessible through a web portal. This allowed collaborators to obtain summary information about the number of participants, the number of samples and the types of sample available for given criteria.
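The kind of summary described above (how many participants, how many samples, and of what type, for given criteria) amounts to an aggregate query that returns counts rather than individual records. The Python sketch below uses flat dictionaries and exact-match criteria for simplicity; it does not model ClinicalSense’s ontology, and the field names and catalogue entries are hypothetical.

```python
from collections import Counter


def summarise(samples, **criteria):
    """Counts for a cohort query; individual records are not returned."""
    hits = [s for s in samples
            if all(s.get(field) == value for field, value in criteria.items())]
    return {
        "participants": len({s["participant"] for s in hits}),
        "samples": len(hits),
        "by_type": dict(Counter(s["sample_type"] for s in hits)),
    }


# Made-up catalogue entries for illustration only.
catalogue = [
    {"participant": "P1", "diagnosis": "COPD", "sample_type": "tissue"},
    {"participant": "P1", "diagnosis": "COPD", "sample_type": "blood"},
    {"participant": "P2", "diagnosis": "COPD", "sample_type": "tissue"},
    {"participant": "P3", "diagnosis": "ILD", "sample_type": "tissue"},
]

print(summarise(catalogue, diagnosis="COPD"))
# {'participants': 2, 'samples': 3, 'by_type': {'tissue': 2, 'blood': 1}}
```

Counting distinct participants separately from samples matters because one participant may contribute several specimens; a study designer needs both numbers to judge whether a cohort is large enough.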
‘Ultimately, we want to move beyond just capturing information and give clinicians and researchers a way to analyse the information that’s been collected,’ says Correll. ‘The information we’re working with is quite complex, it’s highly dimensional data, and in order to present this to a non-expert audience, visualisation is key.’ IDBS’s VisualSense, a web-based data visualisation tool, presents the data in a meaningful way and lets researchers find the elements they want and make selections across different assay data.
One of the issues surrounding biobanking that John Quackenbush, director of the Center for Cancer Computational Biology at Dana-Farber and one of the principal investigators (PIs) on the LGRC project, identifies is how one collects and manages the clinical data associated with the samples, in terms of informed consent and dealing with ownership of data and samples. ‘There is a tremendous amount of clinical data available to the LGRC project from the LTRC,’ he comments. ‘However, once genomic data is generated, even though these are de-identified samples, there are all sorts of issues surrounding how that data is made available to the community. There are concerns about privacy, because the genotype is a very precise way of identifying an individual.’
One of the solutions Quackenbush proposes to the privacy issue is to send the data to dbGaP, the database of Genotypes and Phenotypes maintained by the National Center for Biotechnology Information (NCBI), which has a structure in place for controlling and monitoring access. However, he states: ‘We believe that we can add significant additional value to the data beyond what dbGaP can provide because of our ability to integrate the expression, variational, and clinical data that the LGRC will be generating.’
Elsewhere, the Canary Foundation, a non-profit organisation dedicated to the early detection of cancer, has, in collaboration with the National Cancer Institute’s Early Detection Research Network (EDRN), commissioned a translational research informatics platform to provide integrated translational research data (clinical and omics datasets). The cloud-based platform will be built by GenoLogics and NASA’s Jet Propulsion Laboratory (JPL), and aims to support research activities from discovery through biomarker validation to clinical results.
Another challenge within the LGRC project that Correll identifies is how to catalogue the large amounts of molecular data generated so that they are useful not only to expert users (the biostatisticians) but also to the researchers themselves. Through its contribution to the LGRC, Dana-Farber is trying to take all the genomics data and put it in an intuitive format for the broader community of users. ‘What’s going to describe whether our efforts in using biobanks to do genomic profiling are successful, is whether or not this has any kind of clinical impact,’ says Quackenbush.
‘There are the ethical, social and legal problems, but there are also some serious technological problems for which we have to find solutions,’ he adds.