Greg Blackman looks at some of the data-management challenges facing biobanks and how these repositories of material are being used to advance life-sciences research
The concept of personalised medicine, of tailoring medical care to an individual’s needs, is generally considered to be the goal for how we practise medicine. Translational research, a so-called ‘bench-to-bedside’ approach to life-sciences research, where what is discovered in the laboratory is translated into practical applications (in the case of medicine, to a clinical level), is enabling the shift to personalised medicine. Genomic technologies are also playing a vital role in determining how patients are treated. Individuals could potentially be prescribed drugs based on their genetic profile in combination with traditional clinical signs and symptoms. Oncology already contains a degree of personalised medicine, as samples from an individual’s tumour will be tested for biomarkers (indicators of a biological state) associated with a particular treatment option.
‘The ultimate aim in patient care is personalised medicine, but we’re not quite there yet so we have to deal with populations of people,’ comments Dr Rick Maguire, director of business development – LIMS East at Ocimum Biosolutions. Storage facilities, known as biobanks, that house patient samples, are an important source of material for life-sciences research as it is only through these large repositories that the number of samples required for a meaningful study can be obtained. ‘Acquiring meaningful results for genome-wide association studies requires at least 2,000 patient samples and double the number of controls,’ states Cecilia Kim, laboratory manager at the Children’s Hospital of Philadelphia (CHOP). The Center for Applied Genomics, a research centre within CHOP, operates a biobank managed by a customised laboratory information management system (LIMS) and to date has collected and stored more than 50,000 blood samples. The centre has been operating for approximately two and a half years and aims to improve the understanding of the genetic causes of common childhood diseases, including ADHD, asthma, autism, obesity, diabetes and cancer. ‘The repository ensures there are sufficient specimens to allow scientists to conduct valid tests without having to return to the patient for further blood samples,’ says Kim.
Cheryl Michels, president at Dataworks Development, a provider of data and sample management software solutions to research and clinical laboratories, defines a biobank as comprising two components: a store of biological material and the supporting information that accompanies the samples. ‘The stored material is only as good as the information that backs it up,’ she says.
Pharmaceutical, biotech and medical organisations are the main users of biobanks, but disciplines ranging from forensic science to natural history museums will also have biorepositories. Larger institutes will have in-house biobanks, but not every small biotechnology company will be able to afford them, so samples are often outsourced to and requested from contract research organisations (CROs). There are also networks of companies associated with a particular area, such as Alzheimer’s, which have dedicated biobanks.
Maguire explains that in a pharmaceutical or biotechnology environment, biobanks are used to store specimen data in support of clinical trials. A pharmaceutical company will use a biobank in concert with various genomic techniques to identify and validate biomarkers for a population of patients. In a medical research institute, while the clinical trial context may also apply, the primary focus of a biobank is to serve Principal Investigators (PIs), who could be clinicians or PhD researchers. Requests for tissue must be in accordance with the PI’s Institutional Review Board protocol, which determines, for example, whether the PI has the right to see patient health information.
‘Pharmaceutical companies depend on biobanks for research, largely through clinical trials, but there is also an historical element to storing material over time,’ notes Michels. As viable treatment methods are developed, or as new strains of the disease are isolated, it allows researchers to go back and test the original samples laid down when the collection was created. Work carried out on the AIDS virus 10-20 years ago, for example, in which circumstances changed rapidly as the virus mutated, can be compared against samples drawn today.
‘At its simplest level, software employed at biobanks will show the location of samples,’ explains Maguire. ‘Most organisations, however, want to go beyond that and be able to search within the meta [supporting] data, because it’s that annotated information that gives intrinsic value to the samples.’ Such supporting data includes information from patient records, such as smoker/non-smoker, disease diagnosis and stage, family genetics, as well as biological assays carried out on the samples, location of storage, whether the material has been thawed and refrozen, whether aliquots have been made and where they are stored, and so on. A robust query tool is then required to allow researchers to interrogate the system based on the supporting information.
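The kind of query Maguire describes, searching on the annotations rather than only on storage location, can be illustrated with a minimal Python sketch. The `Sample` fields, the sample IDs and the `query` helper are all hypothetical and are not taken from any of the LIMS products mentioned here; a real system would hold far more fields and sit on a database.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    # Hypothetical annotation fields, chosen to mirror the examples in the text.
    sample_id: str
    location: str                 # e.g. freezer/rack/box position
    diagnosis: str
    smoker: bool
    freeze_thaw_cycles: int = 0
    aliquot_ids: list = field(default_factory=list)


def query(samples, **criteria):
    """Return the samples whose fields equal every given criterion."""
    return [s for s in samples
            if all(getattr(s, name) == value for name, value in criteria.items())]


inventory = [
    Sample("S-001", "F1/R2/B3", "asthma", smoker=False),
    Sample("S-002", "F1/R2/B4", "asthma", smoker=True, freeze_thaw_cycles=1),
    Sample("S-003", "F2/R1/B1", "diabetes", smoker=False),
]

# Asthma samples that have never been thawed and refrozen.
hits = query(inventory, diagnosis="asthma", freeze_thaw_cycles=0)
print([s.sample_id for s in hits])
```

With only a location field, the same request could not be expressed at all, which is the point about annotated data giving the samples their value.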
Kristian Spreckley, product manager at UK-based RTS Life Science, notes that the ability to track the sample is vital, as is ensuring the data attributed to a sample is kept up to date. RTS provides automated sample management systems used, for example, for DNA storage, where samples are tracked by the RTS SIS (Store Inventory System). ‘Samples and their identity are incredibly valuable and 100 per cent assurance of their integrity needs to be known,’ he says. The UK Biobank, for instance, costs in the region of £60m, and if samples were mixed up or lost, it would reduce the trust placed in the institute.
‘Tracking the sample is only one small part of maintaining an effective biobank,’ comments Pete Tryner, senior implementation consultant at global LIMS supplier LabWare. The repository acts as a library of samples that researchers must be able to search in order to source material. Many biobank repositories are implementing structured consent as part of LIMS to improve the ability to search within the system. ‘Consent to use samples in research is typically in the form of a textual document, which, on its own, is difficult to search within,’ he says. ‘Within the LIMS this consent can be interpreted into a set of structured fields, which may then be included in a search.’
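As a rough illustration of what Tryner means by structured consent, the sketch below stores consent as a handful of yes/no fields instead of a text document, so that it can take part in a search. The field names and sample IDs are invented for the example; actual consent models vary between institutions and are not described in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consent:
    # Hypothetical structured fields interpreted from a signed consent form.
    genetic_research: bool
    commercial_use: bool
    share_with_third_parties: bool


consents = {
    "S-001": Consent(genetic_research=True, commercial_use=False, share_with_third_parties=True),
    "S-002": Consent(genetic_research=True, commercial_use=True, share_with_third_parties=True),
    "S-003": Consent(genetic_research=False, commercial_use=False, share_with_third_parties=False),
}


def permitted(consents, **required):
    """Return the IDs of samples whose consent matches every required field."""
    return sorted(sample_id for sample_id, consent in consents.items()
                  if all(getattr(consent, name) == value
                         for name, value in required.items()))


# Samples that may be used for genetic research in a commercial setting.
print(permitted(consents, genetic_research=True, commercial_use=True))
```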
Ocimum’s Maguire also makes the point that querying a biobank can be extremely difficult if free text is used. Most biobanks will use a controlled vocabulary (CV) or ontology such as SNOMED, WHO or caBIG’s Thesaurus, so that a query returns all associated specimens and not just the subset that happens to match the wording used, he says.
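The difference between free text and a controlled vocabulary can be shown with a small Python sketch. The synonym table below is a made-up stand-in, not real SNOMED content, and the function names are hypothetical; it only demonstrates why mapping every entry to one preferred term lets a query return all matching specimens.

```python
# Hypothetical synonym table: each free-text variant maps to one preferred term.
PREFERRED = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}


def normalise(term):
    """Map a term to its preferred form; unknown terms are only lower-cased."""
    key = term.strip().lower()
    return PREFERRED.get(key, key)


# Diagnoses as different people might have typed them.
records = [
    ("S-001", "Heart attack"),
    ("S-002", "MI"),
    ("S-003", "myocardial infarction"),
    ("S-004", "asthma"),
]


def free_text_search(records, term):
    """Exact string match, as a naive free-text search would do."""
    return [sample_id for sample_id, diagnosis in records if diagnosis == term]


def cv_search(records, term):
    """Compare preferred terms, so every synonym is found."""
    target = normalise(term)
    return [sample_id for sample_id, diagnosis in records
            if normalise(diagnosis) == target]


print(free_text_search(records, "myocardial infarction"))  # finds one specimen
print(cv_search(records, "heart attack"))                  # finds all three
```

In practice the vocabulary would be enforced at data entry rather than patched in at query time, but the effect on search results is the same.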
Inbiomed, a non-profit organisation based in San Sebastián, Spain, that maintains an adult stem cell and primary cell bank, is using LIMS software from LabWare. The biobank, called Inbiobank, is capable of storing 10,000 vials and has been running for four years. Scientists here carry out preclinical research work for pharmaceutical use of stem cells.
LIMS is used to track samples and to handle the information surrounding the material, as well as relevant clinical information from the donor, the final destination of the vial and the projects it is used for. Angel Garcia Martin, director of Inbiobank at Inbiomed, notes: ‘Being able to trace the history of samples and to retrieve information associated with the vial becomes crucial if problems arise with the supplied cells, in that they aren’t viable for instance. Then a full history of the material, from the origin of the sample through processing and storage, is required to help determine a root cause of the problem. In addition, the information is required for audits of the repository.’
HUNT, one of the largest population-based health studies ever, maintains a personal database of approximately 100,000 people from Nord-Trøndelag County, Norway. This includes a repository of blood and urine samples. Image courtesy of Thermo Fisher Scientific.
The H. Lee Moffitt Cancer Center and Research Institute, an organisation working towards the prevention and treatment of cancer, has established a biobank as part of its Total Cancer Care (TCC) project, which aims to obtain a genetic fingerprint of different tumour types. LIMS is used to manage the data surrounding the storage of tissue samples, from collection to the final research project. Patient data is recorded, as well as details about the tumour – when it was removed, size, weight, number of slides cut – and storage location. ‘When setting up the biobank it became apparent that it would be too difficult to keep track of samples using Excel and that a more comprehensive sample tracking system was needed,’ explains Robert Sprinkle, director of research IT at the Moffitt Institute. ‘LIMS software allows researchers to track where the sample is, who held it last, and tests carried out, among other criteria.’
The biobank is designed to hold samples from up to 100,000 patients. Each patient typically gives three vials of blood and, in some instances, a tumour can be sectioned into 10-15 tissue samples. Sprinkle says that 100,000 patients translate into 400,000-500,000 samples and once those are expanded further into individual aliquots, the number of vials stored would reach into the millions.
The TCC biobank has been in existence for two years and holds approximately 26,000 samples, but with the potential to store half a million samples, LIMS software needed to be scalable. The Moffitt Institute is currently using LabVantage’s Sapphire LIMS suite, which includes a biobanking module, to manage the storage of material and its supporting information.
‘The system had to be flexible and had to cope with adding data elements as they were required,’ notes Sprinkle. ‘There are several hundred fields in which data can be entered and a unique system was required to categorise the data.
‘Managing the data is the biggest challenge of this project,’ he continues. ‘Collecting the tissue has never been particularly problematic and most patients will consent to the samples being used in research. Simply possessing the tissue, though, has limited value and it’s the supporting data that is important. Sapphire LIMS puts all the data in one place, making it easy to access and search.’
A biobank facility is an integral part of one of the largest population-based health studies ever performed. The Nord-Trøndelag Health Study (HUNT), which spans almost 25 years and represents an integrated family and personal database of approximately 100,000 people from Nord-Trøndelag County, Norway, maintains a biorepository of material from participants. The latest study, HUNT 3, took place from October 2006 until June 2008, with 110,000 individuals invited to take part. Data was collected by means of questionnaires, clinical examinations and collection of blood and urine samples. Thermo Fisher Scientific’s Nautilus LIMS was used to gather, store, manage, track and retrieve material securely, and to yield real-time, dependable analysis and reports.
Prior to the LIMS implementation, the biobank database consisted of different types of files, such as Excel spreadsheets, CSV (comma-separated values) files and text files, among others. According to Thor Gunnar Steinsli, LIMS manager of HUNT Research Centre and biobank, by deploying Nautilus in HUNT 3, the HUNT biobank immediately gained traceability that was previously unavailable.
Doug Holbrook, product manager at Thermo Fisher Scientific, comments: ‘Biobanks often grow out of research centres that use manual laboratory notebooks for tracking scientific information.’ This is one of the challenges faced by biobank institutes – the transition from paper-based record-keeping to electronic methods and updating those records. In the future, Holbrook feels that biobanking will become more virtualised. A typical scenario might entail a central data repository hosted by a research centre with the physical samples spread over a number of smaller biobanks in different geographical locations. ‘This requires a data management system that is easily accessed and easy to use without weeks of training for the biobank staff, while being robust enough to handle multiple sample types and their associated data,’ he says.
One virtual biorepository is caBIG (cancer Biomedical Informatics Grid), an information network sponsored by the National Cancer Institute (NCI), which enables members of the cancer community to share data. It is made up of more than 50 cancer centres, various NCI research endeavours and 30 not-for-profit federal, academic, and industry organisations. Among its workspaces (virtual communities that develop technologies in specific areas) is the Tissue Banks and Pathology Tools workspace, which is concerned with developing tools to inventory, track, mine and visualise biospecimens from repositories.
‘The challenge for setting up these large consortiums is that technology changes so rapidly that it may be difficult for projects to keep pace,’ says Michels of Dataworks Development. ‘The overarching aim is for researchers to share the data with the wider scientific community. Current technology allows this to happen, but it is not perfected yet.’
Maguire from Ocimum believes that, ideally, the parent sample should be traced as it’s distributed to the principal investigator and then sent on for further analysis, and that there should be the ability to track the genealogy of all downstream samples back to the parent specimen. ‘That is what translational research and medicine needs in order to track specimens from bedside to research and to associate analysis back to bedside for therapeutic input or for dosage response parameters. Without an ability to track the lifecycle of a specimen with all downstream sample genealogy and analysis, there can be no closure of the loop back to the patient.
‘Biobanking is already evolving from a simple location-management focus to one that supports a much wider and often institution-wide imperative, such as translational research. The biobank, in the context of translational research, is really the lynchpin for the whole process. If one fails to “get it right” at the biobank level, with all requisite meta [supporting] data association and interfacing, then the promise of what translational research offers cannot be met. So, in medical research, the biobank is a centrally supported initiative, even if the bank actually lives in a virtual or physically disparate network, as it is critical to delivering not only specimens, but the associated meta data.’
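The sample genealogy Maguire describes, in which every downstream aliquot or derivative can be traced back to its parent specimen, can be sketched in a few lines of Python. The IDs and the `parent_of` mapping are hypothetical, and the sketch assumes the records contain no circular parent links; it is an illustration of the idea rather than how any particular LIMS stores lineage.

```python
# Hypothetical lineage records: child ID -> parent ID.
# A parent specimen (here "P-1") has no entry of its own.
parent_of = {
    "A-1": "P-1",       # aliquot taken from the parent specimen
    "A-2": "P-1",       # second aliquot
    "A-1-dna": "A-1",   # DNA extracted from the first aliquot
}


def lineage(sample_id, parent_of):
    """Walk from a derived sample up to its parent specimen."""
    chain = [sample_id]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def descendants(root, parent_of):
    """Return every sample derived, directly or indirectly, from root."""
    return sorted(s for s in parent_of if root in lineage(s, parent_of)[1:])


print(lineage("A-1-dna", parent_of))   # derived sample back to the parent
print(descendants("P-1", parent_of))   # everything downstream of the parent
```

Walking upwards links an analysis result back to the patient specimen it came from; walking downwards shows everything that was made from that specimen.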