MODELLING/ENGINEERING: SYSTEMS BIOLOGY
Paul Schreier examines progress towards accurate models built on the systems biology approach
Scientists have made remarkable progress learning about the tiniest components of living things. But the more we learn about individual processes, the more we understand that we need a new approach – a systems approach to biology. Here scientists seek to explain biological phenomena not on a gene-by-gene basis, but rather through the interactions of all the components in a cell, organism, or organ. Today’s computational tools and platforms are joining hands with improved experimental techniques to enable us to make progress towards the goal of having computer models of complex biological systems, including the human body, so we can get more in-depth knowledge that will improve our lives.
The shortcomings of genetic determinism
For several decades, researchers worked under the assumption of genetic determinism: with knowledge of all genes, they believed, they could describe all biological processes, for instance by identifying the specific gene responsible for a specific trait or disease. This was the motivation behind the Human Genome Project, the biggest project ever undertaken in biology.
To see the shortcomings of genetic determinism, consider some points from Ruedi Aebersold, chairman of the Institute of Molecular Systems Biology at the ETH Zurich and a member of the Scientific Executive Board (SEB) for SystemsX, a Swiss-government funded collaboration with the goal of establishing a world-leading initiative in quantitative systems biology. Following the logic of genetic determinism, if you compare the complexity of man to that of a simple creature such as a roundworm, wouldn’t it be natural to assume that we have many, many more genes? On the contrary, we have roughly the same number: a roundworm has approximately 19,500 genes, and the human body has in the range of 20,000 to 25,000. Clearly, something else beyond the raw number of genes must account for our complexity.
Professor Aebersold next points out another factor that led scientists to believe they needed a new approach: what the firm Burrill and Company has named the ‘pharma innovation gap’. The amount of money spent on pharmaceutical R&D has risen steadily, whereas the number of new drugs approved has been dropping since 1999 (see Figure 1).
Figure 1: Despite steadily increased spending, the number of new drugs is flat at best. Could pharma researchers be looking at the wrong things?
Why? Systems biologists think the answer is that drug R&D has focused on the molecular level without taking into account systems effects.
Basic elements create new functions
Indeed, the core thesis of systems biologists is that sophistication arises through the complex interactions among the basic units of life. As is often the case, the whole is much greater than the sum of its parts.
Consider an analogy from electronics. When you wire two well-known components – a capacitor and an inductor – in parallel, you get something very different from either one, and something extremely useful: a tank circuit with a specific frequency of oscillation. Meanwhile, we have models at all levels of electronics, from the level of control systems, where the way poles and zeros are implemented is unimportant, to circuit-level simulations with programs such as SPICE, and today we have models that study the movement of atoms between tiny transistors.
We can also integrate the results of one type of model into another, such as to incorporate a multiphysics model of a component into a SPICE model to account for changes in its nonlinear characteristics as it heats up.
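To put a number on the tank-circuit analogy: for an ideal (lossless) parallel LC circuit, the resonant frequency follows from the two component values alone, even though neither component on its own has any preferred frequency. The component values below are arbitrary illustrative choices, not figures from the article.

```latex
f_0 = \frac{1}{2\pi\sqrt{LC}}, \qquad
L = 1\,\mathrm{mH},\; C = 1\,\mu\mathrm{F}
\;\Rightarrow\;
f_0 = \frac{1}{2\pi\sqrt{10^{-3}\cdot 10^{-6}}} \approx 5.03\,\mathrm{kHz}
```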
The ultimate goal of systems biology is to create analogous mathematical models of biological systems at all levels. ‘We’re making models that integrate experimental data, and we use these models to predict and explain results, where the explanation part is perhaps the most important,’ says Pedro Mendes, chair in computational systems biology at the University of Manchester Centre for Integrative Systems Biology and also a principal investigator at the Virginia Bioinformatics Institute. Today, though, we’re far from reaching the final goal.
Joerg Stelling of the ETH Zurich, whose group at the Institute of Computational Science focuses on biological modelling, says that today even some of the simplest biological elements are poorly characterised, so finding a model that accounts for their complex interactions is that much more difficult.
An interesting perspective comes from Vassily Hatzimanikatis, head of the Laboratory of Computational Systems Biotechnology at the Swiss Federal Institute of Technology in Lausanne and also a member of the SEB for SystemsX.
He explains that progress arises when experimentalists and analytical people work together. The jagged path in Figure 2 indicates how this happens: first comes an advance in technology for measuring experimental data; then new algorithms, codes and more powerful computers become available and lead to a jump in knowledge; meanwhile new measurement techniques are developed, and the cycle continues.
Figure 2: Reaching the goal of predictive, quantitative biological models will take both new analytical and measurement techniques along with new algorithms, codes and more computing power (diagram courtesy V. Hatzimanikatis).
One thing holding us back, adds Hatzimanikatis, is our inability to access information at all levels of biological activity. In some cases we’re dealing with such small levels of a given chemical or a reaction that it’s almost impossible to quantify everything; there might also be interactions that we don’t yet know exist.
Another challenge is that these are multiscale problems, spanning sizes from a gene to an organ and timeframes from microseconds to years. It’s impossible to build a model that covers every level of detail at every possible instant, so a key job for systems biologists is determining where to focus their efforts both in experiments and in modelling. He states: ‘Industry would love to have complete predictive capabilities but realise they’ll never get a 100 per cent solution. Our goal with modelling is to give them good guidance in a reasonable amount of time.’
Countless projects at all levels
There are hundreds, if not thousands, of research projects focusing on various areas of physiological modelling, and this article can touch upon only a few. But these examples should provide an idea of the type of work going on and the need for a wide range of numerical tools, algorithms and computing power.
Furthermore, projects come and go. For instance, BioSPICE is no longer an active funded project, states Adam Arkin, former project director and now director of the Virtual Institute of Microbial Stress and Survival based at the Lawrence Berkeley National Laboratory. He adds, though, that there are many other toolsets – most focusing on simulation as opposed to data/model interaction – including GEPASI, Cell Designer, ECELL, Virtual Cell, the Systems Biology Workbench (SBW) and XPP, to name a few.
Most important, he explains, are the emerging standards for biological model representations. Chief among them are CellML, being developed by the Auckland Bioengineering Institute at the University of Auckland, and the internationally supported Systems Biology Markup Language (SBML). CellML describes the structure and underlying mathematics of cellular models in a very general way and has facilities for describing any associated metadata; SBML is aimed at exchanging information about pathway and reaction models among applications. Both have three main purposes: to enable the use of multiple software tools without rewriting models; to enable researchers to share and publish models in a form others can use in a different software environment; and to ensure the survival of models beyond the lifetime of the software used to create them.
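As a rough illustration of what such an exchange format looks like, the sketch below embeds a hand-written, SBML-style description of a single reaction and reads it with Python's standard XML parser. The model (`toy_conversion`, with species `A` and `B`) is invented for illustration; the fragment uses only a small subset of SBML Level 2 elements and has not been run through an SBML validator. Real tools would normally rely on a dedicated SBML library rather than raw XML parsing.

```python
import xml.etree.ElementTree as ET

# Namespace assumed here: SBML Level 2 Version 4.
SBML_NS = "http://www.sbml.org/sbml/level2/version4"

# Hypothetical toy model: one compartment, one irreversible reaction A -> B.
EXAMPLE = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_conversion">
    <listOfCompartments>
      <compartment id="cell" size="1"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialAmount="100"/>
      <species id="B" compartment="cell" initialAmount="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="A_to_B" reversible="false">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def summarise(sbml_text):
    """Return (species -> initial amount, reaction -> (reactants, products))."""
    root = ET.fromstring(sbml_text)
    ns = {"s": SBML_NS}
    species = {
        sp.get("id"): float(sp.get("initialAmount"))
        for sp in root.findall(".//s:species", ns)
    }
    reactions = {}
    for rx in root.findall(".//s:reaction", ns):
        reactants = [r.get("species")
                     for r in rx.findall("s:listOfReactants/s:speciesReference", ns)]
        products = [p.get("species")
                    for p in rx.findall("s:listOfProducts/s:speciesReference", ns)]
        reactions[rx.get("id")] = (reactants, products)
    return species, reactions


species, reactions = summarise(EXAMPLE)
print(species)
print(reactions)
```

Because the model lives in a plain, tool-neutral text file, any program that understands the element names can rebuild the same species and reactions, which is the portability the standards aim for.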
Meanwhile, the number of numerical techniques used in these models is likewise expanding. These start with ODEs (ordinary differential equations) that simulate chemical reaction networks, and go on to linear programming for optimisation and parameter estimation, and to random-number generators and stochastic simulations. Not only are these simulations becoming extremely large, involving millions of equations, but the systems are also very unintuitive: being nonlinear with many feedback loops, they are almost impossible for a human to understand merely by looking at a flowchart or gene diagram.
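To make the ODE starting point concrete, here is a minimal sketch of a reaction network written as differential equations and integrated numerically. The two-step chain A → B → C, its rate constants and the fixed-step Runge-Kutta integrator are all illustrative assumptions, not taken from any of the projects described; production tools use adaptive solvers and far larger networks.

```python
import math


def reaction_rates(state, k1, k2):
    """Mass-action ODEs for the hypothetical chain A -> B -> C."""
    a, b, c = state
    return (-k1 * a, k1 * a - k2 * b, k2 * b)


def rk4_step(f, state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(slope, factor):
        return tuple(s + factor * dt * d for s, d in zip(state, slope))

    s1 = f(state)
    s2 = f(shifted(s1, 0.5))
    s3 = f(shifted(s2, 0.5))
    s4 = f(shifted(s3, 1.0))
    return tuple(
        s + dt / 6.0 * (d1 + 2.0 * d2 + 2.0 * d3 + d4)
        for s, d1, d2, d3, d4 in zip(state, s1, s2, s3, s4)
    )


def simulate(initial, k1, k2, t_end, dt):
    """Integrate from t = 0 to t_end with a fixed step dt."""
    f = lambda s: reaction_rates(s, k1, k2)
    state = tuple(initial)
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(f, state, dt)
    return state


K1, K2, T_END = 1.0, 0.5, 2.0  # arbitrary illustrative values
a, b, c = simulate((1.0, 0.0, 0.0), K1, K2, T_END, dt=0.01)
exact_a = math.exp(-K1 * T_END)  # A decays exponentially, so it can be checked
print(a, exact_a, a + b + c)
```

Even this three-species toy has an intermediate (B) that first rises and then falls; with feedback loops and millions of equations, such behaviour quickly stops being readable from a diagram.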
Linear programming, for instance, plays a large role in the work of Professor Andreas Wagner of the Dept of Biochemistry at the University of Zurich. His group operates in the belief that any efforts to understand biological systems cannot be successful unless we understand evolution’s role. For instance, he looks at specific metabolic nets such as those that process nutrients. How many nets could do the job? One approach is to look at similarly functioning networks that have done the same job in various organisms, looking for certain chemical reactions that haven’t changed in more than half a billion years.
Another approach is to use computational methods, generating networks to see if they could do the job. In nature, these networks change during evolution because mutations can destroy enzyme reactions so they drop out of the network, or new networks can arise due to horizontal gene transfer. He has also found that nets can share few reactions but can still produce the same chemical compounds. With computations Wagner wants to explore all these network architectures, looking at the design space of all possible networks. For this he has used the linear-programming software CPLEX from ILOG, but some experiments run on clusters of 60 to 2,000 CPUs, and purchasing a license for each one gets very expensive. Thus, his team also uses the LP code from the open-source GNU library, which is only 50 per cent as fast but is free.
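The idea of testing whether a generated network 'could do the job' can be sketched with a much simpler stand-in than the linear programming used in this research: a qualitative reachability check that asks only whether the target compounds can be produced from the available nutrients at all, ignoring fluxes and stoichiometric balance. The compound names and both networks below are hypothetical.

```python
def producible(reactions, nutrients):
    """Expand the set of available compounds until no reaction adds anything new.

    Each reaction is a (substrates, products) pair; it can fire once all of
    its substrates are available.
    """
    available = set(nutrients)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


def is_viable(reactions, nutrients, targets):
    """A network is 'viable' here if every target compound is reachable."""
    return set(targets) <= producible(reactions, nutrients)


NUTRIENTS = {"N"}            # hypothetical nutrient
TARGETS = {"T1", "T2"}       # hypothetical compounds the network must make

# Two invented networks that share no reactions at all.
network_a = [(("N",), ("X",)), (("X",), ("T1",)), (("X",), ("T2",))]
network_b = [(("N",), ("Y",)), (("Y",), ("T1", "Z")), (("Z",), ("T2",))]

# A 'mutation' that removes the last reaction of network A.
network_a_knockout = network_a[:-1]

print(is_viable(network_a, NUTRIENTS, TARGETS))
print(is_viable(network_b, NUTRIENTS, TARGETS))
print(is_viable(network_a_knockout, NUTRIENTS, TARGETS))
```

The two intact networks both pass despite having no reactions in common, while the knockout fails. A linear-programming formulation would go further by also requiring that the fluxes balance and by optimising the production rate.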
In other cases, such as when studying the robustness of systems, you work with concentrations where there are just a few thousand, or fewer, molecules, which are almost impossible to measure exactly. But each molecule is important, so such reactions can’t be studied as a deterministic system and it is necessary to work with stochastic effects, looking for probabilistic averages rather than absolute numbers.
Many researchers use the Gillespie algorithm, which was developed 40 years ago but is still effective in modelling chemical reactions involving a small number of molecules. For further statistical analysis, Wagner’s group often works with ‘R’, an open-source implementation of the S language on which S-Plus is based.
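A minimal sketch of the Gillespie approach (the 'direct method') shows how individual reaction events are drawn at random rather than averaged. The reversible reaction A ⇌ B, its rate constants and the starting count of 100 molecules are illustrative assumptions only.

```python
import math
import random


def gillespie(x0, stoichiometry, propensities, t_end, rng):
    """Simulate one stochastic trajectory and return the final molecule counts.

    stoichiometry[j] lists the change in each species when reaction j fires;
    propensities[j](x) gives the rate of reaction j for the current counts x.
    """
    t = 0.0
    x = list(x0)
    while True:
        rates = [a(x) for a in propensities]
        total = sum(rates)
        if total <= 0.0:
            break  # nothing can react any more
        # Exponentially distributed waiting time; 1 - random() lies in (0, 1].
        tau = -math.log(1.0 - rng.random()) / total
        if t + tau > t_end:
            break
        t += tau
        # Choose which reaction fires, with probability rate / total.
        threshold = rng.random() * total
        cumulative = 0.0
        for j, rate in enumerate(rates):
            cumulative += rate
            if threshold < cumulative:
                break
        for i, change in enumerate(stoichiometry[j]):
            x[i] += change
    return x


C_FORWARD, C_BACKWARD = 1.0, 0.5          # arbitrary illustrative constants
stoichiometry = [(-1, +1), (+1, -1)]      # A -> B and B -> A
propensities = [
    lambda x: C_FORWARD * x[0],
    lambda x: C_BACKWARD * x[1],
]

final_counts = gillespie((100, 0), stoichiometry, propensities,
                         t_end=10.0, rng=random.Random(1))
print(final_counts)
```

Each run with a different seed gives a slightly different answer; averaging many runs yields the probabilistic picture that replaces a single deterministic curve when molecule numbers are small.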
Matlab and systems biology
One commercial tool often applied to systems biology is Matlab, which itself includes a large number of numerical-analysis tools. The MathWorks also offers SimBiology (Figure 3), an add-on that allows users to create models containing compartments, reactions, species, parameters, events, rules and units using a block-oriented user interface or the Matlab command line, or by reading in SBML models.
Figure 3: On the left of this SimBiology diagram is a 2-compartment pharmacokinetic model of a central compartment (the liver), and on the right a tissue compartment, into which is incorporated a portion of an apoptosis pathway model.
Users can then simulate the model using either Matlab’s stiff and nonstiff deterministic ODE solvers or SimBiology’s additional stochastic solvers. Further, parameter estimation in SimBiology can take advantage of advanced optimization algorithms from the Optimization Toolbox or Genetic Algorithms Toolbox. Also, scientists whose models require long simulation times can use the Parallel Computing Toolbox to execute simulations in parallel in a cluster-computing environment.
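Why a tool would offer separate stiff and nonstiff solvers can be seen from a deliberately tiny sketch. It is written in Python rather than Matlab and uses hand-coded Euler methods, not any solver shipped with Matlab or SimBiology. The single fast-decaying equation dy/dt = -k·y, with an assumed k of 1,000 and a step of 0.01, stands in for the fast component of a stiff system.

```python
def explicit_euler(y0, k, dt, steps):
    """Nonstiff-style update: slope evaluated at the current value."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-k * y)
    return y


def implicit_euler(y0, k, dt, steps):
    """Stiff-style update: solves y_new = y + dt * (-k * y_new) for y_new."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + k * dt)
    return y


K, DT, STEPS = 1000.0, 0.01, 10  # arbitrary values chosen so that K * DT > 2

y_explicit = explicit_euler(1.0, K, DT, STEPS)
y_implicit = implicit_euler(1.0, K, DT, STEPS)
print(y_explicit, y_implicit)
```

The true solution decays smoothly towards zero. With this step size the explicit method instead oscillates and grows into the billions, while the implicit one decays as it should. An explicit solver would need a much smaller step to stay stable, which is why reaction networks with widely separated timescales are usually handed to stiff solvers.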
Those who prefer open-source software, in this case primarily a library of Matlab scripts, might choose the Systems Biology Toolbox for Matlab. Developed by a group of Swedish researchers at the Chalmers Research Centre for Industrial Mathematics, it has some overlap with SimBiology, but not the extensive user interface. A similar program is PottersWheel from the Freiburg Centre of Data Analysis and Modelling, supported by the German HepatoSys initiative.
Consisting of more than 100,000 lines of Matlab and C code, it is a multi-experiment fitting toolbox. It helps to create ODE-based models and can fit a model to several datasets at once.
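The notion of fitting one model to several datasets at once can be sketched as follows. This is not PottersWheel's method or interface. It is a generic Python illustration, assuming a simple exponential-decay model, two synthetic noise-free 'experiments' with different known starting amounts, and a one-dimensional golden-section search for the single shared rate constant.

```python
import math


def model(t, y0, k):
    """Exponential decay from a known starting amount y0."""
    return y0 * math.exp(-k * t)


def pooled_sse(k, experiments):
    """Sum of squared errors over every data point of every experiment."""
    return sum(
        (y - model(t, y0, k)) ** 2
        for y0, data in experiments
        for t, y in data
    )


def golden_section(f, lo, hi, tol=1e-8):
    """Minimise a one-dimensional unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


# Synthetic data generated from the model itself (k = 0.3); not measurements.
K_TRUE = 0.3
TIMES = [0.0, 1.0, 2.0, 4.0, 8.0]
experiments = [
    (10.0, [(t, model(t, 10.0, K_TRUE)) for t in TIMES]),  # experiment 1
    (4.0, [(t, model(t, 4.0, K_TRUE)) for t in TIMES]),    # experiment 2
]

k_fit = golden_section(lambda k: pooled_sse(k, experiments), 0.0, 5.0)
print(k_fit)
```

The key design choice is the pooled error function: because both experiments contribute to one objective, the fitted rate constant must explain them simultaneously rather than being tuned to each in turn.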
Metabolic pathways or networks are sequences of chemical reactions, catalysed by proteins, to transform compounds and in this way transfer information; the small molecules being transformed correspond to voltages in an electronic circuit. One tool for studying them is COPASI (Complex Pathway Simulator), an open-source program based on work done by Professor Mendes of the University of Manchester. The code is based heavily on ODEs, but the development team has added stochastic capabilities; they have also added a function that in many cases can successfully convert a pathway model from a deterministic form to a stochastic form and vice versa. They are also working to parallelise the code so it can handle larger problems more quickly.
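How COPASI performs this conversion internally is not described here, but the standard textbook relations for mass-action kinetics give a flavour of what is involved. Assuming a well-mixed volume V, a deterministic rate constant k expressed in concentration units, a stochastic rate constant c per molecule (or pair of molecules), molecule counts x, and Avogadro's number N_A:

```latex
\begin{aligned}
\text{first order } (A \to \cdots):\quad
  & c = k, && a(x) = c\,x_A \\
\text{second order } (A + B \to \cdots):\quad
  & c = \frac{k}{N_A V}, && a(x) = c\,x_A x_B \\
\text{second order } (2A \to \cdots):\quad
  & c = \frac{2k}{N_A V}, && a(x) = c\,\frac{x_A (x_A - 1)}{2}
\end{aligned}
```

Here a(x) is the propensity used by a stochastic simulator. Rate laws that are not simple mass action have no such direct translation, which is presumably one reason the conversion succeeds only 'in many cases'.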
High-performance computing is something the Edinburgh Parallel Computing Centre (EPCC) provides as support for research at the university’s Centre for System Biology. One project involves trying to identify at least one gene responsible for the circadian rhythm in plants. When researchers started developing algorithms to study this large global optimisation problem, they could investigate millions of different parameter sets on a desktop PC; the EPCC now provides access to the first IBM eServer Blue Gene system in Europe, delivering around 4.7 TFLOPS of performance, and by parallelising the problem they can hope to handle problems 100 times larger in the same amount of time.
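Parameter searches of this kind parallelise naturally because each candidate parameter set can be scored independently. The sketch below shows only the pattern, and every detail is assumed: the cost function is a hypothetical stand-in (distance from an arbitrary target), the search is plain random sampling, and a thread pool is used for portability. In CPython, threads do not speed up CPU-bound work, so a real run would use processes or MPI on a cluster.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def cost(params):
    """Hypothetical stand-in for 'how badly does this parameter set fit?'."""
    target = (0.5, 2.0, 1.5)  # arbitrary values, not from any clock model
    return sum((p - t) ** 2 for p, t in zip(params, target))


def sample_parameter_sets(n, bounds, seed):
    """Draw n candidate parameter sets uniformly within the given bounds."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]


def parallel_search(candidates, workers=4):
    """Score all candidates concurrently and return the best one."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        costs = list(pool.map(cost, candidates))
    best = min(range(len(candidates)), key=costs.__getitem__)
    return candidates[best], costs[best]


BOUNDS = [(0.0, 1.0), (0.0, 4.0), (0.0, 3.0)]
candidates = sample_parameter_sets(2000, BOUNDS, seed=42)
best_params, best_cost = parallel_search(candidates)
print(best_params, best_cost)
```

Since the candidates never need to communicate, doubling the number of processors roughly doubles the number of parameter sets that can be examined in the same time, which is the scaling the EPCC work relies on.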
One of the most advanced and successful models, one actually used for virtual testing of new drugs, is the ‘virtual heart’ developed by an international group of researchers with a base at the University of Auckland (Figure 4). The cell model it uses was developed by Professor Denis Noble from the Department of Physiology, Anatomy and Genetics at Oxford University. He relates that he wrote his earliest programs for cell models in machine code for tube-based computers, moving to Algol and later to Pascal for DOS and Windows. These have been replaced by COR (Cellular Open Resource), a Windows environment for cellular modelling built around CellML. It offers ‘out of the box’ access to a large database of single-cell models. The programs for multicellular and whole-organ work were first developed with Raimond Winslow, then at Minnesota (using Fortran on a Thinking Machines 64,000-processor machine) and now at Johns Hopkins University, and later with Peter Hunter at the University of Auckland using CMISS (Continuum Mechanics, Image analysis, Signal processing and System Identification). Professor Noble adds that the group is currently working with Fujitsu to prepare codes for the 10-petaflop computer being constructed in Japan. Even using current computer power, this model is so refined that it is being put to use in drug tests.
Figure 4: The virtual heart has developed to the point that it is being used for drug testing. (Image courtesy Nic Smith, Oxford University)
Systems biology has seen enough progress that commercial companies are being developed around it. For instance, Entelos Inc offers a ‘virtual patient’, which was developed to encompass the variations required to study a broad range of patients. A modelling approach and technology platform known as PhysioLab allows companies to screen genes of unknown function for their potential as drug targets, determine the impact that specific genes would have on a diseased state, leverage knowledge from failed compounds to optimise the next generation, and understand why a treatment that worked in animals would fail in humans prior to clinical trials.
Genomatica’s Integrated Metabolic Engineering Platform encompasses the use of detailed metabolic models and simulation algorithms for the optimisation of high-performance biofactories. The key component is SimPheny, an application that enables the development of predictive computer models of organisms from bacteria to humans. SimPheny can build virtual cells from their basic molecular components and can simulate the activity of the cell’s complete reaction network.
The software platform from Gene Network Sciences Inc (Cambridge, MA) consists of three modules called the VisualHeart (model editor, visualisation), DigitalHeart (tissue simulation on a cluster of CPUs) and DigitalCell (cardiomyocyte simulation).