Computing challenges at ISC: Unveiling the secrets of life
How would you describe your work?
Over the past decades and centuries, biology has studied the components of living systems in isolation. This meant studying individual plants and animals or, over the last couple of decades, molecules, genes, proteins and so on. An individual protein is not life, nor is a gene, but when they all come together, they form life. Systems biology is the science that seeks explanations of how this happens. This is, of course, a very hard problem, because the interactions are non-linear. We have millions or billions of components that interact in such a system. Interactions are regulated across a wide range of length and time scales, from molecular and intramolecular interactions all the way up to interactions between animals or between ecosystems. And we are dealing with physics that is only partially known.
This is where you want to leverage the power of computing. On the one hand, you simulate these systems on the computer: in large HPC-type simulations, you take a hypothetical interaction, how you believe such a system works, and reconstitute it in a computer simulation to show that it produces the behaviour that you see. One of the more famous examples of this is the Human Brain Project, where the goal is to simulate a human brain, or part of a human brain, neuron by neuron, or even ion channel by ion channel, to answer the question of whether the collective interaction of all these neurons is enough to generate intelligence.
How do these challenges differ from classical biology?
Systems biology started out around 2000 with systematic experimentation: a lot of robotic experimentation, robotised microscopes and high-throughput experiments with pipetting and liquid-handling robots that were able to do thousands of experiments in an automated and very systematic fashion, producing enormous datasets.
The second point where computing becomes important is in analysing these datasets and extracting potential rules and laws, even novel laws of physics. There is a lot of new physics to be learned about living systems from those measurements.
Whether we are looking at the embryonic development of the fruit fly or liver regeneration in humans, the tools that we need to explain emergent behaviour from interacting parts are always the same.
The second thing is the advance in computing power. We now have computing platforms that are powerful enough to actually do these simulations and to deal with this amount of data.
Third is the increasing realisation that, in terms of the education of students, one discipline alone will never be enough to explain life. You need computer science, but you also need physics, biology and mathematics, and chemistry to some extent. Our curricula have become much more multidisciplinary. Even 10 years ago it was entirely uncommon for a biology student to know computer programming. This has completely changed. Now, basically every biology student is taught computer programming.
Likewise, in most computer science curricula you can now find minors in life-science topics. That started with bioinformatics, but it now also includes computational biology and medical computer science. Now that people can talk to each other and collaborate, we can see progress, because life really is one of the most complicated phenomena that we have ever attempted to explain.
What do you hope to achieve?
In my group [at TU Dresden] we are doing a lot of research into abstraction layers for HPC, programming languages and run-time frameworks that reduce development times to the range of days to weeks. We want to allow people with a little bit of programming background, but no parallel or HPC background, to use the systems at peak performance rates that are comparable to the best handwritten codes.
My group is primarily a computer science research group, so our core expertise is developing frameworks and technology that enable more biologically oriented scientists to benefit from HPC. Other areas where we apply these frameworks are machine learning and AI, and also virtual reality.
We would like to push HPC and the software frameworks even further. We would like to have an artificial intelligence that can extract mathematical models from the data: partial differential equation models that describe the processes that have produced the data.
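To make the idea of extracting an equation from data concrete, here is a minimal sketch in Python. It is not the group's method, only a toy least-squares fit under stated assumptions: the data are synthetic samples of a known solution of the diffusion equation u_t = D u_xx, the candidate terms are restricted to u and u_xx, and all names (`D_TRUE`, `u`, `discover_coefficients`) are hypothetical. Derivatives are estimated by central finite differences, and the coefficients are found by solving the 2x2 normal equations.

```python
import math

D_TRUE = 0.1  # diffusion constant used only to generate the synthetic data


def u(x, t):
    """Exact solution of u_t = D u_xx built from two decaying sine modes."""
    return (math.exp(-D_TRUE * t) * math.sin(x)
            + math.exp(-4.0 * D_TRUE * t) * math.sin(2.0 * x))


def discover_coefficients(dx=0.1, dt=0.01):
    """Fit u_t = c_u * u + c_uxx * u_xx to sampled data by least squares."""
    s_aa = s_ab = s_bb = s_ay = s_by = 0.0
    for i in range(60):
        x = 0.1 * i
        for t in (0.5, 1.0, 1.5):
            a = u(x, t)                                              # candidate term u
            b = (u(x + dx, t) - 2.0 * a + u(x - dx, t)) / dx ** 2    # candidate term u_xx
            y = (u(x, t + dt) - u(x, t - dt)) / (2.0 * dt)           # target u_t
            s_aa += a * a
            s_ab += a * b
            s_bb += b * b
            s_ay += a * y
            s_by += b * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s_aa * s_bb - s_ab * s_ab
    c_u = (s_ay * s_bb - s_ab * s_by) / det
    c_uxx = (s_aa * s_by - s_ab * s_ay) / det
    return c_u, c_uxx


c_u, c_uxx = discover_coefficients()
print(f"u_t = {c_u:.4f} * u + {c_uxx:.4f} * u_xx")
```

On this noise-free data the fit should return a coefficient close to zero for u and close to `D_TRUE` for u_xx, which identifies the diffusion term. Real measurements are noisy and the set of candidate terms is much larger, so this only illustrates the principle.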
Your team developed OpenFPM. What is it used for?
OpenFPM is a simulation framework that is relatively new. There is a publication about it just now, but it is based on about 20 years of experience we had from a project called the Parallel Particle Mesh (PPM) library. The PPM library was an HPC framework for computer simulation codes, based on particles, meshes or a combination of the two.
It has been used for fluid dynamics, aerodynamics and astrophysics. The idea behind it was to implement distributed data structures, and operators on these data structures, that are sufficient to express the business logic of any computer simulation within the framework of particle-mesh methods.
There is a set of about 12 data structures and operators that you need to implement. Once you have implemented them in a transparently distributed way, the user doesn’t need to know they are distributed. The user can just allocate a set of 10 million particles, and it’s unimportant to know which particle is allocated on which node of the machine, or how the particles are distributed in a cluster.
You can also have particles interact with one another according to an interaction function that you provide. Again, you would not need to know what network communication is required, or how the results are aggregated across the computing system. This can all be taken care of by the library automatically.
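The idea of a user-supplied interaction function can be sketched in a few lines of Python. This is only an illustration, not the PPM or OpenFPM API: the class `ParticleSet`, its method `interact` and the kernel `inverse_distance` are hypothetical names. The sketch runs on a single process with a brute-force loop over pairs, where a real library would use distributed data structures and neighbour lists behind the same kind of interface.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass
class ParticleSet:
    """Hypothetical container: the caller never sees how particles are stored."""
    positions: list  # one coordinate tuple per particle, any dimension

    def interact(self, kernel, cutoff):
        """Apply kernel(distance) to every pair closer than cutoff and
        accumulate the result on both particles of the pair."""
        result = [0.0] * len(self.positions)
        for i, j in combinations(range(len(self.positions)), 2):
            r = math.dist(self.positions[i], self.positions[j])
            if r < cutoff:
                w = kernel(r)
                result[i] += w
                result[j] += w
        return result


def inverse_distance(r):
    """User-provided interaction function (here a simple 1/r kernel)."""
    return 1.0 / r


# The user states what interacts and how, not where the data lives.
particles = ParticleSet(positions=[(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
potential = particles.interact(inverse_distance, cutoff=2.0)
print(potential)
```

In this example only the first two particles are within the cutoff of each other, so they each receive a contribution and the third receives none. The point of the design is that the pair search and any communication stay inside `interact`, so the same user code could in principle sit on top of a parallel implementation.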
The PPM library has achieved a number of world records. It was used by a team that won the Gordon Bell Prize. That team was not us, but PPM is an open-source project and people have used it for their own research.
With OpenFPM we are able to run simulations in spaces of arbitrary dimension, whereas before we were limited to 2D and 3D. With OpenFPM we can also have arbitrary data types, so the user can define any C++ object or C++ class, and objects of that class will then work transparently with OpenFPM, without the library being extended in any way.
Then, on top of OpenFPM, with Professor Jeronimo Castrillon, the chair of programming languages and compiler construction at TU Dresden, we are developing a domain-specific programming language and an optimising compiler.
At the end of the day, you have something that looks like Matlab or Python in terms of syntax, in which the biologist or the computational scientist can implement the simulation logic. The compiler then produces all the OpenFPM code, and the OpenFPM library does all the data handling, parallelism and distribution. It works with CPUs, clusters of GPUs or mixed environments. You don’t have to worry about it. Porting a code from a CPU version to a GPU version is now a matter of minutes, because we just have to compile it for a GPU target. Everything else is automatic.