How connecting data may lead to discoveries in medical research


Dr Alexander Jarasch and Professor Martin Hrabe de Angelis explain that novel research methods produce tremendous amounts of data that cannot be analysed with classic analysis tools – so scientists need to look for new approaches, such as graph technology

With its ‘Grand Challenge’, the UK has set a target of using data, AI and ‘innovation’ to transform the prevention, early diagnosis and treatment of diseases like cancer, diabetes, heart disease and dementia to prevent a potential 25,000 deaths a year.

Similar ambitions exist in many other countries, and how to achieve them is a major question for researchers worldwide at the moment, including in Germany. If we are serious about tackling the challenges these diseases pose for patients, society and healthcare systems as a whole, we need to study them in much more depth in order to develop novel methods of prevention and treatment, in our case for diabetes.

I believe these new technologies will be crucial in gaining new insights into the workings and causes of these chronic conditions and diseases. The problem everyone trying to do this faces is that the analysis methods we have been relying on may have reached their limits, given the vast amount of data produced by novel research methods such as omics. The most promising avenue is to work with data at big-data scale, so that we can combine datasets and connect them better.

Integrate and link together more and more data points

That’s complicated by the fact that research nowadays, especially in the life sciences, is not limited to one technology or one discipline.

The German Centre for Diabetes Research (DZD), where we work, is a multi-centre organisation that brings together data from different studies, reports, surveys and research projects at locations across Germany. So we have masses of data from clinical trials and patient information, and our data covers various disciplines, from molecular-level studies to pathway analyses and animal models.

To answer the interesting and suggestive biomedical questions about diabetes, we have to connect this data and look for new insights, patterns and correlations. It is no longer enough to approach a biological or medical question from one direction; we need to integrate and link more and more data.

This is the next step not just in biomedicine but also in the healthcare sector, which is increasingly moving from general blockbuster drugs to individualised treatment, or precision medicine. For this to progress, we need to connect data far more extensively and, above all, look at as many aspects of the problem as we can. This is why the DZD and other researchers think graph databases, the technology that powered the Paradise Papers investigation, could help with the prevention, discovery of new subtypes, early diagnosis and treatment of major illnesses.

It’s important to note that we no longer rely only on Excel or standard relational (table) databases; graph databases add a whole new layer. The standard technology at each of our research locations in Germany is a relational database, alongside spreadsheets and document files. But once we realised how much of that data is connected, we started looking for a way to relate our data to each other and create an overall context for our research.

Relational databases have their merits. However, we needed something to bring these data silos together and uncover connections; being able to jump from one data point to another is crucial for us. That’s why we turned to graph technology.
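As a minimal sketch of what bringing silos together can look like, the Python below assumes two hypothetical tables, one clinical and one from omics measurements, that happen to share a patient identifier. Loading the rows as nodes and edges turns a question that would need a join in SQL into a hop from one data point to the next. All names and values are invented for illustration; a real graph database would also store typed relationships and properties.

```python
from collections import defaultdict

# Hypothetical rows exported from two separate relational sources (silos).
# Identifiers and fields are illustrative placeholders, not real research data.
clinical_rows = [
    {"patient": "P1", "diagnosis": "prediabetes"},
    {"patient": "P2", "diagnosis": "type 2 diabetes"},
]
omics_rows = [
    {"patient": "P1", "gene": "GENE_A"},
    {"patient": "P2", "gene": "GENE_A"},
    {"patient": "P2", "gene": "GENE_B"},
]


def build_graph(clinical, omics):
    """Turn table rows into an undirected adjacency map of (label, id) nodes."""
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for row in clinical:
        link(("Patient", row["patient"]), ("Diagnosis", row["diagnosis"]))
    for row in omics:
        link(("Patient", row["patient"]), ("Gene", row["gene"]))
    return graph


def two_hops(graph, start):
    """Return nodes reachable by walking two edges from start (excluding start)."""
    reached = set()
    for middle in graph.get(start, set()):
        reached |= graph[middle]
    reached.discard(start)
    return reached


graph = build_graph(clinical_rows, omics_rows)
# Which genes are recorded for patients with a prediabetes diagnosis?
print(two_hops(graph, ("Diagnosis", "prediabetes")))  # {('Gene', 'GENE_A')}
```

Asking the question in the other direction, for instance `two_hops(graph, ("Gene", "GENE_A"))`, returns both diagnoses and `GENE_B` without writing a new join.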

Diabetes is a metabolic disease, but it’s not sufficient for researchers to look only at metabolic data. They also have to take into account data from other disciplines, such as genomics or proteomics. In the human body, everything is connected through metabolic pathways: a gene encodes a protein that is active in a metabolic pathway and metabolises a metabolite, which in turn can regulate another gene. In this sense our metabolism is a network of thousands of interconnected components, which maps naturally onto a graph data model.
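The chain described above (gene, protein, metabolite, gene) can be sketched as a small directed graph. This is a minimal Python illustration, assuming placeholder node names and relationship labels of our own choosing rather than real biological entities; a breadth-first search then finds the shortest chain of relationships linking two components.

```python
from collections import deque

# Hypothetical directed edges: (source, relationship, target).
# Node names and relationship labels are placeholders for illustration.
EDGES = [
    ("GENE_X", "ENCODES", "PROTEIN_X"),
    ("PROTEIN_X", "METABOLISES", "METABOLITE_M"),
    ("METABOLITE_M", "REGULATES", "GENE_Y"),
    ("GENE_Y", "ENCODES", "PROTEIN_Y"),
]


def adjacency(edges):
    """Map each node to a list of (relationship, neighbour) pairs."""
    adj = {}
    for src, rel, dst in edges:
        adj.setdefault(src, []).append((rel, dst))
        adj.setdefault(dst, [])
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns alternating nodes and relationships, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for rel, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [rel, nxt])
    return None


adj = adjacency(EDGES)
print(" -> ".join(shortest_path(adj, "GENE_X", "PROTEIN_Y")))
```

Because the edges are directed, asking for a path from `PROTEIN_Y` back to `GENE_X` returns `None`; treating regulation as directional is a modelling choice made for this sketch.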

Link diabetes research with Alzheimer’s

That’s why it’s so important to be able to uncover these connections and to create a new layer of analysis on top of this data. For this we use Neo4j, a graph database. A great advantage of Neo4j is its visual interface, which we can use for queries and experimentation. We are using it to deepen our ‘map’ of diabetes, uncovering hidden relationships and pursuing the new questions they raise.

For example, we do a lot of important research on animal models to study processes and then compare them with humans, so there is a lot of animal data from mice and pigs. This can generate a hypothesis we want to pursue, for example: ‘In the pig model, is prediabetes type X due to causes A and B?’ Is it regulated similarly in humans? Are there similar processes?

We think we can link the molecular human data from basic research with the highly standardised animal model data. In a graph representation, abnormalities, patterns or connections can then be recognised, leading to further research questions. In the long term, it would also be interesting to use data from diabetes research in other areas, such as cancer or Alzheimer’s research, to uncover possible connections.
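The article does not say how human and animal data are joined. One common approach, assumed here purely for illustration, is to link animal genes to their human orthologs and then look for phenotypes that appear on both sides. The sketch below expresses that pattern (animal gene, ortholog, human gene, shared phenotype) in plain Python; every species, gene name and phenotype is a hypothetical placeholder.

```python
# Hypothetical animal-model findings: (species, animal gene, observed phenotype).
animal_findings = [
    ("pig", "pGENE_1", "impaired insulin secretion"),
    ("mouse", "mGENE_2", "insulin resistance"),
    ("mouse", "mGENE_3", "no phenotype"),
]
# Hypothetical ortholog mapping: (species, animal gene) -> human gene.
orthologs = {
    ("pig", "pGENE_1"): "hGENE_1",
    ("mouse", "mGENE_2"): "hGENE_2",
    ("mouse", "mGENE_3"): "hGENE_3",
}
# Hypothetical traits associated with each human gene in molecular human data.
human_associations = {
    "hGENE_1": {"impaired insulin secretion"},
    "hGENE_2": {"obesity"},
}


def shared_findings(findings, ortho, human):
    """Yield (species, animal gene, human gene, phenotype) where the animal
    phenotype is also associated with the human ortholog."""
    for species, gene, phenotype in findings:
        human_gene = ortho.get((species, gene))
        if human_gene and phenotype in human.get(human_gene, set()):
            yield species, gene, human_gene, phenotype


for hit in shared_findings(animal_findings, orthologs, human_associations):
    print(hit)
```

In a graph database the same question would be a single pattern match over ortholog and phenotype relationships rather than a chain of dictionary lookups.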

Graphs aren’t the only advanced technology we see as useful. We will definitely combine machine learning techniques with graph software to identify unknown patterns, for example to try to identify subtypes of diabetes, including new ones, that are discussed in the literature. Another example is natural language processing: we’d like to build a system that automatically reads scientific texts from literature databases, analyses them and, together with our research data, generates hypotheses that DZD scientists can evaluate. Also conceivable are predictive models that can forecast the course of the disease with a certain degree of probability.
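The article does not specify which machine learning methods will be used. As one simple, swapped-in illustration of graph-based grouping, the sketch below connects patients whose feature vectors are close and reads off connected components as candidate groups. The patients, feature values and distance threshold are hypothetical, and real subtype discovery would need far more robust clustering and clinical validation.

```python
import math

# Hypothetical standardised feature vectors per patient (values are made up).
patients = {
    "P1": (0.1, 0.2),
    "P2": (0.15, 0.25),
    "P3": (2.0, 2.1),
    "P4": (2.1, 1.9),
    "P5": (5.0, 0.0),
}


def similarity_graph(vectors, threshold):
    """Connect two patients when their Euclidean distance is below threshold."""
    ids = sorted(vectors)
    adj = {pid: set() for pid in ids}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(vectors[a], vectors[b]) < threshold:
                adj[a].add(b)
                adj[b].add(a)
    return adj


def connected_components(adj):
    """Group nodes into components with an iterative depth-first search."""
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


print(connected_components(similarity_graph(patients, threshold=0.5)))
# [['P1', 'P2'], ['P3', 'P4'], ['P5']]
```

The threshold controls how coarse the grouping is: raise it far enough and every patient falls into a single component, which is one reason a fixed cut-off is only a starting point.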

This is all coming, and we are certain that our approach to data management and analysis will take us to the next level in precision medicine and in the prevention and treatment of diabetes. More generally, technology and data have an absolutely central role in meeting the Grand Challenge the UK wants to take on.

Dr Alexander Jarasch is head of data and knowledge management at the Munich head office of the German Centre for Diabetes Research (DZD).

Professor Martin Hrabe de Angelis is speaker and member of the DZD board.
