Improving reproducibility in science

SciNote CEO Dr Klemen Zupancic discusses the challenge of improving reproducibility in science using ELN technology

SciNote recently updated its brand and software. What was the reasoning behind this?

When we launched the product in 2016, only a small percentage of our target audience, 7 to 8 per cent, was using ELNs. Those people were really excited about new things; they wanted to spearhead, experiment and be a little more radical in their approach to science.

I am not talking about big pharma because they have been on ELN for some time, but I am talking about academia and small to mid-size companies that have been doing research traditionally.

It was very nice for those people to see that we are a new company with different ways of thinking about how to accomplish scientific research. This approach served us well, but in that time the market has grown in terms of expectations and we have as well.

Now people are taking this undertaking of ‘going digital’ very seriously. They want to know where the data is going to be stored and under what conditions, so it’s not just a focus on functionality and being cutting-edge. It is more along the lines of: ‘Is this a sustainable way of managing my data? Does the company behind this have the experience and knowledge? Is their software sophisticated enough and secure enough to be up to the task?’

In the beginning, we were focused on feature development but now we are focused on usability, security, the safety of the data – so we thought it was appropriate to update the look and feel of our brand to reflect the maturity of the product and the overall market.

What does this mean for your customers?

We see gradual shifts in the questions that we are being asked and the demands that we have from our customers. It is a natural way of going from something that is very good for early adopters to something that is suitable for the commodity market.

Two to three years ago the aim of our customers was to try something new. They did not mind implementing something and then moving away from this after a couple of months. They knew that they were going into this process experimentally. It is not a 180-degree change but gradually we see this shift in the market to the point that now it is about moving to a digital platform. We want to go digital because we see the benefits from all other parts of our lives but we are not sure how to start this, we want to do a lot of research beforehand and then make a decision, not only for personal use but for lab teams or entire university science departments.

Our customers now expect not only a product, a tool that they then have to invent a way of using, but also a full service that starts with planning out a strategy and ends with on-boarding everybody and training them to use the software for their specific use cases.

How can your software help increase reproducibility?

Reproducibility is one of the key reasons that we go to work every day. Just from personal experience and how I feel about science, so much work is being done and so little of this work can be redone and properly re-purposed for us to achieve a broader understanding.

It is changing for the better. There are lots of initiatives now, and this is not an issue that just affects ELNs, but I hope that ELNs can play a big role in increasing reproducibility.

First of all, having data stored digitally increases reproducibility because it is going to be easy to find. Secondly, storing data systematically increases it further: because there is a system, even if I do not explain how I stored something, you will have a general idea of the system that I used, so it will be much easier for you to find not only the relevant data but also the context, or metadata.

Going forward from this idea, reproducibility is reliant on metadata because this provides a very detailed description of the conditions under which I obtained a certain result. The problem with this is that if I am doing some research and following a set of protocols, I have no idea what metadata will be interesting for someone who will try to build upon my research later on.

There is data that has been recorded but not published, so you will never know about the context and you will not be able to reproduce the experiment. In this context, metadata holds a lot of potential but this is a very complex challenge. If you measure blood cholesterol, for example, whether the sample comes from a fasting person or not would be contained in the metadata. I would argue that all of that is data, but the contextual links between these data points are the metadata.

This is the area where I see ELN technologies playing a crucial role, as they can help to create links between different pieces, and that is a core functionality of an ELN.

The lab notebooks are places where you make those links but in theory, an ELN could help reproducibility by allowing future scientists to access data on protocols or instrument parameters that are not published in a scientific paper.

This is what drives the development of our product in the future. I would like to use the example of my car. I have a very old car that I drive because I do not need to go very far, I typically bike to work and so on. But I have this very old car, it is comfortable, I am used to it and so on but if I order an Uber and they pick me up in that type of car I would give them a very poor rating. What I am trying to say is that in science it is the same person asking the question who is also providing the answer. It is really this bias that the single person has. It is the same person that asks the question who then goes to the lab and finds the answer.

This affects the quality of the data because we are recording it for ourselves primarily. What we are trying to do in SciNote is to decouple the interface. The part where you ask the questions, organise samples and look for the right protocols should be separated from the part where you are doing the experiments, trying to figure out the answer and explaining how something works.

We are trying to create user interfaces that are suited to the two different mindsets that scientists use, whether it is the same person in both roles or a lab manager or professor who is asking the question and a lab technician who is doing the work.

We want to create an interface for them to have efficient communication and automate as much of the linking of data as possible. We want to automate the recording of metadata as much as possible.

How do you implement this in software?

This is the question that we ask ourselves every day. I do not have a final answer but it drives us philosophically towards where we want to go. Standardisation is a key part. We have gone towards standardisation but we quickly come to a trap where we want to standardise data formats, and this is something that is very different from what we are talking about now.

Really the only standard that all scientists agree on is that if you want to communicate a scientific discovery you need to write an introduction, material and methods, results, discussion and literature. That is fine because it encapsulates science very well. By understanding this structure and working out from it, we can standardise the way that questions are asked and defined, so we can then measure quality and notify users about what is happening in the lab as results are being produced.

I do not think there is a simple answer to this question, it relies on a lot of detail and we need some time to pass for a lot of users to give feedback.

We are not there yet but this is the driving force behind where we want to go.
