Scientific Computing World gathered a panel of leading experts to discuss the impact that effective data management can have on driving research and laboratory efficiency, as well as enabling future capabilities. Our panellists suggested five crucial priorities that they feel should be top of the agenda for vendors of data instrumentation and associated data platforms.
Read the full report from our panel here.
As data volumes increase and discovery processes become more complex, it is increasingly clear that no single platform can address every data challenge – at least not without heavy investment in customisation. So, what should vendors consider?
1. Vendor lock-in
“Vendor lock-in is a big issue – as in instrument vendors insisting you use their proprietary data formats,” says Birthe Nielsen, a consultant with the Pistoia Alliance - a global, not-for-profit alliance of life science companies, vendors, publishers, and academic groups. “I think there will be a bigger push going forward with standard RFPs demanding that suppliers use open standard APIs to ensure interoperability.
“Cloud storage is also important. Locally stored data has all sorts of associated issues, from inaccessibility to cybersecurity risks – we’ve all unfortunately heard about the disruption caused by cyberattacks on organisations’ locally stored data.”
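To illustrate the point about open formats: the short Python sketch below reads spectra from an mzML file – the vendor-neutral, open standard for mass spectrometry data – using the open-source pyteomics library. The file name is a placeholder and the library choice is illustrative, not a recommendation from the panel.

```python
# Illustrative only: reading vendor-neutral mzML data with the open-source
# pyteomics library, so spectra stay accessible without proprietary software.
from pyteomics import mzml

# "example.mzML" is a placeholder for a file converted from a vendor format
# (for instance with ProteoWizard's msconvert tool).
with mzml.read("example.mzML") as spectra:
    for spectrum in spectra:
        mz_values = spectrum["m/z array"]
        intensities = spectrum["intensity array"]
        print(spectrum["id"], len(mz_values), float(intensities.max()))
```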
2. Data ownership
“Data ownership also needs to be clearer,” adds Nielsen. “Do you own all of your metadata or are there any restrictions on its use because of the environment or instrument in which it was created? How accessible is it without the need for specific software to unlock it?”
Lars Rodefeld - a scientific consultant who spent 27 years at Bayer CropScience - echoes the sentiment about data ownership. “We talked earlier about the need to combine legacy data sets following acquisitions of companies,” he says. “There are also complications if you divest a company. All the annotated data will go if you sell that asset. That creates an issue for the machine learning models left behind – can they still use the data that has now been sold as part of the trained model, or do you have to retrain it once that data has been removed? It’s an interesting legal question that I’ve faced several times in my career.
“Can we still use the data in that way? Would we have to pay to use it? What happens if we develop an outcome that is in the interests of the new owner of the asset? Do we have to share data in return?
“These are the sorts of questions that make scientists reluctant to share data too early. In agronomy, legacy assets can be viable for up to 50 years, so the question of asset ownership in a volatile market is very real.
“So, you really need to know that your models are trained on data that you own and have under control.”
3. Cost-effectiveness - with accurate results
Sebastian Klie - CEO of biotech company Targenomix (now part of Bayer) - says it’s all about accuracy and cost. “As users, ultimately, we’d love for novel technologies to be out there that enable us to get mass spec data at higher resolution and at lower cost,” he says. “Even though this would create further issues when it comes to centralised data storage, we would have more data to train better AI models.
“The ‘little brother’ of AI is, in my view – and I believe in the general public perception – quantum computing. Whereas AI excels at extracting insights from enormous amounts of data, mimicking human reasoning or behaviour, many mechanistic models in molecular biology remain computationally intractable. For tasks like high-fidelity simulations of drug target interactions, quantum algorithms offer a path towards tackling these problems at their native complexity.
“For drug discovery in particular, the synergy of quantum computing and MS-based analytics is something to watch out for in years to come. It’s presently not gaining as much of the spotlight as AI/ML – and it may even be more complex – but I do think it will become more common and propel our research.”
4. More complex AI adoption - with multiple data sources
At LifeMine Therapeutics, Genomic Informatics Lead Kevin McConnell believes AI adoption will be integrated at various data levels. “I think we’re moving towards using AI to help predict multi-step processes, of which the mass spec data is just one part,” he says. “You also want to understand target association, patient stratification, target binding and so on. That means we’re going to be integrating multiple models or building larger models on more abstract information. We’re trying to model a biological system through AI and that will involve bringing in multiple modes of data to generate those predictions.”
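As a hypothetical sketch of the kind of multi-model integration McConnell describes, the Python below fuses the outputs of separate models – target binding, patient stratification and mass-spec evidence – into a single priority score. The class, weights and scores are invented placeholders, not LifeMine’s actual pipeline.

```python
# Hypothetical late-fusion sketch: all names, weights and scores are
# placeholders, not a real discovery pipeline.
from dataclasses import dataclass

@dataclass
class CandidateScores:
    target_binding: float          # e.g. output of a binding-affinity model
    patient_stratification: float  # e.g. output of a clinical-genomics model
    ms_evidence: float             # e.g. output of a mass-spec proteomics model

def fuse(scores: CandidateScores, weights=(0.5, 0.3, 0.2)) -> float:
    """Weighted late fusion of per-modality model outputs (each in [0, 1])."""
    w_bind, w_strat, w_ms = weights
    return (w_bind * scores.target_binding
            + w_strat * scores.patient_stratification
            + w_ms * scores.ms_evidence)

candidate = CandidateScores(target_binding=0.82,
                            patient_stratification=0.64,
                            ms_evidence=0.71)
print(f"Combined priority score: {fuse(candidate):.2f}")
```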
5. Setting scale expectations
The last word goes to Lisa M Bacco, Principal Scientist at animal health company Zoetis, who advises that there are wider considerations beyond the tools themselves - particularly when it comes to scale. “This is an evolving space that needs to consider scalability and the increasing use of AI and ML,” she says. “That means, for me, I must contextualise the scale at which we operate in the proteomics laboratory in order to facilitate wider institutional investment in high-performance compute resources, whether that be on-premises or in the cloud. As scientists, we need to communicate to those infrastructure departments the value of the data and the scale of it. That’s an upstream challenge that’s entirely separate from the bioinformatics challenges we’ve been discussing.”
Effective tools require effective data sources. Our panel also discussed the importance of identifying data challenges - and how to effectively overcome them from the outset. To read more, download the full report here.