As laboratory science enters an era defined by automation, AI, and data-driven decision-making, many organisations find themselves constrained by legacy informatics systems built for a very different world.
Burkhard Schaefer has been in the laboratory informatics community for more than 25 years. He serves on the Board of Directors of the SiLA Consortium and leads the AnIML Task Group at ASTM, where he focuses on advancing interoperability and open data standards for laboratory instruments and scientific data. He is also an SLAS Fellow in recognition of his long-standing contributions to the field.
Drawing on his leadership roles within the SiLA Consortium and ASTM’s Analytical Information Markup Language (AnIML) task group, Schaefer explains why open standards and lightweight metadata are essential to making laboratory data usable, reusable, and future-proof.
Rather than chasing the next technological wave, Schaefer advocates for a pragmatic approach that prioritises context, discoverability, and integration across instruments, informatics platforms, and downstream analytics. The result is a roadmap for laboratories seeking to remain agile and competitive over the next decade.
Schaefer earned his diploma in computer science from the Technical University of Kaiserslautern in Germany and the University of New Mexico. His career spans senior roles in data management and scientific computing at Los Alamos National Laboratory and NIST, co-founding BSSN Software, and serving as Head of Core Technologies and Partnering at MilliporeSigma. He is currently Managing Director at Splashlake, where he works on next-generation approaches to multimodal scientific data management and laboratory interoperability.
How can organisations balance investment in digital tools with the need for long-term financial and technical sustainability?
Schaefer: It’s a huge question. One thing that strikes me is that it’s 2026. The first major wave of lab informatics tools entered the market around the turn of the century, roughly 25 years ago, perhaps even earlier in the mid-1990s. That’s when you saw LIMS conferences start to grow and peak, followed by the ELN wave, driven in part by IP discussions around first-to-file versus first-to-invent and regulatory pressure from the FDA.
That period gave us the first generation of informatics platforms. Now, 25 or even 30 years later, some of those original players are still in the market, but they carry the legacy burden. You have organisations with enormous experience, but products that are effectively dinosaurs. And this isn’t just a vendor problem. Early adopters came on board as these platforms matured from 2000 to 2010, and now they’re facing a shift. These platforms either evolve or become legacy, leaving users wondering what to do next.
We’re no longer in a greenfield situation. It’s a brownfield. Some systems are actively used, others are collecting dust. Organisations are asking whether they should keep legacy systems alive, invest in migration, or leave things as they are because, frankly, the old tools still work. That creates tension for vendors as well. They have to balance innovation against a large installed base that expects continuity.
At the same time, strategic technology decisions made one or two decades ago are now colliding with rapid change. Entire categories, like some SDMS players, have come and gone. The landscape keeps shifting.
On the capability side, we’ve seen successive waves. Cloud computing, big data, and now AI and closed-loop experimentation. Each wave comes with the same message. If you don’t jump on this, you’ll be left behind. Then the wave subsides, and we all ask how to use it properly. That historical perspective, where we’ve been and how we got here, is critical when we talk about what the next decade might bring.
One thing many organisations still haven’t gotten right is the data foundation. All data from instruments, LIMS, ELNs, and other systems must be accessible and usable. Every transformative technology, including AI and closed-loop processing, depends on data availability. We can’t expect AI to magically infer meaning from poorly structured data. That’s where interoperability becomes essential, and why initiatives like SiLA and AnIML matter so much as enablers.
What are the best practices for creating a solid data foundation?
Schaefer: A solid data foundation means that data from all the different systems, from instruments, LIMS, ELNs and so on, is accessible in an easy way, so you can create value from it. Whatever tool you introduce today, whether it’s AI or closed-loop processing, depends on that data being available.
We can’t expect AI to receive an opaque snapshot of a data package and infer what it is. That is where interoperability comes in, and why standards like SiLA and AnIML are so important as enablers.
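To make that concrete, the sketch below (Python, with purely illustrative element names and identifiers, not the actual AnIML schema) shows the core idea behind formats like AnIML: the measurement payload and the sample context travel together in one structured document, rather than as a bare number in a proprietary file.

```python
# Sketch: package a measurement together with its sample context in one
# XML document, in the spirit of AnIML. Element and attribute names are
# illustrative only -- real AnIML documents must follow the ASTM schema.
import xml.etree.ElementTree as ET

doc = ET.Element("AnalyticalDocument")

# Context: which sample was measured, and how it is identified.
sample_set = ET.SubElement(doc, "SampleSet")
ET.SubElement(sample_set, "Sample", name="Buffer batch 100", sampleID="S-100")

# Payload: the actual measurement, linked back to the sample by its ID.
step = ET.SubElement(doc, "ExperimentStep", name="pH determination")
ET.SubElement(step, "SampleReference", sampleID="S-100")
ET.SubElement(step, "Result", name="pH", value="7.6")

ET.indent(doc)  # Python 3.9+
print(ET.tostring(doc, encoding="unicode"))
```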
Too often, organisations try to jump straight to the next step. They say they want closed-loop experimentation, or they want AI but haven’t considered where the training data comes from. Suddenly, they’re writing custom ingestion pipelines just to make their models work. Eventually, they realise they may not need to build bespoke models at all; instead, they should use off-the-shelf solutions and focus on making their own data usable. These are all symptoms of the same underlying issue. The data foundation isn’t right.
That risk is greater now than ever, because these new technologies promise real productivity gains and competitive advantage. Organisations think first about transformation, but the foundation is missing. Meanwhile, across the ecosystem, there are pre-competitive efforts such as standards bodies, common terminologies, data catalogues, and metadata governance, all aimed at helping organisations understand what data they have and where it lives. We’re living in the century of the haystack. There is more and more data, more hay piled on top, and more needles hidden inside. And we keep hoping some magic, often AI, will solve the problem, when the real challenge is controlling the pile itself.
This leads to a second question. What do tools need to look like in this world? They’re very different from traditional tools. Scientific data management used to mean managing files. SDMS platforms were essentially sophisticated file repositories. That’s no longer sufficient. Today, we have time-series data, continuous sensor streams, and instruments that never stop producing values. You can’t meaningfully store that as files.
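One way to picture the shift is to stop thinking in result files and start thinking in context-tagged records. The rough sketch below uses SQLite purely as a stand-in and hypothetical field names; it appends timestamped sensor readings, each carrying instrument and batch context, to an append-only store. A real deployment might use a time-series database or a message broker instead.

```python
# Sketch: treat a continuous sensor stream as timestamped, context-tagged
# records in an append-only store rather than as discrete result files.
import sqlite3
import time

conn = sqlite3.connect("sensor_stream.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        ts          REAL,   -- acquisition timestamp (epoch seconds)
        instrument  TEXT,   -- which device produced the value
        quantity    TEXT,   -- what was measured
        value       REAL,
        unit        TEXT,
        batch_id    TEXT    -- context: the batch this stream belongs to
    )
""")

def ingest(reading: dict) -> None:
    """Append one reading; the stream never 'closes' the way a file would."""
    conn.execute(
        "INSERT INTO readings VALUES "
        "(:ts, :instrument, :quantity, :value, :unit, :batch_id)",
        reading,
    )
    conn.commit()

# Hypothetical reading from a reactor temperature probe.
ingest({
    "ts": time.time(),
    "instrument": "reactor-probe-01",
    "quantity": "temperature",
    "value": 37.2,
    "unit": "degC",
    "batch_id": "BATCH-100",
})
```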
Even simple instruments introduce complexity. A pH meter, for example, has no idea what sample you’re measuring. There is no value in simply recording that the pH was 7.6 if you have no idea where it came from. You need to think about chemical structures, and about using those structures to tag and annotate items so you can search your data by structure. You need to consider metadata and a common naming scheme. All of this points to the need for multimodal scientific data management.
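As a minimal illustration of that kind of tagging (the field names, identifiers, and SMILES strings here are hypothetical), each reading below carries its sample, batch, and a structure tag, which is what makes a search by structure possible at all.

```python
# Sketch: a bare "pH = 7.6" is meaningless on its own, so each reading is
# stored with lightweight context -- sample ID, batch, and a structure tag
# (a SMILES string here) -- making the data searchable by what was measured.
from dataclasses import dataclass

@dataclass
class Measurement:
    quantity: str
    value: float
    unit: str
    sample_id: str
    batch_id: str
    structure_smiles: str  # structure tag for the analyte or matrix

readings = [
    Measurement("pH", 7.6, "pH", "S-100", "BATCH-100", "CC(=O)O"),    # acetate buffer
    Measurement("pH", 6.9, "pH", "S-101", "BATCH-101", "O=P(O)(O)O"), # phosphate buffer
]

# Search by structure tag instead of by file name or instrument run ID.
acetate_hits = [m for m in readings if m.structure_smiles == "CC(=O)O"]
print(acetate_hits)
```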
Splashlake has been working on multimodal data management for the past couple of years because we see a gap. It’s more than just files; it’s the contextual information that supports your experimental data. If you’re looking at downstream consumers like AI and analytics, you need to bring the systems that generate the work, LIMS, ELN and so on, together with the last mile to the instrument. That is hard to do without a multimodal approach and without communication standards for that last mile.
With those standards in place, you don’t have to build bespoke interfaces for every piece you’re trying to integrate. I think that’s really where the risk of falling behind lies, and we’re not realising it yet.
How can existing laboratory standards cover new multimodal data types?
Schaefer: That naturally raises the question of standards. Do we need more of them, or can one organisation cover all these data types? The temptation is to apply the same pattern everywhere, but that doesn’t work. There’s also the new hammer syndrome. Once you have a solution, every problem looks like a nail. Different domains need different representations. You wouldn’t store microscopy images the same way you store chromatograms, or balance readings the same way you store qPCR data.
Rather than forcing everything into a single format, the key is metadata. Lightweight metadata that allows you to link things together without building massive ontologies upfront. Over-modelling stops you from getting started. Many ontologies don’t exist yet, and waiting for them means paralysis.
What you need instead is the ability to say this is batch 100. For that batch, here’s the freezer data, the QC analytics, the reactor data, the ingredients, lots, and formulations. These are different data types, possibly in different systems, but they’re tied together by consistent metadata. You then pull them together, ideally through a single interface, without replacing specialist tools. There is no longer a single master system.
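A rough sketch of that linking idea, with hypothetical system names and records: each source keeps its own shape, and the only thing they share is the batch identifier, which is enough for one interface to assemble the full picture.

```python
# Sketch: different systems keep their own records, but a shared batch
# identifier ties them together so one query can assemble everything.
freezer_log  = [{"batch_id": "100", "freezer": "F-3", "temp_c": -78.1}]
qc_results   = [{"batch_id": "100", "assay": "purity", "value": 99.2, "unit": "%"}]
reactor_data = [{"batch_id": "100", "max_temp_c": 41.5, "duration_h": 6.0}]
formulations = [{"batch_id": "100", "ingredient": "API-7", "lot": "L-2291"}]

def everything_about(batch_id: str) -> dict:
    """One interface over many sources, linked only by consistent metadata."""
    sources = {
        "freezer": freezer_log,
        "qc": qc_results,
        "reactor": reactor_data,
        "formulation": formulations,
    }
    return {
        name: [rec for rec in records if rec["batch_id"] == batch_id]
        for name, records in sources.items()
    }

print(everything_about("100"))
```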
The tooling landscape evolves too fast for any single master system. Every few months, there’s a new AI breakthrough. Conferences are full of talk about dark labs and full automation. Betting on one horse will fail. Agility and navigability are now essential, and that calls for new tools and new thinking.
Does that make interoperability more important than ever before?
Schaefer: Interoperability becomes central, not just technically but economically. Organisations will run ecosystems, not monoliths. Best-of-breed tools tied together by a common data backbone, appropriate metadata, governance, and pragmatic use of open formats where they make sense. Vendors need to engage here, and users need to demand it in their RFPs.
There’s been a mindset shift. Instruments exist to produce data. That’s the reason customers buy them. If an instrument doesn’t integrate into a digital ecosystem, it increases the total cost of ownership and becomes less attractive. Openness now creates stickiness. In the lab, if two instruments perform the same task but one integrates seamlessly while the other doesn’t, technicians will choose the integrated one. That drives utilisation, consumables sales, and future purchasing decisions.
Historically, labs purchased instruments based on fitness for purpose, not vendor lock-in. That logic is returning, especially as new assay types and biologics tools emerge. Suppliers increasingly depend on being easy to integrate, and standards are becoming table stakes.
From a data foundation perspective, it’s important to distinguish between payload and context. Systems like LIMS are context-rich but data-light. They know samples, provenance, and results, but not raw data. Instruments produce raw data but have no context. Other systems, such as ERP, ELN, and MES, hold different parts of the story. The value emerges when raw scientific data is combined with its context.
When building a data foundation, consider how you want to navigate your data. Navigation is driven by context. Samples, batches, products, compounds, and cell lines. Payload data, such as measurements and reports, comes second. You need clear sources of truth for each domain, strong metadata, and governance that ties it together across modalities.
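One simple way to picture this separation (the field names, systems, and location URI below are illustrative assumptions, not a prescribed schema) is a catalogue entry that holds the navigational context and only a pointer to the payload, along with the source of truth for each part.

```python
# Sketch: a catalogue entry separates navigational context (which sample,
# batch, product) from the payload (where the raw data actually lives),
# and records which system is the source of truth for each part.
catalogue_entry = {
    "context": {
        "sample_id": "S-100",
        "batch_id": "BATCH-100",
        "product": "Compound-X",
        "source_of_truth": "LIMS",
    },
    "payload": {
        "kind": "chromatogram",
        "format": "AnIML",
        "location": "s3://raw-data/hplc/2026/run-0417.animl",
        "source_of_truth": "SDMS",
    },
}

# Navigation happens on context; the payload is only fetched when needed.
def find_payloads(entries, **context_filter):
    return [
        e["payload"] for e in entries
        if all(e["context"].get(k) == v for k, v in context_filter.items())
    ]

print(find_payloads([catalogue_entry], batch_id="BATCH-100"))
```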
Finally, you have to accept that you don’t yet know all the questions you’ll want to ask. That’s where context really matters. You may not be able to normalise or analyse everything today, but you must be able to find it five years from now and view it with fresh eyes. Future-proofing is about findability. Knowing what data you have, where it lives, and how to rediscover it. Data cataloguing and discoverability are what allow you to reuse data to answer questions you haven’t even thought of yet.
Burkhard Schaefer is a Director and Head of Partner Management at the SiLA Consortium and the Managing Director at Splashlake.