Supporting standards

Sophia Ktori discusses the implementation of data standards for laboratory informatics

There has been a noticeable leap in the realisation that data formats, and the application of open standards in particular, will play a huge role in the seamless integration and future-proofing of the digital laboratory. That’s the view of Daniel Juchli, chief technical officer at SiLA Consortium, and head of Lab & Research Informatics at life science consultancy wega Informatik, based in Switzerland.

‘In order to achieve the goal of generating FAIR (findable, accessible, interoperable and reusable) data, and enabling true, plug and play interoperability, labs must look to open, community-driven data formats and instrument communication standards, such as AnIML and SiLA.’

A true standard must be widely accepted in industry, but to be so it should also be open and royalty free. SiLA (Standardisation in Lab Automation) is an open standard that is represented by a non-profit organisation, but more importantly, also by a community of end-users, software and instrument vendors who are driving SiLA’s evolution and trialling the tools and utilities that will be required to make instruments compliant with the standard, Juchli stated. ‘Parallel to the SiLA organisation, industry is accelerating uptake and optimisation of the standard and its utilisation. And this industry drive has become self-propelling, which further aids uptake. Standards such as SiLA are not enforceable.

‘But if potential users can take part in development and witness the benefits first hand, then they jump on board.’ Juchli suggested that in 2019 alone, more companies were working on SiLA projects than had been doing so over the rest of the past decade.

Increasing adoption

The release of the instrument communication standard SiLA 2 last year has made it easier for people to work on real-world projects, Juchli explained. ‘We now have vendors knocking on the SiLA door, asking to implement the standard. It will only be a matter of time before the first commercial products are out there with full SiLA 2 support. Encouragingly, SiLA and AnIML [analytical information markup language] fit well together, so there are now robust, vendor-neutral standards for managing how instruments connect and communicate, and for husbanding and viewing the data that they produce.’

Critically, the SiLA organisation has made great strides to ensure complete transparency in development of the standard. ‘Everything has public visibility, you can see the complete process of development and decision making. This transparency aids uptake, because potential users can see how the result is derived from input of potentially hundreds of companies to refine and hone.’

Understanding and uptake have been aided by the simple fact that interoperable technology is now so much a part of everyday life – think how smartphones and tablets connect with each other, and to printers, headphones, speakers etc. ‘Plug and play is expected, and this expectation of interoperability has started to penetrate the laboratory environment.’

SiLA enables this communication between different service systems in the lab, such as balances, LIMS or ELN systems, Juchli noted. ‘SiLA represents a kind of microservice architecture. To do this SiLA introduces the concept of a feature definition language, to describe those services, in terms of their abilities, what data they may consume or produce, which interaction models they support, and even the messages, commands and actions that they provide, in a localised user language. Importantly, these feature definition languages can be generated by the average scientist in the lab, and are still machine readable. They don’t require any IT experience or knowhow. It’s kind of the glue between the business and the IT worlds.’

Ultimately, this means that the product manager of a spectrophotometer or chromatography data system can easily create feature definitions that describe the services that the system is offering. ‘Importantly, this isn’t confined to the life science laboratory,’ Juchli said. ‘You could feasibly create feature definition languages that would match workflows and data in disparate, non-life science industries.’
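To make the idea of a machine-readable yet human-writable feature description more concrete, here is a minimal Python sketch. The XML below is a hypothetical, heavily simplified stand-in and is not a schema-valid SiLA 2 feature definition; the element names, the attribute names and the `describe_feature` helper are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified feature description (NOT a schema-valid SiLA 2 file).
FEATURE_XML = """
<Feature>
  <Identifier>WeighingService</Identifier>
  <DisplayName>Weighing Service</DisplayName>
  <Description>Weighs a sample on a laboratory balance.</Description>
  <Command>
    <Identifier>WeighSample</Identifier>
    <Description>Prompt the user to weigh a target mass.</Description>
    <Parameter name="TargetMass" type="Real" unit="g"/>
    <Parameter name="Tolerance" type="Real" unit="g"/>
    <Response name="MeasuredMass" type="Real" unit="g"/>
  </Command>
</Feature>
"""


def describe_feature(xml_text):
    """Turn the human-written description into a structure software can use."""
    root = ET.fromstring(xml_text)
    commands = {}
    for cmd in root.findall("Command"):
        commands[cmd.findtext("Identifier")] = {
            "parameters": [p.get("name") for p in cmd.findall("Parameter")],
            "responses": [r.get("name") for r in cmd.findall("Response")],
        }
    return {
        "feature": root.findtext("Identifier"),
        "description": root.findtext("Description"),
        "commands": commands,
    }


summary = describe_feature(FEATURE_XML)
print(summary)
```

The same text serves both audiences: a scientist can read the description and parameter names, while client software can enumerate the commands without any instrument-specific driver.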

Combining standards

It’s another tier to the overall goal of improving automation, and in the laboratory this isn’t just robotics, it’s about workflow and data flow automation. ‘Add in AnIML, and you not only get instruments talking to each other about what they are doing, but you then have a standard that describes the data that they generate and send to other systems in the lab. AnIML’s technique definition functionality defines how the data looks, in a similar way to how the SiLA feature definition defines how to speak to the system,’ Juchli noted. ‘Then you can imagine introducing two systems into the lab that have not been connected to each other before, but because of the self-describing nature of the SiLA features, they can effectively discover each other, and start to interact.’

Based on HTTP/2, SiLA 2 concentrates on functionality and device behaviour, rather than device type, the SiLA organisation claims. ‘SiLA 2 essentially views everything in the lab as a service,’ explained BSSN Software president Burkhard Schaefer. ‘So, an instrument could be a service, or a LIMS could be a service, and using SiLA you can access and communicate with these services through the network. The first stage is a scan of the network to discover what services are present – in the same way that your Apple device can look for printers or speakers in its environment. Those services then communicate what features they have and can offer.’
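The discover-then-ask pattern Schaefer describes can be sketched in a few lines of Python. This is an in-memory stand-in, not the SiLA 2 discovery protocol: the `LabNetwork` and `LabService` classes, the service names and the feature names are all hypothetical, and a real deployment would discover services over the network rather than from a local list.

```python
from dataclasses import dataclass, field


@dataclass
class LabService:
    name: str
    features: list  # feature identifiers this service offers


@dataclass
class LabNetwork:
    """Stand-in for the lab network; real discovery would happen over the wire."""
    services: list = field(default_factory=list)

    def announce(self, service):
        # A service makes itself visible to anyone scanning the network.
        self.services.append(service)

    def discover(self):
        # Step 1: scan for services. Step 2: each reports the features it offers.
        return {s.name: list(s.features) for s in self.services}

    def find_by_feature(self, feature):
        # Clients look for behaviour (a feature), not for a device type.
        return [s.name for s in self.services if feature in s.features]


network = LabNetwork()
network.announce(LabService("Balance-01", ["WeighingService"]))
network.announce(LabService("LIMS", ["SampleListProvider", "ResultReceiver"]))
network.announce(LabService("CDS", ["SampleListConsumer", "ResultProvider"]))

catalogue = network.discover()
weighers = network.find_by_feature("WeighingService")
print(catalogue)
print(weighers)
```

Searching by feature rather than by device type mirrors the point that SiLA 2 concentrates on functionality and behaviour.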

A feature is a set of commands that belong together – say, a weighing balance can be instructed to prompt a user to weigh a certain amount of sample within a given tolerance. Equally, a LIMS could send a list of samples to a chromatography data system, monitor the measurement progress, and receive the results back in AnIML format. ‘Importantly, it’s two-way communication between instruments and software through SiLA 2. The instrument responds when the command and action have been completed, and there is also an error-reporting function for when things go wrong.

‘It’s no longer necessary to write files on a PC and pass them from disk to instrument, because the systems intercommunicate using the same language. The actual data generated on an instrument as a result of the instructions can then be wrapped up and posted back across the network.’
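The balance example, with its command, its response on completion and its error reporting, might look something like the following Python sketch. It simulates the exchange locally and does not use any SiLA 2 library; `BalanceService`, `CommandError`, `WeighResult` and `run_weighing` are hypothetical names, and the scale readings are made-up values.

```python
from dataclasses import dataclass


class CommandError(Exception):
    """Reported by the service when a command cannot be completed."""


@dataclass
class WeighResult:
    measured_g: float
    within_tolerance: bool


class BalanceService:
    def __init__(self, readings):
        self._readings = list(readings)  # simulated scale readings, in grams

    def weigh_sample(self, target_g, tolerance_g):
        # The command carries its parameters; the service answers when done.
        if tolerance_g < 0:
            raise CommandError("tolerance must be non-negative")
        if not self._readings:
            raise CommandError("no stable reading available")
        measured = self._readings.pop(0)
        ok = abs(measured - target_g) <= tolerance_g
        return WeighResult(measured, ok)


def run_weighing(service, target_g, tolerance_g):
    """Client side: send the command, then handle the response or the error."""
    try:
        result = service.weigh_sample(target_g, tolerance_g)
    except CommandError as err:
        return f"error: {err}"
    status = "accepted" if result.within_tolerance else "out of tolerance"
    return f"{result.measured_g:.3f} g {status}"


balance = BalanceService([10.02, 10.50])
first = run_weighing(balance, 10.0, 0.05)
second = run_weighing(balance, 10.0, 0.05)
third = run_weighing(balance, 10.0, 0.05)
print(first, second, third, sep="\n")
```

No file is written or handed over at any point: the request, the result and the error all travel as messages between the two parties.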

Managing data

Developing a FAIR data infrastructure using open standards for data management and archiving, and for instrument communication, may be the ultimate goal for data usability and longevity, but this doesn’t mean that existing proprietary instrument software should become obsolete, Schaefer suggested. ‘Data in its original format still has enormous value, and there are things you may be able to do with data generated in proprietary instrument software that you may not yet be able to do with the open data formats. There is still a valuable business case for the vendors, who have designed and enabled valuable functionality into their instrument software, and which users will have configured and set up to match their own method and workflows.’

This may create something of a dilemma for organisations who are driving to implement open standards, but who don’t want to lose sight of familiar data. And, as Schaefer noted, ‘We are not out to fix something if it isn’t broken.’ Ideally, industries and the standards developers will ‘embrace, rather than replace,’ when it comes to adopting standard data formats and communication protocols.

So, in an ideal world organisations would be able to retain their original data side-by-side with data in standard format, in easily accessible, but also standard packages. ‘It may seem like a tall order, but this concept of “container formats” is gaining ground, and the technology to enable that goal has, in fact, been available freely for years, although it’s only now being harnessed for data corralling within the lab informatics field,’ Schaefer noted. ‘The Open Packaging Convention, or OPC, originated from Microsoft, and is, effectively, a zip file technology that has now been documented in ISO/IEC 29500 and ECMA 376 standards.’ Look it up on the Microsoft website and Open Packaging Conventions (OPC) is described as a file technology for designing file formats with a shared, base architecture. ‘The OPC integrates elements of zip, XML and the Web technologies into an open, industry standard that makes it easier to organise, store and transport application data,’ the website says.

Think of OPC as a kind of filing cabinet, but not the individually arranged drawers of a personal filing system. OPC container packages must conform to a predictable, conceptual organisational system – called the logical model – and predictable physical characteristics, called the physical model. It’s these standardised conformance requirements that are described in the OPC.

OPC makes it possible to parcel up all of your proprietary and standard data and associated files. ‘OPC effectively provides a convention for how you prepare a zip file, so that it contains all of the data, including metadata, appropriately prepared and parcelled up,’ Schaefer said. ‘It’s kind of a revelation for anyone working with proprietary and standardised data, because it allows users to keep all of their data, standard, original, metadata and workflows, in these discrete, also standardised “containers”.

‘At its most basic level, the idea is you take your instrument data and place it in the zip file. You add to that XML metadata, and then you add your open data formats, such as AnIML, to the same file. You can then keep all of your data together in a very accessible format that keeps the link between the open standard and the proprietary data. People can then dip into and out of one data format or another.’
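As a rough illustration of that recipe, the Python sketch below builds such a container with the standard `zipfile` module. It follows the general OPC layout of a `[Content_Types].xml` part plus a `_rels/.rels` relationships part, but it has not been validated against ISO/IEC 29500-2 and is not BSSN’s implementation. The part names, file extensions and `example.org` relationship type URIs are hypothetical, and the payloads are placeholders.

```python
import io
import zipfile

# Declares the content type of each part, keyed by file extension.
CONTENT_TYPES = """<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Default Extension="animl" ContentType="application/xml"/>
  <Default Extension="raw" ContentType="application/octet-stream"/>
</Types>"""

# Links the package to its parts; the Type URIs here are made up for the sketch.
RELATIONSHIPS = """<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rOriginal" Type="http://example.org/rel/original-data" Target="original/run.raw"/>
  <Relationship Id="rMeta" Type="http://example.org/rel/metadata" Target="metadata/run.xml"/>
  <Relationship Id="rAniml" Type="http://example.org/rel/animl" Target="standard/run.animl"/>
</Relationships>"""


def build_package(original_bytes, metadata_xml, animl_xml):
    """Bundle proprietary data, XML metadata and open-format data in one zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", RELATIONSHIPS)
        package.writestr("original/run.raw", original_bytes)
        package.writestr("metadata/run.xml", metadata_xml)
        package.writestr("standard/run.animl", animl_xml)
    return buffer.getvalue()


package_bytes = build_package(
    b"\x00\x01vendor-binary-placeholder",
    "<Metadata><Instrument>HPLC-7</Instrument></Metadata>",
    "<AnIML><!-- placeholder, not a real AnIML document --></AnIML>",
)

# Any zip reader can open the container and pull out either representation.
with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
    part_names = package.namelist()
    original_back = package.read("original/run.raw")
print(part_names)
```

Because the result is an ordinary zip archive, the original vendor file comes back byte for byte, sitting alongside its open-format counterpart.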

Importantly, OPC hasn’t necessitated a major reinvention of the wheel. ‘Zip files can be read on just about any computer, and we’ve all been using them every day for decades. Microsoft is even using this format as the base for their DOCX or PPTX Office file formats. One of the great things about OPC is that because Microsoft supports it, there are libraries for pretty much any platform. So it’s very accessible and fits into that spirit of open accessibility and FAIR data.’

Data standards pioneer BSSN Software has been working with customers to create an implementation for OPC, which is an ideal fit with the company’s data converters, which are now available for upwards of 200 instrument models. ‘Using OPC, companies can archive all data into OPC zip containers that will, we can assume, still be readable in the next 20 years. This can only help to lower the hurdle for adopting standards, as it follows that “embrace, not replace” mentality.’

BSSN is a leader in developing data management and integration software that facilitates seamless interoperability, communication and flow of data between informatics systems and scientific and instrument software. The firm has championed the XML-based AnIML standard for data reporting, storage and sharing, and Schaefer is also on the board of directors of SiLA, a non-profit consortium that is developing standards for defining how information is transported and communicated from one laboratory system to another – effectively how systems talk to one another – rather than how the data they generate is structured.

OPC fits in with the concept of data standards being developed and adopted as non-competitive, community initiatives for the benefit of scientific discovery and development. ‘And importantly, OPC doesn’t tread on the toes of either AnIML, which is representing the data, or SiLA, which is developing the communications standards,’ Schaefer noted. ‘Our vision is to make standards accessible and understood by every scientist, and without forcing them to abandon their existing data formats.’ OPC helps to do this.

The vision to give every lab an understanding of and access to open standards is one shared by Merck, which acquired BSSN in June. Through the merger, Merck says it will combine BSSN’s technologies with its own market access and laboratory domain knowledge to develop and commercialise an open and interoperable platform for laboratory data.

‘With the power and market reach of Merck, we can achieve this vision of developing an ecosystem of standards that all play together, and which can be accessed globally, by labs and organisations of any scale,’ Schaefer stated. ‘Merck is neither a major analytical instrument vendor nor a traditional software developer, so we can exploit their global reach on a more impartial basis, non-competitively, to evolve this open system, building on AnIML and on the concept of community-driven standardisation protocols.’

What’s also becoming clear is that these open standards are not static, but, as with any form of software, will continue to evolve to support increased functionality, Schaefer noted.

Community cooperation

As a global LIMS and ELN provider, LabWare is championing the uptake and real-world implementation of standards as a priority for laboratories, explained Jim Brennan, technical sales specialist. ‘When LabWare started in 1987, the firm didn’t offer LIMS or ELN systems, but operated as an independent interfacing company delivering bidirectional interfaces to connect third-party commercial LIMS to laboratory instrumentation. During the last three decades we’ve been at the front end of developing solutions for managing that transfer of data from instruments into other systems. It started with simple electronic balances and then progressed to more complex analytical systems, such as chromatographic data systems, DNA sequencers and their associated software.’

Even during the early days there were standards available, which were incorporated into LabWare solutions, Brennan said. ‘Some of these older standards are still used, such as the ANDI (Analytical Data Interchange) protocol for chromatographic data, and ASTM for clinical instrument communication. Over recent years we have seen new standards emerging, including SiLA and AnIML, which are being developed through non-profit consortia, with industry input and co-operation.’

LabWare’s support of these emerging standards has prompted the firm to build an AnIML module directly into its core software. ‘This module simplifies the ability to build an interface for different manufacturers, different software or different versions of that software,’ Brennan said. ‘It provides a single point of entry into LabWare for any data from an instrument that is accessible or available in AnIML.’

For end-users, companies and laboratories there are some challenges associated with adopting a standard such as AnIML, Brennan acknowledged. While the benefits of having laboratory data in a standardised, human readable, easily archived, searched and secure format are evident, companies will have the challenge of converting often terabytes of legacy data into AnIML. This may seem daunting – ‘and it’s not something LabWare undertakes,’ Brennan noted – but there are specialist companies that will undertake that task, and while there are costs involved, the benefits are manifold.

Importantly, AnIML is an ideal data format for archiving, as it secures future accessibility, he continued. ‘This means companies don’t have the expense of maintaining outdated, proprietary software, just so that data can be viewed years down the line. We should remember that data belongs to the organisation that generated it, and so they should have it in a format that is accessible, readable and usable with tools that they are using now, but also that they might be using in the future.’

And this is, at least in part, why AnIML is based on XML, because there are so many tools for viewing and reading XML, he noted. ‘AnIML may not be the first standard to be based on XML, but it has been developed to take full advantage of what XML can offer. Coupled with this, AnIML also supports data integrity, as it is built on base64 encoding, which provides evidence of modified data while it’s in transit through different systems.’
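The practical upshot of building on XML is that general-purpose tooling is enough to get at the numbers. The Python sketch below reads a hypothetical, much-simplified document that is not schema-valid AnIML: the element names, the `encoding` attribute and the `read_series` helper are assumptions for illustration only. One series is stored as plain values and the other as base64-encoded binary. Note that base64 here simply makes binary values safe to embed in XML text; evidence of tampering would have to come from a separate mechanism, such as a checksum or digital signature, which is not shown.

```python
import base64
import struct
import xml.etree.ElementTree as ET

absorbance = [0.12, 0.48, 0.91]
# Pack three little-endian 64-bit floats, then base64-encode them for XML.
encoded = base64.b64encode(struct.pack("<3d", *absorbance)).decode("ascii")

# Hypothetical, simplified document (NOT schema-valid AnIML).
DOC = f"""
<Experiment name="UV-Vis run">
  <Series name="Wavelength" unit="nm">
    <Value>400</Value><Value>450</Value><Value>500</Value>
  </Series>
  <Series name="Absorbance" unit="AU" encoding="base64-float64-le">{encoded}</Series>
</Experiment>
"""


def read_series(xml_text):
    """Return each named series as a list of floats, decoding base64 if needed."""
    root = ET.fromstring(xml_text)
    series_by_name = {}
    for series in root.findall("Series"):
        if series.get("encoding") == "base64-float64-le":
            raw = base64.b64decode(series.text)
            values = list(struct.unpack(f"<{len(raw) // 8}d", raw))
        else:
            values = [float(v.text) for v in series.findall("Value")]
        series_by_name[series.get("name")] = values
    return series_by_name


data = read_series(DOC)
print(data)
```

Nothing beyond the standard library is needed to read the file, which is the archiving argument in miniature: the data stays readable even when the software that produced it is long gone.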

Another key benefit of AnIML is that it’s a free and open standard, said Brennan, mirroring Juchli’s comments about SiLA. ‘There’s no cost of entry, and it’s supported by the community.’ In fact, AnIML supports many industries, not just pharmaceuticals and the life science sector, Brennan suggested. Importantly, it also simplifies interface maintenance. ‘As instrument control software is updated repeatedly, an interface can become fragile. By adopting AnIML and having a single point of entry for it, there isn’t the need to completely rebuild an interface each time it’s updated or modified.’

Encouraging uptake of standards such as AnIML and SiLA is largely a matter of education, which LabWare promotes at its customer conferences, not just for end-users, but also instrument vendors, Brennan said.

‘The instrument vendors are very much part of the overall conversation.’ And once potential users become aware and understand the benefits of AnIML, then they typically start to think about how to incorporate the standard into their organisation. ‘This will inevitably flag up the issue of converting legacy data, which is probably the biggest challenge up front.’ It’s important to make customers aware of the downstream benefits that outweigh the up-front costs, not least because outdated systems, and the costs of IT support and maintenance that they require, can be negated.

‘We hear stories about old PCs sitting in basements running old Windows operating systems with an old piece of software, which have to be maintained just so that data remains accessible. This is a huge financial and technical burden, and not something that you want to rely on for important, valuable data.’ But there is no easy road to conversion, Brennan noted. ‘On a practical level, this goal of data conversion is often tackled in manageable chunks.’

Encouragingly, LabWare sees community co-operation in the development of standards in real-world settings. ‘Some of our customers are interested in setting up a consortium to further look at data standards and how they can best be applied. We hope we will have a part in that, but it’s the customers that drive the effort, and they are, ultimately, guiding us in everything we do. We have brought data standards to their attention, and are making them accessible, and now they are going to explore and guide us on how we can work together to put them into action.’  

 
