When two worlds collide

Enterprise content management systems are helping organise scientific information while laboratory systems are starting to incorporate data from elsewhere in organisations. Siân Harris reports on how these two worlds are coming together

Scientists and engineers tend to like order. That is what they are used to. But not all scientific data and information is well ordered, or structured in the same way as all the other pieces of information. Take a new project from the French ministry of defence, for example. When such organisations want to equip a new fleet of tanks there are many complicated decisions to be made, about where and how the tanks will be used, what weapons are needed and which equipment will work with which other equipment. Then, there are the all-important factors of how much the result would cost and what changes would be needed, such as training personnel to use the new equipment.

With so many considerations, the French defence ministry turned to that staple tool of science and engineering, simulation. However, there is a catch: while the simulations in pure physics, for example, might use sets of data in the same, numerically based formats, the data required for this French military project is much more disparate. This information is generally in a mixture of XML, PDF and other formats, so it cannot easily be fed straight into a modelling tool. In such situations, ideas from the business world can come in very handy, particularly enterprise content management (ECM) systems. Such systems specialise in bringing order to unstructured data in a multitude of formats and help organise the information ready to feed it into the models. And this is exactly how the French ministry of defence approached this problem. It turned to two different software companies that worked together to produce the Joint Technical Simulation Architecture. This system has two components: the part focused on the simulation, produced by systems integrator Capgemini; and the part focused on organising the information, the ECM.

This latter part is provided by French ECM company Nuxeo. According to Bassem Asseh, who is the company’s account manager for the defence industry, ECMs enable companies to handle all types of documents whatever their format, whether they be Word files, PDFs or multimedia files, for example. ‘Documents stored within our system can be stored, classified, or accessed either through a file plan or search engine and more than one person can work on a document at once,’ he says. ‘We can provide reporting, allowing customers to know which documents are following the right process and which are stuck somewhere. If a project isn’t working fast enough, an email can be sent to the person in charge of a particular document.’ So how does this system help the French military? ‘We provide document management and configuration management,’ says Asseh. ‘If you want to put two materials together in a tank you need access to descriptions of the first and second materials to decide whether they can be used together. We manage the information that they need to build the system.’ And Nuxeo’s ECM is a good fit with Capgemini’s simulation system, according to Asseh.

‘Capgemini provides its Eclipse environment for the simulation and our system has an Eclipse RCP interface.’

In addition to document and configuration management and the simulation, an important feature of the Joint Technical Simulation Architecture is security management, especially given its military application. ‘Security is part of the interface and is able to give access rights based on attributes of the users. It is also related to the user directory so it can identify which users have which properties,’ says Asseh.
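The idea of granting access rights from user attributes held in a directory can be illustrated with a minimal sketch. It is not Nuxeo's implementation: the `User` and `Document` classes, the `DIRECTORY` mapping, the `can_read` function and the attribute strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    attributes: frozenset  # properties held for this user in the directory


@dataclass(frozen=True)
class Document:
    title: str
    required: frozenset  # attributes a reader must hold to open the document


# Hypothetical user directory: maps a login to that user's properties.
DIRECTORY = {
    "alice": User("alice", frozenset({"clearance:secret", "project:tanks"})),
    "bob": User("bob", frozenset({"clearance:public"})),
}


def can_read(login: str, doc: Document) -> bool:
    """Grant access only if the directory entry holds every required attribute."""
    user = DIRECTORY.get(login)
    if user is None:
        return False  # unknown users get no access
    return doc.required <= user.attributes  # subset test


armour_spec = Document(
    "Armour material description",
    frozenset({"clearance:secret", "project:tanks"}),
)

print(can_read("alice", armour_spec))  # True
print(can_read("bob", armour_spec))    # False
```

In this sketch the access decision depends only on the properties recorded in the directory, so changing a user's entry there changes what that user can see without touching the documents themselves.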

Science and insurance

Another situation where ECMs can help bridge scientific results and the wider business is the insurance industry. Quest Diagnostics, previously known as LabOne, is a good example of this. This company provides laboratory testing for the insurance industry. Such clinical testing is a paper-intensive industry and the company was handling more than 65,000 pieces of paper every day before it began to deploy ImageNow from Perceptive Software. According to Tom Johnson, Perceptive Software’s healthcare product manager, ImageNow helps the company to streamline document management for clinical requisition, toxicology and substance abuse testing, life insurance screening services and other processes. ‘Quickly deploying ImageNow in multiple departments allowed Quest Diagnostics to realise immediate returns on their technology investment,’ he says.

‘ImageNow manages all the extraneous data from paper, to email and to various file types, for example, allowing access from any system,’ he continues. ‘With Perceptive Software’s technologies ImageNow can interface with nearly any system. We can also operate using HL7 [standards for health information] and we have other options that can rely on programming as well, such as API and web services.’

Meanwhile, just as ECMs can help in scientific processes, so scientific tools can move outwards to help the wider organisation.

This is what has happened with NoteBookMaker from the company of the same name. Originally conceived as a straightforward electronic laboratory notebook (ELN), this product has grown to include a chemical inventory module for tracking chemicals and formulations and a quality information module. It can also incorporate the functionality of other parts of the enterprise, such as accounting and human resources, if clients wish to include them on the same system. ‘We lead in with the ELN system, but can add additional tools, such as the ability to build accounting packages in it,’ explains Stephen Arpie, who is a director of NoteBookMaker.

‘The ECM would be another database system that the user would utilise. They can input and output automatically if they need to. In a laboratory setting users often simply copy or paste sample data from a LIMS, but for a bigger dataset, such as sets of accounts, users are likely to input and export the data into NoteBookMaker,’ he continues. This tool has found its way into several large organisations, especially in the USA. According to Arpie, the company’s clients include the US Department of Agriculture, who use the tool in field studies of cows; the US Army, where it is used in the infectious diseases laboratory; and Nasa, with its radio telescope data. Many organisations use the tool and its quality module to organise the information required for ISO 9000 certification, he added.

For government organisations, as well as for private customers and academia, the ability to keep track of and protect their work and information is very important. Like Nuxeo’s Asseh, Arpie sees security as an important factor in any system. ‘Some stuff is private so you need to limit who can get in. You don’t want to make it too easy for rogue employees or others to move or copy data,’ he says. Such individuals can be hindered by keeping certain aspects of the tools proprietary, such as the code or the field names. ‘You can put limitations on data for security. If they don’t know what a particular field is actually called they can’t write to it,’ Arpie explains.

Formats and integration

The integration of data formats is less of a challenge for the systems that manage both the scientific and enterprise information in organisations, according to Arpie. The range of formats involved can be wide, including Excel, XML, CSV, PDFs, Word documents and graphical formats like EPS and JPG, not to mention the native formats of genomic databases or whatever system the scientists use. However, Arpie comments:

‘NoteBookMaker can embed and attach any format just as you’d attach files to an email and you can even attach a whole operating system or the application used to create a particular data set.’ Such flexibility helps in archiving scientific information, he explains. ‘There are enough file formats that can be interchanged easily between various systems,’ he comments. ‘All the relationships can probably be generated, but nobody really needs to associate accounting data to GC chromatographic data.’

Such integration issues have been clarified further by the advent of standards governing ECMs themselves and the way they are structured. As Nuxeo’s Bassem Asseh explains, the Java Content Repository (JSR-170), which is about four years old, was the first standard to define the way that documents are stored with ECMs. ‘It allows you to have a unique storage system and access through several ECMs,’ he points out.

In other words, not only can the systems handle many data formats, but this standard also makes it easier for more than one system to work with the same stored content.
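The principle of one storage system accessed through several applications can be sketched in a few lines. This is an illustration of the idea only and does not use the real JSR-170 API, which is a Java interface; `ContentRepository`, `InMemoryRepository`, `LabNotebookApp` and `ReportingApp` are hypothetical names invented for this example.

```python
from abc import ABC, abstractmethod


class ContentRepository(ABC):
    """Hypothetical stand-in for a standard repository interface."""

    @abstractmethod
    def save(self, path: str, content: bytes) -> None: ...

    @abstractmethod
    def load(self, path: str) -> bytes: ...


class InMemoryRepository(ContentRepository):
    """Toy storage back end; a real one would persist to disk or a database."""

    def __init__(self) -> None:
        self._nodes: dict[str, bytes] = {}

    def save(self, path: str, content: bytes) -> None:
        self._nodes[path] = content

    def load(self, path: str) -> bytes:
        return self._nodes[path]


class LabNotebookApp:
    """First front end: writes laboratory results into the shared store."""

    def __init__(self, repo: ContentRepository) -> None:
        self.repo = repo

    def record_result(self, sample_id: str, data: bytes) -> None:
        self.repo.save(f"/lab/{sample_id}", data)


class ReportingApp:
    """Second front end: reads the same content through the same interface."""

    def __init__(self, repo: ContentRepository) -> None:
        self.repo = repo

    def fetch(self, sample_id: str) -> bytes:
        return self.repo.load(f"/lab/{sample_id}")


shared = InMemoryRepository()
LabNotebookApp(shared).record_result("sample-001", b"pH=7.2")
print(ReportingApp(shared).fetch("sample-001"))  # b'pH=7.2'
```

Because both applications depend only on the common interface, either could be swapped for another product that follows the same convention without moving the stored documents.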

With such interchanges of both file formats and database systems simplified, we could well see more blurring of the edges between scientific and business tools in future years.
