Knowledge: Data analytics

This chapter takes the theme of knowledge management beyond document handling into the analysis and mining of data. Technology by itself is not enough – laboratory staff need to understand the output from the data analysis tools – and so data analytics must be considered holistically, starting with the design of the experiment.

Data analytics is the term applied to the process of analysing and visualising data, with the goal of drawing conclusions and understanding from the data. Data analytics is becoming increasingly important as laboratories have to process and interpret the ever-increasing volumes of data that their systems generate.

In the laboratory, the primary purpose of data analytics is to verify or disprove existing scientific models to provide better understanding of the organisation’s current and future products or processes.

Data mining is a related process that utilises software to uncover patterns, trends, and relationships within data sets. Although data analytics and data mining are often thought of in the same context, often in connection with ‘Big Data’, they have different objectives.

Data mining can broadly be defined as a ‘secondary data analysis’ process for knowledge discovery. It analyses data that may have originally been collected for other reasons. This differentiates it from data analytics, where the primary objective is based on either exploratory data analysis (EDA), in which new features in the data are discovered, or confirmatory data analysis (CDA), in which existing hypotheses are tested and either supported or rejected.
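The EDA/CDA distinction can be illustrated with a short sketch. The data below are simulated assay measurements from two hypothetical instrument runs (the values, sample sizes, and the choice of a two-sample t statistic are illustrative assumptions, not taken from any specific laboratory system):

```python
import statistics
from random import Random

# Hypothetical measurements from two instrument runs (simulated for illustration).
rng = Random(42)
run_a = [rng.gauss(10.0, 0.5) for _ in range(30)]
run_b = [rng.gauss(10.4, 0.5) for _ in range(30)]

# Exploratory data analysis (EDA): summarise the data to discover its features,
# with no hypothesis stated in advance.
summary = {
    "run_a": (statistics.mean(run_a), statistics.stdev(run_a)),
    "run_b": (statistics.mean(run_b), statistics.stdev(run_b)),
}

# Confirmatory data analysis (CDA): test a pre-stated hypothesis -- here,
# "the two runs have the same mean" -- via a pooled two-sample t statistic.
def t_statistic(x, y):
    nx, ny = len(x), len(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    pooled = ((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / (pooled * (1 / nx + 1 / ny)) ** 0.5

t = t_statistic(run_a, run_b)
```

In a confirmatory analysis, the t statistic would be compared against a critical value chosen before the experiment; the key point is that the hypothesis precedes the data, whereas in EDA the data precede any hypothesis.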

In recent years, some of the major laboratory informatics vendors have started to offer data analysis and visualisation tools within their product portfolios. These tools typically provide a range of statistical procedures to facilitate data analysis, together with visual output to help with interpretation. Alongside the integrated data analytics tools, more and more vendors offer generic tools that can extract and process data from simple systems through to multiple platforms and formats. The benefit of integrated data analysis tools is that they provide a seamless means of accessing data, eliminating concerns about incompatible data formats. As with any other laboratory software, defining functional and user requirements is an essential step in making the right choice. Key areas to focus on are that the tools have appropriate access to laboratory and other data sources; that they provide the required statistical tools; and that they offer presentation and visualisation capabilities consistent with broader company preferences and standards.
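The extract-and-normalise step that such generic tools perform can be sketched as follows. The file formats, field names, and schema here are illustrative assumptions, not any particular vendor's export format:

```python
import csv
import io
import json

# Hypothetical exports from two laboratory systems (illustrative field names).
csv_export = "sample_id,result\nS1,4.2\nS2,3.9\n"
json_export = '[{"sample_id": "S3", "result": 4.5}]'

def load_csv(text):
    """Parse a CSV export into a list of uniform records."""
    return [
        {"sample_id": row["sample_id"], "result": float(row["result"])}
        for row in csv.DictReader(io.StringIO(text))
    ]

def load_json(text):
    """Parse a JSON export into the same record shape."""
    return [
        {"sample_id": rec["sample_id"], "result": float(rec["result"])}
        for rec in json.loads(text)
    ]

# Normalise both sources into one list of records ready for analysis,
# removing any concern about incompatible source formats downstream.
records = load_csv(csv_export) + load_json(json_export)
```

The design point is that each source format gets its own small loader, and everything downstream of `records` sees a single uniform structure – which is precisely the convenience an integrated tool provides out of the box.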

Data analytics plays an important role in the generation of scientific knowledge and, as with other aspects of ‘knowledge management’, it is important to understand the relationship between technology, processes, and people. In particular, staff need to have the appropriate skills to interpret, rationalise, and articulate the output presented by the data analysis tools. To take full advantage of data analytics, it should be considered as part of a holistic process that starts with the design of the experiment.

A quote attributed to Sir Ronald Fisher, ca 1938, captures this point: ‘To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.’
