Delivering data for environmental protection

Colin Gray explains how the Scottish Environment Protection Agency used an informatics hub to integrate its data and move towards an evidence-based approach

The Scottish Environment Protection Agency (SEPA) generates and collects millions of data points each year. Data plays a fundamental role in allowing SEPA to perform its environmental duties, improve efficiency and move to an evidence-based approach, while improving interactions with stakeholders. Integrating all the Agency’s data into a single platform, where it can be interpreted and used effectively, posed its own challenge for the organisation. It is a challenge that has been successfully met.

SEPA collects data covering the full range of business areas, including environmental, regulatory and business planning. Just a few examples of what we deal with are hydrological data such as river flows and rainfall; sampling data from chemistry, ecology and microbiology; compliance and regulatory data from licensing; and business data such as planning, resources, budgets and projects. Given the volume and diversity of this data, and the cost of collecting it, we must ensure that it is of high quality and that we make the best use of it.

Getting the right tools

During the last decade, SEPA’s Environmental and Spatial Informatics Unit (ESIU) has worked to provide the Agency with data analysis and modelling tools that get the best value and use from its data. We had previously trialled the Tibco Spotfire and Statistics Server applications, and decided to invest in this software just over three years ago. Taking this opportunity, ESIU embarked on a considerable effort to make the best use of SEPA’s data, in particular the samples collected from all over Scotland and analysed in our laboratories. We decided to deliver this through an Informatics Hub containing many web-based, interactive, data-driven tools.

The aim was to provide these informatics tools, some of which are automated and integrated into SEPA’s LIMS and its Oracle databases, through web-based applications, so as to greatly enhance our understanding of the environment and our capabilities in analysis. We wanted to:

  • enable and improve data verification processes;
  • improve the quality and use of data, ensuring we are an evidence-based organisation making maximum use of data collected and supplied to us;
  • improve and automate generation of reporting for internal and external audiences.

To achieve this, we adopted a highly collaborative, Agile development method. It kept us focused on our business needs while ensuring close collaboration between the data scientists in ESIU (who are data experts and can build these analysis and modelling tools) and the end-users and experts, whether they are fellow scientists, business planners or managers.

Accessibility for all

By making these informatics tools available through interactive web pages, we removed the need for specialised software. This means that any member of staff, or even members of the public, can access the datasets and take advantage of the interactive tools to perform analyses and research.

We configured the vast majority of tools to open instantly by preloading the data each tool needs overnight. This not only made the tools more responsive for users, but also greatly reduced network load during business hours, as Spotfire can cache the data loaded overnight for use throughout the day.
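The overnight-preload pattern described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SEPA’s or Spotfire’s implementation: the class name `NightlyCache` and the `loader` callable (standing in for whatever query fetches a tool’s data) are hypothetical. An overnight job would call `preload`; daytime requests call `get`, which only touches the data source if the preload did not run for that day.

```python
import datetime
from typing import Callable, Optional


class NightlyCache:
    """Hold one tool's dataset, refreshed at most once per calendar day.

    Hypothetical sketch: `loader` stands in for the query that pulls a tool's
    data from the reporting repository; it is not a Spotfire API.
    """

    def __init__(self, loader: Callable[[], list]) -> None:
        self._loader = loader
        self._data: Optional[list] = None
        self._loaded_on: Optional[datetime.date] = None

    def preload(self, today: datetime.date) -> None:
        # Intended to be run by an overnight scheduled job.
        self._data = self._loader()
        self._loaded_on = today

    def get(self, today: datetime.date) -> list:
        # Daytime requests are served from memory; the source is only
        # queried if no preload has happened for today.
        if self._loaded_on != today:
            self.preload(today)
        return self._data
```

The design choice is simply to move the expensive query out of business hours: however many users open the tool during the day, the data source is hit once.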

Where live data is required, or the datasets are too large to be preloaded, the data is made available in SEPA’s Oracle-driven data repository, which is specifically designed for fast, efficient retrieval of large datasets. This avoids querying the slower corporate systems, which are designed for data entry and storage. This policy of moving all datasets required for reporting and analysis into the optimised repository, rather than accessing the live data sources, greatly enhanced our ability to analyse large volumes of data and to combine previously disparate datasets, further extending our analysis and modelling capabilities.

Better communication

This initiative has greatly improved efficiency, not only by providing instant access to standardised analyses and reports, but also by replacing laborious, error-prone Excel-driven processes with these informatics tools. Because this has reduced the opportunities for human error, the quality of the stored data is better and we have greater confidence in it. Because the system now exposes large amounts of data in an easy-to-use and understandable manner, we can make better decisions based on a better understanding of our environment. It also allows us to bring together previously disparate datasets to build a much more complete picture of regulation and the environment.

By combining these datasets we have also increased the value and impact of each one: the whole is greater than the sum of its parts. Furthermore, we are able to communicate data and information more clearly, and external parties can access the information, as we publish informatics tools externally through the Scotland’s Environment Web website (www.environment.scotland.gov.uk/get-interactive/discover-data/) and SEPA’s own website (www.sepa.org.uk/).

ESIU is developing an ever-increasing number of informatics tools, housed in an internal web-driven Informatics Hub, to maximise the value of the data we collect and analyse. The hub is accessible to all staff via our intranet and provides a central place to go when they want to use our informatics tools. Since the hub’s inception in June 2012, we have added more than 50 tools to it. These tools have been used nearly 43,000 times since launch, with usage increasing year-on-year. More than 1,000 staff members, representing over 75 per cent of SEPA staff, have used them. Development of informatics tools is now integrated into all SEPA projects where appropriate, and their use spans all our work portfolios, rather than just the Science and Strategy portfolio for which they were originally intended.

Colin Gray is a Senior Specialist Scientist at the Scottish Environment Protection Agency

Case Study 1: Scottish Pollutant Release Inventory

Each year, SEPA collects data from operators in Scotland to populate the Scottish Pollutant Release Inventory, which is reported to Europe and published on the SEPA and Scotland’s Environment websites.

This dataset contains data from more than 800 operators, covering about 200 pollutants emitted to air and water. Previously, this data was validated via a series of spreadsheets passed between SEPA regulatory teams and the Scottish Pollutant Release Inventory administration team. It was a lengthy and difficult process, given the amount of data and the integrity issues always associated with flat files such as Excel spreadsheets.

In 2011, we replaced this with interactive web-based Spotfire tools. These tools highlight issues in the data instantly and consistently, make reviewing and browsing the data much faster and more efficient, and give staff one place to view the data. We also added the ability to post questions and comments, so that discussions could take place and be recorded. A comparison of the previous method with the new Spotfire-driven method found a 66 per cent saving in staff time in the first year. Because that first year included training staff to use the new tools, savings in future years are expected to exceed 80 per cent.
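The kind of automated check described above can be illustrated with a small rule-based validator. The rules, field names and factor-of-ten threshold below are hypothetical assumptions chosen for illustration; the article does not say which checks SEPA’s Spotfire tools apply. Each reported release yields a list of issues, so an empty list means nothing was flagged.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Release:
    """One reported pollutant release (hypothetical fields)."""
    operator: str
    pollutant: str
    medium: str                          # expected to be "air" or "water"
    amount_kg: Optional[float]
    previous_kg: Optional[float] = None  # last year's amount, if reported


def check_release(r: Release, max_ratio: float = 10.0) -> List[str]:
    """Return human-readable issues for one release (illustrative rules only)."""
    issues = []
    if r.medium not in ("air", "water"):
        issues.append(f"unknown medium '{r.medium}'")
    if r.amount_kg is None:
        issues.append("missing amount")
    elif r.amount_kg < 0:
        issues.append("negative amount")
    elif r.previous_kg:  # only compare when a non-zero prior value exists
        ratio = r.amount_kg / r.previous_kg
        if ratio > max_ratio or ratio < 1 / max_ratio:
            issues.append(f"changed by a factor of {ratio:.1f} since last year")
    return issues
```

For example, a release reported as 5000 kg against 100 kg the previous year is flagged as having changed by a factor of 50.0, whereas 120 kg against 100 kg passes. Applying the same rules to every record is what makes the checks consistent across reviewers.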

However, the gains were not only in efficiency. By providing analyses such as rankings of sites and emissions, visualisations and maps of the data, and instant ways of finding defined issues, we found that understanding of the data, and consequently data quality, also improved significantly.

Case Study 2: Data analysis and visualisation of the environment

SEPA has collected millions of chemistry and ecology samples and supporting data, dating back to the 1960s. Our scientists need a way to analyse and interrogate this sample data, which we invest considerable time and effort in collecting, analysing and storing.

The existing software was unable to meet this need, so we developed a Spotfire-driven informatics tool called Data Analysis and Visualisation of the Environment (DAVE). This tool allows anyone to retrieve the data recorded for any site and its samples rapidly, view them as time-series charts, and perform statistical analyses such as outlier checking, trend analysis and step-change analysis. Users can also test distributions (for example, whether the samples follow a normal or log-normal distribution), assess whether seasonality is present in the data, and calculate science-specific metrics such as ecological metrics.
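As an example of the kind of trend analysis described above, the Python sketch below implements the Mann-Kendall test, a common non-parametric test for monotonic trends in environmental time series. The article does not state which trend test DAVE uses, so treat this choice as an illustrative assumption. The sketch also omits the tie and seasonal corrections that real sampling data would usually need.

```python
import math
from typing import Sequence, Tuple


def mann_kendall(values: Sequence[float]) -> Tuple[int, float, float]:
    """Mann-Kendall trend test without tie correction.

    Returns (S, Z, two-sided p-value). S > 0 suggests an upward trend,
    S < 0 a downward one.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    # S = number of increasing pairs minus number of decreasing pairs (i < j).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction moves S one step towards zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p
```

A positive S with a small p-value suggests an upward trend; for example, the strictly increasing series 1 to 10 gives S = 45. Being rank-based, the test does not assume the samples are normally distributed, which suits skewed concentration data.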

All these analyses, data and charts can be exported as graphs or as data into Excel. Using filters such as catchment, responsible team and river name, users can search for any location in Scotland where a sample has been taken, or they can browse a map of our sampling network and simply select the sampling locations they want to analyse. This means users can now analyse samples along the length of an entire river to understand its changing conditions and the environmental pressures on it. DAVE has become the most used informatics tool in SEPA.

We receive regular feedback on the positive impact of DAVE. For instance, even the simple task of extracting data in response to the data requests we receive now takes a matter of minutes, saving chemists’ and scientists’ time. The tool puts the entirety of Scotland’s sampling data at any scientist’s fingertips, allowing scientists to concentrate on research and science rather than battling with the unsuitable and difficult tools previously used to retrieve and manipulate data and get it into a usable form.

Case Study 3: Diffuse pollution farm inspection analysis

SEPA invested in a tablet-based method for performing inspections of farms in priority catchments in Scotland. The data collected using the tablets was then moved into our Oracle systems when staff returned to a SEPA office. Each farm inspection provided a highly detailed review of the farming activities, as well as putting these into geographical context.

However, analysing such a large volume of data would take considerable effort. To tackle this, we developed a suite of informatics tools that could instantly retrieve any farm’s details, produce maps of the data and display the photographs taken during the farm inspection. This gave our operations staff a rapid way of reviewing farms where issues, or examples of good practice, were being found.

We developed a highly detailed single-farm analysis tool; a national trends analysis tool, to help us understand farming in Scotland as a whole and show how it is changing over time; and a tool that automatically generates a letter summarising the farm inspection results for the landowner or farmer. As well as summarising the inspection, each letter provides mitigation options and feedback from our staff to help improve the farm, along with maps showing where that feedback applies.

By using these informatics tools, and in particular the letter-generation tool, our staff are able to produce letters much more rapidly than the previous manual method and also understand and respond better to issues. This entire system was awarded the Holyrood Connect 2013 award for Innovation in the public sector.