Robert Roe looks at the computing and data management challenges that lay behind this week’s warning that climate change is not only real but already adversely affecting the United States of America.
Massive supercomputing simulations and an innovative peer-to-peer collaboration on data management software lay behind this week’s warnings, issued by the White House, that climate change is already affecting the USA and that its effects will only get worse.
The 840-page US National Climate Assessment (NCA) cautions: ‘Evidence for climate change abounds, from the top of the atmosphere to the depths of the oceans’. According to Michael Wehner, a climate scientist at the US Lawrence Berkeley National Laboratory and a co-author of the report: ‘Today’s rare weather events will become commonplace. Climate change is real and is likely to get worse. How much worse depends on our actions now.’
Wehner, who works in the Computational Research Division at Berkeley, stressed that: ‘The climate models that were used in the NCA3 to make the projections come from all over the world. Climate science may be unique in the extent that we share data between ourselves and the public. The NCA is likely the most transparent report ever produced by any government and the data sharing is a big part of that.’
More than 300 experts from many different countries put together the information contained in the NCA report. Data management and information flow in such a huge project were, in themselves, significant issues that had to be solved.
Wehner continued: ‘My job was to collect that model output and analyse it. The database may be an interesting computing story as the challenges in distributing data are large.’ The team used the portal developed by the Earth System Grid Federation (ESGF), a peer-to-peer (P2P) collaboration that develops software for the management, dissemination, and analysis of model output and observational data. The ESGF provided a secure gateway for scientists all over the world to share information so that it could be analysed and reviewed by researchers who may never even have met in person.
The ESGF is an inter-agency and international effort led by the US Department of Energy (DOE), and co-funded by US Government research organisations as well as laboratories such as the Max Planck Institute for Meteorology (MPI-M), the German Climate Computing Centre (DKRZ), the Australian National University (ANU) National Computational Infrastructure (NCI), and the British Atmospheric Data Centre (BADC).
The team of computer scientists and climate scientists developed an operational system for serving climate data from multiple locations and sources. Model simulations, satellite observations, and reanalysis products are generated at, and distributed from, sites all over the world. This matters because the HPC resources used to run complex models are often located in research organisations or universities in different countries. The system allows the outputs from climate models to be shared quickly among all participants in the study, increasing the potential for collaboration.
As a result, Wehner explained: ‘The computing systems encompass probably every advanced architecture available and a lot of not-so-advanced ones as well. For the NCA3, depending on the future scenario, the number of different modelling groups ranged from 19 to over 50, from the US, Europe, Asia, and Australia.’
To fully grasp the complexities of climate change, a study must draw on information not only from the present but also from the past, so that any changes can be verified as significant. Even so, there are limits to what the models can do, arising both from a lack of data and from the difficulty of constructing sufficiently detailed and accurate models.
The NCA report is upfront about this: ‘There are multiple, well-documented sources of uncertainty in climate model simulations. Some of these uncertainties can be reduced with improved models. Some may never be completely eliminated. The climate system is complex, including natural variability on a range of time scales, and this is one source of uncertainty in projecting future conditions. In addition, there are challenges with building models that accurately represent the physics of multiple interacting processes, with the scale and time frame of the available historical data, and with the ability of computer models to handle very large quantities of data. Thus, climate models are necessarily simplified representations of the real climate system.’
The issue of the time-frame is particularly difficult, as the NCA report states: ‘Climate models are not intended to match the real-world timing of natural climate variations – instead, models have their own internal timing for such variations.’ The report gives an example of how modelling studies do not account for observed changes in solar and volcanic forcing, and concludes: ‘Therefore, it is not surprising that the timing of such a slowdown in the rate of increase in the models would be different than that observed, although it is important to note that such periods have been simulated by climate models, with the deep oceans absorbing the extra heat during those.’
Uncertainties are not the only problem climate scientists face. Adding complexity to a model or improving its resolution can drastically increase the computational power needed to run the simulation. This makes climate science challenging, not just because of the complex systems, but also due to the challenges associated with HPC and big data.
One example is earlier work that Wehner conducted at Berkeley on extreme weather. Wehner researched intense land hurricanes, known as derechos, and atmospheric rivers like the one that California saw in late 2012. The studies required huge amounts of computing power for the model to be run in a timely manner.
‘In order to simulate these kinds of storms, you really do need high-resolution climate models. A model run can produce 100 terabytes of model output. The reason it’s so high is that, in order to look at extreme weather, you need high-frequency data. It’s a challenge to analyse all this data,’ said Wehner. He found that a model that would take 411 system days to run on a single processor could be completed in just 12 days on Hopper, a massively parallel supercomputer at the National Energy Research Scientific Computing Center (NERSC) at Berkeley Lab.
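To put those two run times in proportion, the short Python sketch below works out the implied speed-up, which comes to roughly 34 times. It uses only the two figures quoted above; the function and variable names are illustrative, and because the number of Hopper processors used is not given here, the sketch makes no attempt to estimate parallel efficiency.

```python
def speedup(serial_days: float, parallel_days: float) -> float:
    """Return how many times faster the parallel run is than the serial run."""
    if serial_days <= 0 or parallel_days <= 0:
        raise ValueError("run times must be positive")
    return serial_days / parallel_days


# Figures quoted in the article; everything else here is an illustrative assumption.
serial_days = 411.0   # estimated run time on a single processor
hopper_days = 12.0    # reported run time on Hopper

ratio = speedup(serial_days, hopper_days)
days_saved = serial_days - hopper_days

print(f"Speed-up: roughly {ratio:.0f} times")
print(f"Wall-clock time saved: {days_saved:.0f} days")
```

The ratio describes wall-clock time only; it says nothing about the total processor-hours consumed, which would require the processor count.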
The work done over the past few years to understand and quantify the uncertainties, and to refine the models, has improved climate simulations to the point where, as the NCA report observed, some models can now replicate ‘observed climate’ in their simulations.
