DATA ANALYSIS: OCEANS
Down by the sea
View from a robot data acquisition buoy in the Pentland Firth, between Scotland and Orkney (part of a solute transfer survey in association with a University of the Highlands and Islands project).
Felix Grant fishes for data analysis applications among the mysteries of the deep
‘Wide is the ocean, sweet gravity...’ While that refrain from Cerys Matthews’ song Ocean [1] is intended as poetic metaphor, it is also appropriate to a scientific computing view of things. The size (roughly two thirds of planetary surface area) and mass (on the order of a quintillion tonnes, sloshing around daily under tidal pull) of the oceans are central to their importance, and there is little on earth, from its core to the limits of atmosphere or from microbes to tectonics, that can be treated meaningfully without reference to these huge bodies of water and dissolved solutes.
Starting from the macro end of the scale, earth system modelling frameworks, such as that of the GENIE project [2], are increasingly providing grid facilities to local computing tools used by researchers. In GENIE’s case, the National Grid Service (NGS) and Natural Environment Research Council (NERC) data grid connect with a wide and flexible range of analyses, stretching down to individual desktops. GENIE includes among its components a frictional 3D Edwards and Shepherd [3] ocean, and incorporates specific facilities for GEODISE linkage to MatLab. A recent example of its use is a study [4] that, among other things, identifies the possibility of an early warning period for potential collapse of the Atlantic thermohaline circulation (THC). Vulnerability of this Gulf Stream driver is one of the best known ‘doomsday factors’ associated with posited anthropogenic climate change. Since ocean movements operate over centuries and millennia, analysis has to address an equally long time base, and GENIE pays particular attention to the 20,000 years or so since the last glacial maximum.
On a smaller span, both spatially and temporally, but of no less scientific interest, are the migrations of the bar-tailed godwit. This wading shorebird crosses astonishing distances (10,000km and more) across open water without stopping: individuals have been tracked across the Pacific from Australasia to China. In some ways, the flights of hummingbirds across the Caribbean may be even greater feats of endurance. Analysis of these birds and their flights has a great deal to teach us in a number of respects – not least, according to one study [5], our habits of thinking about oceans. We tend to think of the ocean as a barrier to movement, but this may be an anthropocentric (or, at least, land-based) fallacy. Gill et al suggest that it may instead provide avian species with an ecological corridor, offering benefits such as freedom from predators.
Below the bar-tailed godwit’s flight path, and below but close to the surface of the water, algal blooms have impacts that affect human interests through the food chain and attract increasing analytic interest. Bloom arising from eutrophic nutrient increase is at least partially related to human-centred activity, and has complex mixed effects on wider biosystems. Those species which feed on the algæ involved also increase, in the short term at any rate; so do those that in turn feed upon them; but oxygen is depleted and light is cut off from lower depths. Biodiversity is reduced, ecologies overturned as invasive species gain an advantage, toxins are released. The consequences range from ciguatoxin poisoning in human populations to ‘dead zones’ in the marine environment. With a considerable portion of the earth’s human population dependent on marine food sources, bloom is a vital research area. A University of Toronto doctoral thesis earlier this year [6] suggested methods for integrating high order mathematical models for biological and chemical water quality with Bayesian methods for a step change improvement in study of the factors lying behind blooms. While these methods were applied only to closed systems such as lakes, the approach could in principle be extended to localised oceanic systems at least. A marine biologist in Queensland, Australia, concerned with fish stocks, described huge GenStat databases on which continuously updated analyses are looping 24 hours a day, covering bloom in shallows from the Antarctic up the Asian Pacific coastlines as far as the Bering Strait.
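For readers who want a feel for what pairing a water quality model with Bayesian methods can mean in practice, here is a deliberately tiny sketch. It is not the method of the thesis cited above: the logistic growth curve, the parameter values, the flat prior and the ‘observations’ are all invented for illustration, and a real eutrophication model would have many more parameters and would need sampling methods rather than a simple grid.

```python
import math


def logistic_biomass(t, r, b0=1.0, k=50.0):
    """Hypothetical algal biomass at time t (days) for growth rate r (per day)."""
    return k / (1.0 + (k - b0) / b0 * math.exp(-r * t))


def grid_posterior(times, observed, r_grid, sigma=1.0):
    """Posterior probability of each r on the grid (flat prior, Gaussian errors)."""
    log_post = []
    for r in r_grid:
        sse = sum((obs - logistic_biomass(t, r)) ** 2
                  for t, obs in zip(times, observed))
        log_post.append(-0.5 * sse / sigma ** 2)
    peak = max(log_post)  # subtract the peak before exponentiating, for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Synthetic observations: an assumed 'true' rate plus small fixed offsets.
TRUE_R = 0.4
times = [0, 2, 4, 6, 8, 10, 12, 14]
offsets = [0.2, -0.3, 0.4, -0.5, 0.6, -0.4, 0.3, -0.2]
observed = [logistic_biomass(t, TRUE_R) + e for t, e in zip(times, offsets)]

r_grid = [0.002 * i for i in range(1, 501)]  # candidate rates, 0.002 to 1.0
posterior = grid_posterior(times, observed, r_grid)
post_mean = sum(r * p for r, p in zip(r_grid, posterior))
print("posterior mean growth rate:", round(post_mean, 3))
```

The point of the exercise is that the output is a probability distribution over the parameter rather than a single fitted value, so uncertainty in the calibration is carried forward into whatever the model is then used to predict.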
Fishery is, for the reason mentioned above, a prime mover in applied ocean research of all kinds: even when not obviously linked, there will often be a connection somewhere down the line. The scale and scope vary enormously, from individual researchers working in spare moments to whole government-funded networks of private, public and academic facilities pulling in a single contractual harness. The analytic software range is commensurately wide.
One of Thermo Scientific’s customers for its Nautilus LIMS is Canada’s federal Department of Fisheries and Oceans (DFO), which uses it to manage its marine research and development programme. Canada is surrounded on three sides by the world’s longest national coastline, one of those sides frozen over, and the DFO sees these waters as a single resource embracing everything from fundamental research through fisheries and Inuit rights to the coastguard – which covers a lot of data analysis. DFO mentions a long list of research interests that ranges alphabetically from aquaculture to zone monitoring, and in a phased switchover currently in its fifth year the resulting data, users and facilities involved are being brought under a single information management umbrella. The range also stretches from a study [7] of gene expression effects in salmon, analysed in SigmaPlot, to a geomorphic model hosted on a high performance computing platform.
Fisheries, for that matter, represent a big area in themselves. Just the analysis of collateral damage to marine species, habitats and environment from net fishing looks at a portfolio of concerns as detailed as injury to juvenile blue swimmer crabs [8] or as large as the global threat to coral reefs. Increasing industrialisation of fishery extraction in recent decades has made analytic approaches to management and conservation of stocks essential. Tracking and analysis of shoal behaviours is both a pragmatic short-term guide to productivity and essential underpinning for fundamental long-term conservation planning.
Software in use for this sort of work naturally spans the whole gamut of available products, but some are more dominant than others. There is an increasing trend away from acquisition of larger systems (though they are used too) towards smaller desktop solutions. GenStat, with its strong life sciences history, is popular; so is the spatially oriented Systat.
Schematic illustration of GENIE’s structure from Price and Yool at the Southampton e-Science and Oceanographic centres.
There is also a significant strand of work done using mathematical software; marine ecosystems are, as Akpalu [9] points out, in a discussion of Wolfram Mathematica-based work on biodiversity in fisheries management, ‘complex, and ... ecologically interdependent ... losing a species could produce a cascading effect on other species’. An example is offered by Clemente and others [10] in which reduction of top level predators leads to a rise in sea urchin populations and consequent decline in habitat quality. The economic analysis of such possible biological futures is often more appropriately done in symbolic terms.
Fishery is not the only oceanic harvesting process. There is also the pharmaceuticals industry, which does not neglect marine biota in its endless search for new medically useful molecules. Spanish biopharmaceuticals company PharmaMar goes so far as to specialise in this sort of marine sourced search for new cancer treatment leads. Data from more than 70,000 marine organisms are currently banked in the company’s LIMS (Nautilus, again), available for analysis by PharmaMar’s own research staff and linked academic facilities.
Then there is the prospect of mineral extraction from sea water, once regarded as an uneconomic pipe dream, but now attracting renewed interest as new technologies and new shortages change the rules. Tantalising synergetic opportunities arise, for instance, from viewing minerals as a side benefit of geothermal energy – though so do questions, and urgent analyses, regarding environmental destabilisation.
That risk, environmental destabilisation, is present in every human use of the sea. The oceans are large, and can absorb a great deal, but their balance is not uninterruptible. Even the most cursory look at computerised marine data analysis activity reveals two familiar strands, often forced into collaboration, but always fundamentally separate: how to exploit efficiently, and how to conserve effectively. Perhaps, to sidestep my structure for a moment, the most high profile example is the petroleum industry, which drills down (literally) through all the layers of shallower seas from surface through ecologies to the geology below. An increasing body of legislation, and of scientific analyses to support it, seeks to tie producers into environmental protection systems. Compliance evidencing is built into the planning of new facilities, with data collection, storage, management and analysis an important part of the process. The opening up of Sakhalin, a closed area in cold war days, to development of a vast oil and gas project includes an environmental monitoring laboratory complex. The operator, Sakhalin Energy, runs a central LIMS (not Nautilus, this time, but SampleManager – also from the Thermo Scientific stable) serving an extensive system of realtime and long term analytic and reporting programmes that cover both the island and the surrounding Sea of Okhotsk.
Further down, below the fished zones, life continues – largely ignored, but by no means unaffected by human activity. Since the layers are not systemically isolated from one another, damage below often rebounds as damage above, and a growing degree of analytic attention is turning to what is happening in deeper habitats. Cross boundary predation is one factor. Nor is the food link the only area of concern. Another is the impact on microbial populations whose survival seems to depend upon ‘a collection of very subtle adaptations ... and genomic plasticity to cope with the sparse and sporadic energy resources available ... and ... unique proteins not found in surface-dwelling species’ [11], all of which may be vulnerable to pollutants drifting down from human activity regions. Since the exact operation of the oceanic biological carbon pump is still only sketchily understood, developing analysis of linkages to deep sea microbial life may prove to be an essential requirement for future climatological discussions.
Surface alkalinity maps, generated by Ocean Data View, from the US Department of Energy’s Carbon Dioxide Information Analysis Centre. The oceans play a part in the carbon cycle whose importance is only now beginning to be properly understood.
Right at the bottom, of course, comes the physical ocean bed – or, perhaps more accurately since there is even greater variation here than across the planet’s land surface, beds. Two tools that pop up here with particular regularity are Ocean Data View and Golden Software’s Surfer, both of which construct sophisticated visualised analytical data worlds. There is life here, too, and we are only beginning the task of studying it, never mind analysing the results. Geology becomes prominent, from archaeosedimentary analyses to volcano counting in the deepest reaches.
Data analytic attention to deep marine ecosystems has often focused on archaea, and there is considerable interest in the study of those organisms that flourish in conditions inimical to most forms of life and offer clues to what might be possible in extraterrestrial environments. Other forms are there too, though. There have been interesting analyses of parasitism within these communities, for instance, and of surprisingly broad fungal diversity [12]. Metagenomic analysis is an ascendant analytic arena, where you can find almost every software tool in use, but heavy duty applied numerical packages are much in evidence. The research group who gave me most of my inside access in this area (to whom many thanks, and my regrets that their sponsor forbids publicity) were running MatLab intensively, under symbolic direction from Maple.
Deep seabeds are significantly closer to the earth’s volcanic processes than most land areas, and host a significantly higher percentage of resulting activity. They also bear the pressure of thousands of tonnes of water piled up on every square metre. Various estimates of the number of undersea volcanoes have been made, all of them based on extrapolation from quite small studies, but the number seems likely to be in the millions.
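The ‘thousands of tonnes on every square metre’ figure is easy to check on the back of an envelope. The sketch below does so under stated assumptions: a constant seawater density of 1,025kg per cubic metre and standard gravity, both round textbook values rather than figures from any study mentioned here, and no allowance for the slight compression of water at depth. The depths chosen are illustrative only.

```python
# Assumed round figures, not taken from the article.
SEAWATER_DENSITY = 1025.0  # kg per cubic metre, treated as constant with depth
GRAVITY = 9.81             # metres per second squared


def column_mass_tonnes(depth_m):
    """Mass of water standing on one square metre of sea bed, in tonnes."""
    return SEAWATER_DENSITY * depth_m / 1000.0


def hydrostatic_pressure_mpa(depth_m):
    """Pressure from the water column alone (atmosphere excluded), in MPa."""
    return SEAWATER_DENSITY * GRAVITY * depth_m / 1e6


for depth in (1000, 4000, 11000):  # illustrative depths in metres
    print(depth, "m:",
          round(column_mass_tonnes(depth)), "tonnes per square metre,",
          round(hydrostatic_pressure_mpa(depth), 1), "MPa")
```

On these assumptions a 4,000m water column puts a little over 4,000 tonnes on each square metre, which corresponds to a pressure of roughly 40 megapascals, or about 400 times atmospheric pressure at the surface.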
This is where continents, tsunamis and quite possibly the first building blocks of life are made. Here, more than in any other macro scale terrestrial context, computerised data analysis is not just a beneficial development but the only eyes and ears available to researchers.
Direct inspection (primarily by robot submersibles that also gather nonvisual data) has its place, but is inevitably limited to small localities; only remote sensing programmes of various kinds, and subsequent data analytic processing, can give either the overall big picture or widespread detailed and reliable subsets of it. Oceanographic surveying is a portmanteau activity, pooling information from fragmentary commercial, military and pure scientific activity and knitting it all into a growing and improving, but still very sketchy, analytic whole.
The sea bed is not limited to the deeps, of course. It slopes up to become the continental shelves, where it eventually becomes part and parcel of the fishery layers and, closer still to land, becomes a primary component of surface zone systems. In those places where the sea bed only just remains below the surface, as tides ebb and flow, can be found the most rapidly changing and complex biological and physical analytic régimes. My Australian algal bloom researcher mentions that colleagues studying the interaction of current, nutrition and erosion in Great Barrier Reef ecosystems sample numerous selected sites at temporal and spatial resolutions of 10Hz and 10mm respectively, and still suffer ‘analytical blurring’.
Here, too, is the most immediate interaction between sea and land. The erosive effect of water movement is most intense at the land-sea interface, producing radically different solute profiles. Rivers have a similar effect and simultaneously reduce salinity in complexly various ways while subjecting the sea bed to both erosion and deposition processes. Run-off has a less acute, but equally chronic influence. Longterm analytic studies show an extensive and highly involved complex of two way linkage between land and ocean effects as coastal bioscape features such as mangroves and seagrass meadows recede or modify.
Wide is the ocean ... certainly wider than an article of this length, and too wide for human comprehension in any traditional sense. A progressively improved composite jigsaw of data analytic pictures is the nearest thing to an all encompassing view that we are ever likely to get.
1. Matthews, C., ‘Ocean’, on Cockahoop. 2003, London: Warner Music. 2564-60306-2
2. The GENIE project. Available from: www.genie.ac.uk.
3. Edwards, N.R. and J.G. Shepherd, ‘Bifurcations of the thermohaline circulation in a simplified three-dimensional model of the world ocean and the effects of inter-basin connectivity’ in Climate Dynamics, 2002. 19(1): p. 31.
4. Lenton, T.M., et al., ‘Using GENIE to study a tipping point in the climate system’ in Phil Trans R Soc A, 2009. 367(1890): p. 871.
5. Gill, R.E., Jr, et al., ‘Extreme endurance flights by landbirds crossing the Pacific Ocean: ecological corridor rather than barrier?’ in Proceedings of the Royal Society B: Biological Sciences, 2009. 276(1656): p. 447.
6. Zhang, W., Application of Bayesian Inference Techniques for Calibrating Eutrophication Models. Doctoral thesis, University of Toronto, 2009.
7. Devlin, R.H., et al., ‘Domestication and growth hormone transgenesis cause similar changes in gene expression in coho salmon (Oncorhynchus kisutch)’ in PNAS, 2009. 106(9): p. 3047.
8. Uhlmann, S.S., et al., ‘Mortality and blood loss by blue swimmer crabs (Portunus pelagicus) after simulated capture and discarding from gillnets’ in ICES J. Mar. Sci., 2009. 66(3): p. 455.
9. Akpalu, W., ‘Economics of biodiversity and sustainable fisheries management’ in Ecological economics, 2009. 68(10): p. 2729.
10. Clemente, S., J.C. Hernandez, and A. Brito, ‘Evidence of the top-down role of predators in structuring sublittoral rocky-reef communities in a Marine Protected Area and nearby areas of the Canary Islands’ in ICES J. Mar. Sci., 2009. 66(1): p. 64.
11. Konstantinidis, K.T., et al., ‘Comparative Metagenomic Analysis of a Microbial Community Residing at a Depth of 4,000 Meters at Station ALOHA in the North Pacific Subtropical Gyre’ in Appl. Envir. Microbiol., 2009. 75(16): p. 5345.
12. Le Calvez, T., et al., ‘Fungal Diversity in Deep Sea Hydrothermal Ecosystems’ in Appl. Envir. Microbiol., 2009: p. AEM.00653.