DATA ANALYSIS: ECOLOGY

All the world's a stage...

A crow, crossing a cleared ground corridor across its territory on a military training site, acts as a transmission vector maintaining partial continuity between subecologies on either side.

Felix Grant explores how data analysis is applied to the field of ecology

Scientific Computing World: October/November 2008

Ecology has come a long way from its first emergence as a distinct field of study in the 1970s. Like me, over the same period, it has moved from cuddly hippiedom to a strongly data-centred, systems view of the world.

OK, so that’s an oversimplification. From another view, it was a datacentric approach, together with increasing access to computerised methods, that enabled and drove the newly separated discipline in the first place. Nevertheless, the engine that drove undergraduates of my generation in ecological directions was the emotional impact of Carson’s Silent Spring[1] or its successors, with the likes of Kershaw’s Quantitative and dynamic plant ecology[2] simply providing the means. Only later did the study itself become the point of the game.

I am finalising this article during a break between sessions of a planning meeting in which Russian hosts discuss details of a major migration study with opposite numbers from every continent except Antarctica. Thirty years ago, even if such a meeting had been imaginable, the attitudes of those participants would have been different. Even now, those participants’ approaches here are subtly different from those which they display on home turf. As with many areas of scientific computing, this is down in large measure to the internet and other distributed information, analysis and communication developments: access to global data encourages not only objective analysis, but methodological comparison.

That globalisation of data also underpins the nature of the field itself. Ecology is, by its nature, interconnective; arguably, it really exists only in those interconnections and not in the nodes that belong in other disciplines. While manageability requires that ecotyping of bacterial systematics[3] and global migration of viruses[4], crop pathogens[5], birds[6] or humans[7] are distinct data collection projects, all are subsets of a larger and ever growing data pool on the biosphere as a total system. Human minds are reasonably good at comprehending the two limits of that range, but the infinitely complex network of levels between is more difficult and without scientific computing would likely never be addressed at all. Since localised human impacts now constitute a major input to global habitat output, filling in those black box blanks is a priority for any ideas of a managed future.

It would be ridiculously over the top to suggest that any real approximation to a detailed whole planet ecology model (or ‘gaia model’ as some describe it) yet exists, but work towards joining up some of the dots is being done throughout the extent of that connective tissue. Particular foci of attention include critical examination of local interventions, construction of supersets from local data, and interlocal vectors.

A Miner3D visualisation of multivariate data from a military site.

Local interventions, such as the establishment of reserves or corridors, encouragement of dual use, and so on, are nothing new. Serious attempts to analytically tie them in to a larger picture, however, rather than working on qualitative working assumptions, are less common. Statistical examination of both internal and external site connections is getting increasing attention, though. The most rigorous examples, ironically, are often in corporate and governmental developments where environmental awareness might be least obvious. In particular, militaries around the world are discovering an ecological zeal, which a more cynical writer than I might ascribe to public relations support for retention of valuable land, and are applying sophisticated analyses to evidence benefits.

Militaries also have particular views of public relations information exchange: scientists, soldier bureaucrats and software suppliers alike were everywhere keen to discuss data analytic ecology projects in considerable detail, but less willing to be quoted. There is a lot of information available in the public domain (see, for example, the EU’s LIFE, Natura 2000 and the Military[8]), but detailed data analytic specifics tend to come hedged about with limitations and nondisclosure agreements. An interesting picture still emerged, even if it has to be pieced together and drawn in general terms. The analytic infrastructure varies widely, from one person in a broom cupboard with a PC to large and well-funded departments with high performance computing access, but the value of the work done does not seem strongly correlated with the extent of the facilities.

Two very different Commonwealth sites (an urban warfare training ground and an artillery range) are each modelled by a single committed individual. In both cases, dedicated equipment is limited to a ruggedised laptop computer carrying low powered software. In both cases, a deal has been done with a university or another government department under which partially anonymised and coded data are supplied in exchange for GenStat analyses. Both models show (albeit on radically different scales) patterns of regeneration around centres of destruction (whether by CS smoke grenade or large calibre shell) producing webs of cause and effect through their surroundings. Since the statistical likelihood of such destruction at any given point is small and roughly Poisson distributed, the effect is analogous to natural hazards such as fire or lightning strike.
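To make the Poisson analogy concrete, here is a minimal sketch in Python. The rate of 0.05 impacts per grid cell per year is a purely hypothetical figure chosen for illustration, as are the function names; nothing here is taken from either site's model.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def prob_at_least_one(lam):
    """Probability that a cell sees one or more events."""
    return 1.0 - math.exp(-lam)


# Hypothetical rate: 0.05 destructive impacts per grid cell per year.
# This value is an assumption for illustration only.
RATE = 0.05

if __name__ == "__main__":
    for k in range(3):
        print(f"P({k} impacts in a cell) = {poisson_pmf(k, RATE):.5f}")
    print(f"P(at least one impact)  = {prob_at_least_one(RATE):.5f}")
```

With a rate this small, almost all of the probability sits on zero impacts, which is why any one patch of ground experiences destruction as a rare, fire-like event rather than as constant pressure.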

Other internal patterns of destruction, lower in intensity but more extensive, include routes frequently and/or heavily used by armour. Not only crushing everything along their path but laying down pollution layers, these drastically alter conditions in long corridors which are irrelevant to birds, flying insects, airborne seeds and spores, and so on, but completely disrupt many ground based biophenomena. They form differentially effective ‘fire breaks’, which in some cases interdict predation and lead to recognisably different subecology variants on either side. Analyses of these effects show both benefit and damage, not to mention a range of broadly neutral effects, with no natural analogues.

Overall, the limited case evidence I saw seems to suggest that diversity is increased and fostered by both point and line effects. As one site commander cheerfully commented: ‘From bacteria up to carnivores and trees, the message is that if we don’t actually blow you up or squash you, we generally give you a good life.’

That GenStat is used in both cases is partly down to historical links, but the software itself is particularly well suited to work in ecology. Originating in agriculture, its development fed by a large life sciences user base (though GenStat is generically applicable, VSNi often uses the subhead ‘software for bioscientists’), it is currently handled by a company (VSN International, or VSNi) with close links to Rothamsted Research. Rothamsted not only gave birth to the software, but does serious ecological work of its own. GenStat has inbuilt blocking and randomisation underpinning its models and methods. Subject-specific additions in the three most recent releases (9th edition onwards) include the Diversity Indices and Species Abundance menus (under Summary Statistics and Distributions respectively). Implicit indicators can be found throughout the package in, for example, CRBIPLOT, which has ‘sitescores’ and ‘speciesscores’ as plot options, with species as the default, or the inclusion of ecology as one of two usage examples for the LORENZ procedure. Then, there is ethos: in any company you will find individuals with idealistic commitment to goals beyond the bottom line, but VSNi is unusual in putting them up front. It’s the only company where my question ‘why ecology?’ has been met with ‘because it’s important, because it’s neglected by others, and because it’s fun’.
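For readers who have not met diversity indices, the sketch below computes two textbook measures, the Shannon index and the Gini-Simpson index, in plain Python. It is not GenStat's implementation, and I make no claim about which indices the Diversity Indices menu offers; the function names and the species counts are invented for illustration.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def gini_simpson_index(counts):
    """Gini-Simpson index 1 - sum(p_i^2): the chance that two individuals
    drawn at random (with replacement) belong to different species."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Invented counts of individuals per species on two sides of a corridor.
side_a = [30, 25, 25, 20]   # fairly even community
side_b = [85, 5, 5, 5]      # dominated by one species

if __name__ == "__main__":
    for name, counts in (("side A", side_a), ("side B", side_b)):
        print(name,
              f"Shannon = {shannon_index(counts):.3f}",
              f"Gini-Simpson = {gini_simpson_index(counts):.3f}")
```

Both measures rise as individuals are spread more evenly across more species, so the evenly mixed community scores higher than the dominated one even though each holds four species.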

Not, of course, that GenStat is the only option. Across military sites surveyed I found just about every software package known to statistics, from SigmaPlot and Unistat to SAS – and some generic mathematical ones too, with both Maple and Mathematica cropping up in similar university links.

Miner3D offers a military site case study of physical processes monitored over time through control well sensors. Since its publishers are subject to the same restrictions as I, and cannot reveal site or client details, there’s no saying whether this coincides with sites where I also found their software in use. Elsewhere, the highly visual approach is being used to explore the spatiotemporal relations between impact centres or interdiction zones and change in organism community balances elsewhere across the site as a whole.

Looking at the data analysis results, regardless of the tools used, brings home the fact that my separation of locality and interconnection is a convenient fiction. ‘Local’ has a very different meaning for a bacterium or earthworm and a fox or swallow, and what constitutes a significant interlocal vector changes scale accordingly. A local intervention site, like any subset chosen for study at any scale, is only a subset of the whole and replicates within it the same mycelium-like structure of cause and effect that connects it to the larger outside.

Conversely, supersets are also largely in the eye of the beholder – an aggregation of observationally convenient subsets to extend understanding. Size and complexity increase, but principles remain much the same. And any subset or superset can be defined either by membership or by a distinction between internal and external vectors.

A coastal site maintained in undisturbed condition by restricted access.

The Australian National Variety Trials (NVT) database[9], although it aims to increasingly homogenise data, is a good example of a superset aggregated from existing heterogeneous subsets. A huge amount of data drawn from earlier state level programmes underwrites the new national database, which is accessed by web browser and analysed through a linear mixed model in two stages, by statistical and biological reliability weighting, powered by ASReml (again, from VSNi). Users of the database can run reports by crop (10 are offered), state, region and season, to find data applicable to their local conditions, but taken together it represents a continent-wide resource in one place. A few quick exploratory runs confirmed that even a nonsubscriber can draw useful comparisons by overlaying its output on other data.

Very often, aggregation is more ad hoc and opportunistic. Across the developing world, in particular, small groups or individuals are assembling supersets from wildly disparate existing research output in order to then resection them in different ways and derive new outcomes. African elephant movements, a particular plant seed, a rare snail and a benign virus feature in one particularly intriguing study which is, frustratingly, again under military wraps.

Another study[10], also involving African elephants, but free of knee-jerk confidentiality, relates them to human population expansion, arable farming, bee hives, environment descriptors and trees. Analysis (GLM) is once again courtesy of GenStat. The hypothesis being investigated here is that bee populations (or aural simulations of them) may form interdiction barriers that can be utilised to separate grazing elephants from crops, to the benefit of both.
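As a rough illustration of what a GLM does with data of this kind, the following sketch fits a logistic regression (a binomial GLM with a logit link) by Newton-Raphson in plain Python. The counts are invented toy numbers, not taken from the cited study, and the model is far simpler than anything a GenStat analysis would involve; the function and variable names are hypothetical.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:          # information matrix is near singular; stop
            break
        # Newton step: solve the 2x2 system and update both coefficients.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Invented toy data: x = 1 for a bee-sound playback, 0 for a control sound;
# y = 1 if the elephant group moved away, 0 if it stayed.
x = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 8 + [0] * 2

b0, b1 = fit_logistic(x, y)

if __name__ == "__main__":
    print(f"intercept = {b0:.3f}, playback effect = {b1:.3f}")
    print(f"odds ratio for bee sound vs control = {math.exp(b1):.2f}")
```

The fitted slope is the log odds ratio between the two playback types; with a single yes/no predictor it simply reproduces what could be read off a two-by-two table, but the same machinery extends to several predictors at once, which is where a GLM earns its keep.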

As I noted above, ecology in its purest form arguably consists of the arcs in the network rather than the nodes that they connect. They are incalculably many, and their total interaction impossibly complex, but that doesn’t stop attempts to tease out, unravel and map some of them as informational entities. Among the most studied arcs are interlocal vectors tying one conceptual entity to another by input/output transactions. The assembly of arcs into a growing armature gives form to ecology, rescuing it from fuzzy narrative subjectivism and giving it concrete objective existence, and is entirely dependent on computerised data modelling as its growth medium.

A vector can be literally anything that moves, or even just an aspect or characteristic of something that moves. Where it is a biological entity its effect may be only on its own population, on that entity’s environment, or (more commonly) both. Migrations are, on most scales of measurement, interlocal – connecting two or more locations, carrying some effect between them. Birds are the best known and, in some cases, farthest reaching examples, but the recent flurry of popular press and internet interest in stingray migrations reminds us that they are not alone. Birds and insects are also interlocal vectors on smaller scales. A frequent phenomenon in the military site studies was the presence of distinctly different microenvironments on either side of an interdiction corridor (such as a tank route), but with certain homogeneities replicated and maintained by passage of birds across them.

Examples of the organism as vector within its own population are commonplace, but a recent example of particular interest is the Tasmanian devil[11], which is suffering from a transmissible cancer. Its own biting behaviour transmits a rogue cell line producing facial tumours. Not only are the tumours fatal in months, but they inhibit sexual contact, and devils are starting to breed in adolescence with only marginal viability.

Human migration is, of course, not to be forgotten and dramatically increases with time; a recent Proceedings of the Royal Society B paper on comigration of humans and mice has also seen wide press coverage, as has the role of American grey squirrels as carriers of a virus fatal to indigenous European reds.

ASReml-powered output from Australian National Variety Trials and, lower foreground, GenStat menu with Diversity measures item.

Nonbiological vectors associated with human movement, particularly land transport corridors such as rail and motorway networks, are well known and much studied, but military movements are a component that has recently come under the same scrutiny as military sites. A particular feature of these is that they move repeatedly between a limited number of locations, with much less control than civilian traffic, and (especially but not only in operational conditions) visit highly disrupted environments. One of the sites visited has vigorous anomalous populations of organisms otherwise found only on another continent where there are operational or formerly operational theatres.

The natural final home for testable vector hypotheses is simulation software, where they can not only be examined, but induced to interact as well, though a lot of upfront analysis goes into identifying them in the first place. I have a courtesy copy of SIMILE, from Simulistics, and it does a beautiful job of bringing to life the exploration of GLM collections extracted from raw data by the likes of GenStat. Roughly half of the military sites examined had some form of GUI-driven and UML-compliant software in use for the examination of internal vectors, in some cases external vectors as well.

And what about the hippies? I opened by welcoming their transformation to technocrats, but technocracy without ideals would be a cold and unproductive thing. I’m glad to record that the earlier spirit still lives on alongside the new. People like those at VSNi and elsewhere, with value-driven qualitative frameworks for their quantitative work and business models, are an essential compass. And at least one well-respected quantitative ecologist, who contributed significantly to this article and was not even born when ecology made its separate appearance, has a private email address with the word ‘hippie’ incorporated within it.

Sources

Miner3D, Miner3D, info@miner3D.com
SAS, SAS analytic software, http://www.sas.com/apps/forms/index.jsp?id=genuk
Simulistics, Simile, info@simulistics.com
Systat Software, SigmaPlot, info@systat.co.uk
Unistat, Unistat, unistat@unistat.com
VSN International, GenStat, enquiry@vsni.co.uk

References

1. Carson, R., Silent spring. 1962. Reprinted with a new afterword, London: Penguin Books in association with Hamish Hamilton, 1999. ISBN 0140273719.

2. Kershaw, K.A., Quantitative and dynamic ecology. 1964, London: Arnold.

3. Koeppel, A., et al., Identifying the fundamental units of bacterial diversity: A paradigm shift to incorporate ecology into bacterial systematics. PNAS, 2008. 105(7): p. 2504-2509.

4. Nelson, M.I., et al., Phylogenetic Analysis Reveals the Global Migration of Seasonal Influenza A Viruses. PLoS Pathogens, 2007. 3(9).

5. Munkacsi, A.B., S. Stoxen, and G. May, Ustilago maydis populations tracked maize through domestication and cultivation in the Americas. Proceedings of the Royal Society B: Biological Sciences, 2008. 275(1638): p. 1037-1046.

6. PRBO: Pacific Shorebird Migration Project. 2007.

7. Hugo, G., Population geography. Progress in Human Geography, 2007. 31(1): p. 77-88.

8. Gazenbeek, A., LIFE, Natura 2000 and the military. Life Focus, 2005.

9. NVT Online. Available from: http://www.nvtonline.com.au/.

10. King, L.E., I. Douglas-Hamilton, and F. Vollrath, African elephants run from the sound of disturbed bees. Current Biology, 2007. 17(19).

11. Cancer forces Tasmanian devil onto endangered list. Nature, 2008.