DATA ANALYSIS

Sploshing about in ponds, and other stories

Felix Grant reports on how electronic equipment and scientific processes have modernised fieldwork

Scientific Computing World: August/September 2007

There was a time when fieldwork was the heart and soul of science practice. No reputation was complete unless it was based on a healthy chunk of time spent in the field. Lab time was important, but as a phase in a process that centred on discovery in the field. Those days are gone; the focus of glory has shifted to the laboratory, or even gone entirely in silico. That doesn’t mean that the importance of the field worker has declined; science is still a whole process, not just a few high-visibility star turns, and the field worker is still an essential part of the fabric.

But, of course, to misquote Mandy Rice-Davies only slightly: I would say that, wouldn’t I? My love of science started with digging up rocks, gazing at real stars on freezing nights, and (best of all) sploshing about in murky ponds. Furthermore, my current livelihood is built upon the willingness of science to indulge my continued delight in grown-up versions of such pursuits. But it’s true nonetheless: new science can continue for a long time on the basis of existing data and increasingly sophisticated methods, but ultimately requires the input of new data which, in many cases, only fieldwork can supply.

Fieldwork has changed with the times, though. Today, as I splosh into my murky pond, an array of electronic equipment is at least as important as the jamjar, the net on its bamboo stick, or Clegg’s Freshwater Life of the British Isles[1]. Nor are those electronics any longer limited to data gathering or reference: qualitative, practical and budgetary considerations all require increasing levels of concurrent analysis as well. This has led to a radical revision of traditional practice, with a repeated cycle of rigorously separated phases now collapsed into the field itself. The software to do this will, in most cases, be whatever subset of that used at base can be carried into the field within hardware constraints, although there may be control and communication considerations to which I’ll return.

Much depends on exactly what ‘fieldwork’ really means, and that in turn depends on who answers the question. Observing archaea around a deep ocean vent, or counting small mammal droppings on a mountainside (adult equivalents of sploshing about in ponds) are clear-cut and obvious cases. But what about civil engineering tests on motorway bridges, or studying the operation of an industrial facility? Or, in the other direction, the Spirit and Opportunity rovers on Mars, or the deep space Voyagers? One study of declining emphasis within environmental science education[2] defines field work as ‘any study ... that takes place outside the classroom’; it seems reasonable to modify that to ‘any study that takes place away from the base work place’.

Having defined the field, how much analysis is done there depends very much on the circumstances. In essence it comes down to a single question: why analyse data in the field, rather than do it later or submit for remote handling? For the time being, at least, ‘in field analysis’ means less power and fewer resources than either of the other alternatives, not to mention weight, cost, maintenance, and other penalties, so there has to be a clear payoff to make it worthwhile. The main constraints on that payoff are usually defined on one hand by how far the cost effectiveness of the fieldwork can be enhanced by using analytic results to inform conduct of the study in progress, and on the other by availability of fast and reliable telecommunications. There may be other constraints in particular circumstances, commercial data security being a common example.



Setting up a laptop equipped with DASYLab and FlexPro to record subsonic vibration patterns in a slate foreshore. Detailed sampling strategies will be decided on the spot, on the basis of analysis from initial skeleton records.

Cost effectiveness is almost always enhanced by some level of immediate analysis, and the margin of advantage increases with distance or difficulties of access. Often a very superficial treatment of data from an initial preplanned data collection will throw up apparent distribution patterns that justify further investigation with shaped sampling methods, and it’s very rare for later return to the field to be cheaper or more convenient than immediate reexamination at the time. If, for instance, sampled levels of a pollutant are within a close band of levels across the whole of a study area, but dramatically higher or lower at two locations within it, then more intensive sampling around those two points will be invaluable in establishing the shape, extent and gradient of the anomalies. The study area may be in a remote region of a continent halfway around the world from my base, with a climate that has involved assembly of equipment able to operate in extreme conditions. In that case, an extra couple of days or a diversion of effort from something else may well be a minor cost while returning could be impossible. Even if I have only travelled a few miles up the motorway, it may be some time before another day can be taken out of a busy diary for a return trip. There is also the question of transient phenomena, which may be spotted through sampling at one time but not be present on a later return; striking while the iron is hot can greatly improve understanding. Another aspect, important in gaining acceptance and cooperation in many areas of fieldwork, is the importance of communicating to local interested individuals or groups what is being done; raw data cannot do this, but analysed results can. As Pomeroy and Rivera-Guieb[3] summarise it, in relation to Canadian fisheries research: ‘The focus of the assessment can be adjusted in response to learning acquired in the field, making it an adaptive process ... understanding of local conditions can be better used ... Stakeholders can participate in analysis, increasing their sense of ownership ... Mistaken assumptions that may have influenced the design of the assessment can be corrected...’
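The pollutant example above, in which most sites sit within a close band and two stand out, is the kind of superficial first-pass treatment that can be scripted in a few lines on any field machine. What follows is only a minimal sketch under stated assumptions: the site names and readings are invented for illustration, and the modified z-score (median and median absolute deviation, with the conventional 3.5 cut-off) is one reasonable screening choice among many, not a method taken from any study mentioned here.

```python
from statistics import median


def flag_anomalous_sites(readings, threshold=3.5):
    """Return the sites whose reading lies far outside the bulk of the survey.

    readings: dict mapping a site label to a measured level (hypothetical data).
    Uses the modified z-score, 0.6745 * (x - median) / MAD, which is robust
    to the very outliers it is trying to find.
    """
    values = list(readings.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        # More than half the sites are identical: flag anything that differs.
        return [site for site, v in readings.items() if v != centre]
    return [
        site
        for site, v in readings.items()
        if abs(0.6745 * (v - centre) / mad) > threshold
    ]


# Invented first-pass survey: eight sites, two of them well out of band.
survey = {"A": 4.1, "B": 3.9, "C": 4.3, "D": 4.0,
          "E": 9.8, "F": 4.2, "G": 0.6, "H": 4.1}

# Sites worth more intensive sampling before leaving the field.
print(flag_anomalous_sites(survey))
```

The flagged sites would then become the centres for denser, shaped sampling while the team is still on location.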

The other constraint, availability of data telecommunications, is less clear-cut. In principle, there are very few places on earth now beyond signal reach – but the speed, quality and reliability are more problematic in practice. Industrial sites will almost certainly have high grade ADSL landline links, but commercial paranoia may prohibit external handling. Cellular telephony saturates large proportions of most developed countries, but the transmission speed (even on GPRS, the so-called ‘mobile broadband’) is low – and few activities are more likely to take place in areas without coverage than rural fieldwork. Global coverage is, in principle, available from satellite systems, but in practice is vulnerable to interruption either by blockage or bureaucratic complexity. Satellite transmission rates are again low, while the necessary portable equipment also adds an extra overhead to weight, bulk and power consumption.

This current situation may not persist. There was a time in the 1980s and 1990s when development of usefully portable equipment followed a more rapidly rising curve than useful coverage by data communications, but that has now reversed and the break-even point seems certain to shift increasingly in favour of remote processing in a greater number of circumstances. For now, however, local processing of data is often the most attractive option, especially where large data volumes are involved.

Moving away from earthbound concerns for a moment, the future of space exploration may also depend on increased autonomy in both analysis and the decisions flowing from it. A colleague in the aerospace industry is working on theoretical bases for a self-educating class of deep space probe artificial intelligence systems, which will have discretion to alter sampling plans, and to a limited extent adjust flight plans too, in response to results of onboard analysis. Raw data, analysis results, and decision processes, will all be signalled back home, and instructions received in return, but in the growing gaps caused by light speed delay the probe will make its own interim best choices based on a balance between strategic aims and short term maximisation of data quality.

Meanwhile, back on terra firma, selection of equipment for fieldwork splits according to the three-way tension between portability, durability, and analytic capacity. The division is a fairly simple one, in essence. Where a vehicle is available, weight and bulk drop to the bottom of the priority scale with modern equipment. Motorised vehicles can easily supply the necessary power; even a sailboat (I know of at least two current long-term field studies based on engineless nine-metre craft) can carry a petrol- or wind-powered generator, solar panels, and so on. For an expedition forced to operate on foot, however, each gramme has to be justified and battery life is crucial. Durability, dependent on operating conditions, ranges from normal commuter knockabout standard to full hardened military ruggedisation (see box); here, of course, the backpacking researcher with most need of protection can least afford to carry it.

Portable power supplies for a handheld running on disposable cells are a trivial problem these days (lithium cells can keep a researcher in the field for a year if need be, without significant weight penalty), but rechargeable models and laptops require outside supply. Solar panels allow intermittent use, but not usually sustained operation unless they are large and the location is favourable. Some small fuel cell designs have proven workable although these are still early days.

Hand-wound clockwork is an inviting idea although not generally available. Professor Greg Parker, at Southampton University, did some work on this a few years back, but found the equipment of the time not amenable. The One Laptop Per Child project has made good progress, however, using low power consumption technologies, and perhaps this is an approach which will resurface in the future. Hand, heel and water-driven battery rechargers are other currently viable options that point in the same direction.

Software for the field tends, as anywhere else, to follow needs and be limited by hardware. On a handheld, despite the surge of development in recent years, analysis is limited to manual methods – probably as a check and guide for sampling, or one component in a larger model development and refinement loop. Software will be generic rather than specifically data analytic, and computational rather than algebraic – probably one of the Matlab simulacra such as MtrxCal, PDAcalc Matrix, or Lyme. On a laptop or similarly specified PC-equivalent machine, obviously, software can be exactly the same as on the desktop: if GenStat, S-Plus or Statistica is in use at base, then there is no reason why it shouldn’t go into the field as well, although power constraints may limit its use in some circumstances – Unistat is particularly good at providing more analysis for less power consumption. Where computers are to operate unattended, even for short periods, automation is important and this will be a consideration; in a few cases, this leads to whole bespoke software systems being written from scratch, but an appropriate combination of off-the-shelf products will almost always suffice for terrestrial purposes.

A particular field convenience in many cases, and increasingly a necessity in some, is direct connection of data analysis to data collection (including, in most cases, positional systems). My own standard statistical software workhorse provides several methods, with an inboard command language underpinning most of them, which I often put to work in industrial contexts, linking to in-house QC systems. Many products provide Visual Basic, DLL or ActiveX bridging methods (OriginPro, for example, handles all three and is particularly intimate with LabVIEW). Whatever the mechanism, the operation falls into one of three types: analysis interrogates acquisition records, acquisition inserts into analysis worksheet files, or an intermediate database store serves both.
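The third of those arrangements, an intermediate store serving both sides, can be illustrated without reference to any particular commercial product. The sketch below is an assumption-laden toy: it uses Python’s standard sqlite3 module as the store, and the table layout, function names and sample values are all hypothetical rather than drawn from any of the packages named above.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the shared store; ':memory:' keeps the demo self-contained."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        " taken_at TEXT NOT NULL,"
        " site TEXT NOT NULL,"
        " value REAL NOT NULL)"
    )
    return conn


def record_sample(conn, taken_at, site, value):
    """Acquisition side: append one reading and commit it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO samples (taken_at, site, value) VALUES (?, ?, ?)",
            (taken_at, site, value),
        )


def site_means(conn):
    """Analysis side: summarise whatever has been collected so far."""
    rows = conn.execute(
        "SELECT site, AVG(value) FROM samples GROUP BY site ORDER BY site"
    )
    return dict(rows.fetchall())


store = open_store()
record_sample(store, "2007-08-01T09:00", "north", 1.0)
record_sample(store, "2007-08-01T09:05", "north", 3.0)
record_sample(store, "2007-08-01T09:10", "south", 5.0)
print(site_means(store))
```

Because neither side calls the other, acquisition and analysis can be swapped or upgraded independently, at the price of the analysis only ever seeing what has already been committed.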

FlexPro, which I’ve recently been putting through its paces on a multiple-site urban pollution study, has a couple of routes to such integration. There is an Automation Object Model, but also the FPAccess interface, which allows opening and closing of FlexPro databases, direct high-volume data transfer, and external control of some FlexPro functions, even if FlexPro itself is not installed. The FlexPro data-saving module for DASYLab streams, which has been controlling the conduct of volatile pollutant source sampling, I shall be reviewing separately. Briefly, it offers an easy-to-use choice between direct (but slower) data flow from catchment into analysis and buffered (but faster) dumping into a database, which is opened for analysis either subsequently or in gaps in the stream. The choice of linkage depends partly on the rapidity, continuity and density of data capture, and partly on the criticality of immediate response. The latter, in turn, varies with the context. For most industrial cases, this is a decision that can easily be made in advance; pond-sploshing scenarios such as my pollution study also tend to fall into one camp or the other, and decision-making is in any case usually a human-speed business. Remotely controlled fieldwork in inaccessible locations may be a different matter, in which case a layer of automated decision-making can be added through programming file operations against (for example) data or descriptor thresholds.
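A threshold-driven decision layer of the kind just described need not be elaborate. The following is a generic sketch only: it does not use the FlexPro or DASYLab interfaces, and the function name, the intervals and the alert level are all hypothetical values chosen for illustration. It tightens the sampling interval when any reading in the latest batch reaches an alert level, and relaxes it again only once the batch mean has fallen well below that level.

```python
from statistics import mean


def next_sampling_interval(batch, current_interval_s, alert_level,
                           fast_interval_s=1.0, slow_interval_s=60.0):
    """Choose the next sampling interval from the latest batch of readings.

    All parameters are illustrative assumptions:
      batch              readings captured since the last decision
      current_interval_s interval currently in force, in seconds
      alert_level        reading at or above which dense sampling starts
    """
    if not batch:
        return current_interval_s          # nothing new: change nothing
    if max(batch) >= alert_level:
        return fast_interval_s             # something is happening: sample densely
    if mean(batch) < 0.5 * alert_level:
        return slow_interval_s             # clearly quiet: conserve power and storage
    return current_interval_s              # in between: hold the present setting


# A quiet batch, then a spike, then a lingering tail, then quiet again.
interval = 60.0
for batch in ([1.0, 2.0, 1.5], [2.0, 12.0, 9.0], [8.0, 7.0, 6.0], [2.0, 1.0, 1.0]):
    interval = next_sampling_interval(batch, interval, alert_level=10.0)
    print(batch, "->", interval)
```

The gap between the two thresholds is deliberate: it stops the unattended system flipping back and forth between modes when readings hover near the alert level.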

The future of field science seems to be a mixed and evolving one, but not in doubt. For a long time (certainly, I hope, long enough to see me out!) there will be a need for human field workers – but they will be accompanied increasingly by field analysis tools of greater sophistication, which will be accessed remotely more and more through global communication networks of increasing coverage. At the same time, as our sphere of curiosity extends further from our comfort zone into the seas, the earth, the rest of the universe, less and less of it seems likely to be conducted in person by human beings – or even, in the latter case, by remote human direction. Scientific computing will increasingly take over the role of scientific human beings; but field data will always be needed.

References

1. Clegg, J., The freshwater life of the British Isles. 1965, London, Frederick Warne.

2. Scott, I., I. Fuller, and S. Gaskin, Life without Fieldwork: Some Lecturers’ Perceptions of Geography and Environmental Science Fieldwork. Journal of Geography in Higher Education, 2006. 30(1): pp.161-171.

3. Pomeroy, R.S. and R. Rivera-Guieb, Fishery co-management: a practical handbook. 2006, Wallingford, CABI. ISBN 9780851990880.

4. US Department of Defense, Department of Defense test method standard for environmental engineering considerations and laboratory tests (MIL-STD-810F). 2000, US Department of Defense. p.539.

Ruggedisation

Ruggedisation (or, commonly, given the US provenance of the word, ‘ruggedization’) is a broad term, covering a number of different approaches to making equipment that survives heavy field usage. It may be applied in a general way, or in reference to particular (usually US military) specifications. At its simplest level, it means an outer casing designed to take more than the usual level of knocks and resist light liquid splashing. One up from that is a similar level of protection from vibration and shock for internal components. A friend of mine designed a PC of this type inside a standard NATO military surplus ammunition box, with rubber mountings inside and a roll-up keyboard: ideal for travel in the back of a field study Land Rover, or rattling around a laboratory wet bench circuit. At the exotic (literally) end of the scale, specialist circuit boards may be built to cope with the radiation levels, in vacuo cooling problems and long-term component failure backup redundancy needs associated with space research applications.

The relevant US military standard[4], periodically revised and currently at version ‘F’ (MIL-STD-810F), specifies tests, separately and in combination, for pressure and temperature extremes, fluid contamination, solar radiation, varieties of rain including freezing and icing, humidity, fungal growth, salt fog, dust and sand, operation in explosive and acidic atmospheres, immersion, acceleration, vibration (including gunfire), acoustic noise, ballistic and other physical impact shock, and fire. There are commercial suppliers offering laptop equipment at this level of protection to scientists who need it – notably Panasonic, in some of its Toughbook models, and the Durabook range.

Science, even in the field, is not the same thing as combat, and it’s best to start from scratch in deciding what you need rather than just buying automatically into somebody else’s standard. I’ve found over-the-counter commodity laptops, subnotebooks and handhelds amazingly resilient in conditions for which they were never designed. An ancient A5 Toshiba Libretto, bought a decade ago and long discontinued, now on its fourth hard drive upgrade and running from a portable solar array, is still giving sterling field service despite horrific abuse (though I’ve lost a number of other machines: be diligent about backing up both data and systems). The secret of the Libretto’s survival is not hardening but small size.

Then again, there are circumstances where a scientist in the field may want something for which even the military has no use: a colleague is in the process of building, for an undisclosed client, a bespoke computer capable of sitting and doing its unaccompanied thing for extended periods without additional protection under a sustained fluid pressure of 650 atmospheres.