In this case

Felix Grant delves into the real-world application of statistical software

Fishing for data

The impacts of fisheries on the marine environment are a matter not just of scientific interest but of economic sustainability. Managing those impacts requires geospatial and fishing-intensity information gathered on a large scale, which is why vessel monitoring (VM) systems exist; their data are archived and, increasingly, shared. Analyses of these data at regional, national and international scales seek to assess, amongst other things, how the choice of criteria for defining a fishing ground influences the effects on stocks, collateral species, habitats and seabed ecologies. They also feed into assessments of how such criteria affect the size, shape, location and overlap of actual fishing grounds, across both seasonal and long-term activity patterns.

It seems, from these analyses, that defining a fishing ground by exclusion of infrequently fished areas produces smaller operational loci but increases extraction rates at the margins. Regulatory criteria which favoured removal of activity at those margins, encouraging fleets to move inward, would reduce impacts and minimise interaction between fisheries.
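
To illustrate how excluding infrequently fished areas shrinks a ground's footprint, here is a minimal Python sketch. It assumes, purely for illustration, that fishing effort has already been binned into a grid of ping counts, and it defines a hypothetical "core" ground as the smallest set of busiest cells holding a chosen share of total effort. The article does not say which definition criteria the analyses compared, and `core_ground_mask` is an invented name.

```python
import numpy as np


def core_ground_mask(effort, fraction):
    """Boolean mask of the busiest grid cells that together hold `fraction`
    of total fishing effort (hypothetical definition, for illustration)."""
    effort = np.asarray(effort, dtype=float)
    flat = effort.ravel()
    order = np.argsort(flat)[::-1]                 # busiest cells first
    cum_share = np.cumsum(flat[order]) / flat.sum()
    # First position at which the cumulative share reaches the target
    n_keep = int(np.searchsorted(cum_share, fraction)) + 1
    mask = np.zeros(flat.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(effort.shape)


# Hypothetical ping counts per cell on a 2 x 3 grid
effort = np.array([[50, 30, 10],
                   [5, 3, 2]])
for share in (1.0, 0.9, 0.5):
    print(share, int(core_ground_mask(effort, share).sum()), "cells")
```

With this toy grid, keeping 100% of effort retains all six fished cells, while 90% keeps three and 50% keeps one: the marginal, lightly fished cells are the first to drop out of the definition.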

The resolution of VM data collection, however, is an issue. If the resolution is too low, the data fail to describe movements within the study area adequately; if it is too high, the sheer size of that area makes the costs of collecting, handling and managing the data escalate rapidly. The current consensus in the field is that present sampling frequencies are too low to support much further progress, and that raising the sampling rate would improve understanding of how activity breaks down in detail within individual fishing grounds.

To assess the cost-benefit balance of different sampling rates, researchers gathered a body of high-frequency position data for comparison with existing collection and analysis practice. Comparing the more exact positions from this high-resolution sampling with conventional predictions derived by interpolating low-resolution tracks showed that the reliability of interpolation varied widely, depending on the behaviour of particular fleets.
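
The kind of comparison described here can be sketched with synthetic data. The sketch below (Python with NumPy; the function name, the one-minute "true" track and the two behaviours are all hypothetical, not the researchers' data or code) thins a high-frequency track to coarser pings, rebuilds it by straight-line interpolation, and measures the mean positional error, so that a steaming vessel and a zigzagging trawler can be compared at the same ping interval.

```python
import numpy as np


def interpolation_error(t, x, y, step):
    """Keep every `step`-th position of a high-frequency track, rebuild the
    track by straight-line interpolation between those pings, and return the
    mean distance between the true and reconstructed positions."""
    t_low, x_low, y_low = t[::step], x[::step], y[::step]
    # np.interp holds the last retained ping's value for any later times
    x_hat = np.interp(t, t_low, x_low)
    y_hat = np.interp(t, t_low, y_low)
    return float(np.mean(np.hypot(x - x_hat, y - y_hat)))


# Synthetic, hypothetical tracks (km) with a position fix every minute for 4 h
t = np.arange(0, 241, dtype=float)
x = t / 10.0                                 # steady 6 km/h heading east
steaming = np.zeros_like(t)                  # straight transit
trawling = np.sin(2 * np.pi * t / 60.0)      # 1 km zigzag with a 60-min period

for step in (5, 30):
    print(step, "min pings:",
          "steaming", round(interpolation_error(t, x, steaming, step), 3),
          "trawling", round(interpolation_error(t, x, trawling, step), 3))
```

A straight transit is reconstructed almost perfectly at either interval, whereas the zigzag is badly misrepresented once pings are 30 minutes apart (here they happen to fall on every zero crossing, so the turns vanish entirely). This is one simple mechanism by which interpolation reliability can depend on fleet behaviour.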

The differences also carried through into subsequent environmental impact calculations: impacts estimated from track reconstruction at lower grid cell resolutions were underestimates compared with those from higher-density sampling. A combination of higher sampling density and correction factors would allow for this, and the researchers conclude that 30-minute data capture intervals offer the best compromise between resolution and cost.
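
One way a reconstructed track can understate seabed impact is by sweeping fewer grid cells than the vessel actually crossed. This minimal sketch counts the cells touched by a minute-by-minute track and by a straight-line reconstruction from hourly pings, at several cell sizes. The track, ping interval and cell sizes are hypothetical, and simple cell counting is an assumed stand-in for whatever impact metric the study used.

```python
import numpy as np


def cells_visited(x, y, cell_size):
    """Set of (column, row) grid cells containing at least one track position."""
    cols = np.floor(np.asarray(x) / cell_size).astype(int)
    rows = np.floor(np.asarray(y) / cell_size).astype(int)
    return set(zip(cols.tolist(), rows.tolist()))


# Hypothetical one-minute "true" track (km): eastward tow with a 60-min zigzag
t = np.arange(0, 241, dtype=float)
x = t / 10.0
y = np.abs((t % 60) - 30) / 30.0             # triangle wave between 0 and 1 km

# Reconstruction from hourly pings, evaluated minute by minute
pings = t[::60]
x_rec = np.interp(t, pings, x[::60])
y_rec = np.interp(t, pings, y[::60])

for cell in (0.25, 0.5, 1.0):
    true_n = len(cells_visited(x, y, cell))
    rec_n = len(cells_visited(x_rec, y_rec, cell))
    print(f"{cell} km cells: true {true_n}, reconstructed {rec_n}")
```

At every cell size tested here the reconstruction touches fewer cells than the true track. The example is deliberately contrived: the hourly pings all land on the zigzag's crests, so the reconstruction collapses to a straight line, whereas real tracks will usually lose less.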


Statistical prophylaxis

Ibrahim is a vet, working a rural patch in a group of East African agricultural communities. His territory does not fit the stereotypical picture of African poverty: the population is well fed and, although people have to plan ahead carefully, reasonably secure. There are primary schools in every village and secondary education in some. Infrastructure, on the other hand, is thinly spread and sometimes distant, and in times of acute need Ibrahim occasionally has to apply his skills to human patients.

His resources, when he is at his base in a central village, include a Land Rover, a solar array backed up by a petrol generator, a small rudimentary laboratory, a laptop computer and the only telephone in the area. On the computer is installed a copy of GenStat Discovery Edition, a version of VSNi’s flagship statistical package which is freely distributed to researchers throughout the region and supported by a university several hundred kilometres away.

Last year, Ibrahim began to encounter clusters of unusual acute symptoms, outside his experience and affecting farmed ungulates and the humans who tended them. Unable to identify the cause, he sent samples off for examination in the capital. Waiting for a diagnosis, which would certainly take some time and might not come at all, he turned his mind to interim measures.

Not knowing the cause, he could make no assumptions about transmission vectors, and could not justify draconian measures such as quarantine or precautionary slaughter on the off chance that they might be effective. By using GenStat to analyse data on the time, location, severity and duration of outbreaks, however, he was able to identify patterns. The spread was progressive: incidence began close to the road leading to the regional administrative centre and, on the whole, moved from one community to the next nearest. Jumps from one focus to another rarely crossed water, the exceptions being where there was frequent traffic between the two. After a week or so, animals that had survived stopped showing symptoms, although there was some limited recurrence.
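
A rough check of the "next nearest community" pattern might look like the sketch below. It is written in Python rather than GenStat, uses made-up coordinates and onset days, and the function name is hypothetical. For each new outbreak in order of onset, it asks whether the newly affected community was the unaffected one closest to any already-affected community; it ignores water barriers and traffic, which the analysis above found also mattered.

```python
import numpy as np


def nearest_neighbour_fraction(coords, onset_day):
    """Fraction of new outbreaks (in order of onset) that struck the unaffected
    community closest to any already-affected one."""
    coords = np.asarray(coords, dtype=float)
    order = np.argsort(onset_day, kind="stable")   # ties keep input order
    affected = [int(order[0])]
    hits = 0
    for nxt in order[1:]:
        unaffected = [k for k in range(len(coords)) if k not in affected]
        # Distance from each unaffected community to its closest affected one
        gaps = [min(float(np.hypot(*(coords[k] - coords[a]))) for a in affected)
                for k in unaffected]
        if unaffected[int(np.argmin(gaps))] == nxt:
            hits += 1
        affected.append(int(nxt))
    return hits / (len(order) - 1)


# Hypothetical villages along a road (km) and day of first symptoms
villages = [(0, 0), (3, 1), (5, 0), (9, 2), (12, 1)]
onsets = [0, 4, 9, 15, 21]
print(nearest_neighbour_fraction(villages, onsets))
```

A value near 1 indicates step-by-step spread to nearest neighbours; a low value suggests long-distance jumps, which could then be checked against roads or river crossings.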

Working from these findings, he encouraged villagers to keep goats and cattle inside areas surrounded by irrigation ditches where possible. Those tending the herds he advised to sleep on the other side of a channel from their charges where feasible, and not to make unnecessary visits to neighbouring communities.

Within a month, new symptomatic outbreaks had become rare. Six months later, although its cause had yet to be identified, the problem seemed to have disappeared completely.


Ear we go

Sensory organs are extraordinary instruments in many ways, not least in the range of input magnitudes they can handle without damage or loss of perceptual resolution. In the latter respect (the ability to discriminate between signal and noise at very low amplitudes), the ear is particularly noteworthy. At its lower limit of perception, a displacement of the human eardrum smaller than the diameter of a hydrogen atom can be interpreted as useful information. This performance is achieved despite levels of thermal noise 10 times greater than this threshold signal.

Researchers at Rockefeller University in New York addressed this puzzle with a statistical approach based on fractional Brownian motion modelling. Their account of the work reads as an intriguing detective story, with exploratory hypotheses guided by experimental results, and with analysis and modelling carried out in MATLAB.

The physical mechanism involves hair bundles immersed in fluid, and the researchers used microrheological methods to examine the statistics of the bundles' thermal fluctuations.

They found that the motion of a hair bundle did not match what might be expected: its autocorrelation followed a power law, and its spectrum showed an unexpected slope with frequency. From this they deduced that the bundle's thermal motion is probably subdiffusive, a hypothesis they then went on to cross-test.
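
Subdiffusion is usually diagnosed from how the mean squared displacement (MSD) grows with time lag: for fractional Brownian motion (fBm) with Hurst exponent H, the MSD grows as lag^(2H), so H < 0.5 gives an exponent below 1 (subdiffusive) and H = 0.5 recovers ordinary diffusion. The Python sketch below simulates fBm exactly by Cholesky factorisation of the increment covariance and estimates that exponent from a log-log fit. It is a generic illustration with arbitrary parameters and hypothetical function names, not the Rockefeller group's MATLAB analysis.

```python
import numpy as np


def fgn_covariance(n, hurst):
    """Covariance matrix of n unit-interval increments of fractional Brownian motion."""
    k = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1) ** h2 - 2.0 * k ** h2 + np.abs(k - 1) ** h2)


def simulate_fbm(n, hurst, n_paths, rng):
    """Exact (Cholesky) simulation of fBm paths; columns are independent paths."""
    chol = np.linalg.cholesky(fgn_covariance(n, hurst))
    increments = chol @ rng.standard_normal((n, n_paths))
    return np.cumsum(increments, axis=0)


def msd_exponent(paths, lags):
    """Slope of log mean squared displacement against log lag."""
    msd = [np.mean((paths[lag:] - paths[:-lag]) ** 2) for lag in lags]
    slope, _intercept = np.polyfit(np.log(lags), np.log(msd), 1)
    return float(slope)


rng = np.random.default_rng(0)
lags = np.arange(1, 17)
for hurst in (0.5, 0.25):          # ordinary diffusion, then subdiffusion
    paths = simulate_fbm(1024, hurst, 20, rng)
    print(hurst, round(msd_exponent(paths, lags), 2))
```

Cholesky simulation costs O(n³) and is practical only for short series; it is used here because it is exact and needs nothing beyond NumPy. The estimated exponent should come out close to 1 for H = 0.5 and close to 0.5 for H = 0.25.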

Having established through further data analysis that the observed fluctuations were not influenced by instrumental factors, they hypothesised that the phenomena could be explained by an elastic element coupled to anomalous fractional Brownian motion. They therefore needed to identify a feasible physical structure and a suitable viscoelastic model to support this hypothesis, which they did by comparing power spectra from physical experiments with theoretical predictions through maximum-likelihood fitting.
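
Maximum-likelihood fitting of power spectra is often done with the Whittle approximation, which treats each periodogram ordinate as an independent exponential variable whose mean is the model spectrum S(f). The sketch below applies it to a plain power law S(f) = A·f^(-β), used here as an assumed stand-in for the researchers' viscoelastic model, which the article does not spell out. The amplitude A is profiled out analytically and β is found by grid search, so only NumPy is needed; the function name and data are hypothetical.

```python
import numpy as np


def whittle_power_law_fit(freqs, periodogram, betas=None):
    """Maximum-likelihood fit of S(f) = A * f**(-beta) to periodogram ordinates,
    using the Whittle approximation (each ordinate ~ exponential with mean S(f))."""
    f = np.asarray(freqs, dtype=float)
    p = np.asarray(periodogram, dtype=float)
    if betas is None:
        betas = np.linspace(0.0, 3.0, 3001)
    n = f.size
    # For a fixed beta the ML amplitude has a closed form: A = mean(P * f**beta)
    a_hat = (p[None, :] * f[None, :] ** betas[:, None]).mean(axis=1)
    # Negative log-likelihood sum(log S + P / S), with A profiled out
    nll = n * np.log(a_hat) - betas * np.log(f).sum() + n
    best = int(np.argmin(nll))
    return float(a_hat[best]), float(betas[best])


# Synthetic check: a known power law (A = 2, beta = 1.5) with exponential scatter
rng = np.random.default_rng(1)
f = np.linspace(0.01, 10.0, 2000)
p = 2.0 * f ** -1.5 * rng.exponential(1.0, f.size)
print(whittle_power_law_fit(f, p))
```

Swapping in a different theoretical spectrum changes only the model inside the likelihood; with a model that has more parameters, a general-purpose optimiser would replace the grid search.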

The result is a picture of a frequency-selective transduction process, which passes frequency bands carrying signal preferentially over those in which noise predominates.


Striking attitudes

Academic studies often arise from an original intuition or observation, and very often they also have practical implications. Phuoc is a young researcher at the very beginning of a career in academic social science, embarking on her first self-directed research study. She proposed an investigation into individual responses to sexual orientation and their implications for social policy in changing societies, referenced against the work of Gregory M. Herek in the United States.

Her data are obtained through survey methods built around four-way triangulated multiple-design questionnaires, designed to align with Herek’s but with added dimensions. She begins by sorting and filtering the data in Excel, extracting particular subsets that show patterns of interest before subjecting them to more searching analytic methods. Principal component analysis and discriminant function analysis then sift out both the predictors of particular responses and the triggers most likely to produce them.
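
For readers unfamiliar with the two techniques, here is a minimal NumPy sketch of each on synthetic data: principal component analysis via singular value decomposition of standardised responses, and a two-group Fisher linear discriminant, the simplest form of discriminant function analysis. The article does not say which package Phuoc uses after Excel, so the functions, item structure and groups are all hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """PCA of column-standardised data X (rows = respondents, columns = items).
    Returns loadings, proportion of variance explained, and component scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    loadings = vt[:n_components]
    return loadings, explained[:n_components], Z @ loadings.T


def fisher_discriminant(X, y):
    """Two-group Fisher linear discriminant: weights w and a midpoint threshold."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group scatter matrix
    sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(sw, m1 - m0)
    return w, float(w @ (m0 + m1) / 2.0)


# Hypothetical survey: two strongly related items plus one unrelated item
rng = np.random.default_rng(2)
latent = rng.normal(size=400)
items = np.column_stack([latent + 0.1 * rng.normal(size=400),
                         latent + 0.1 * rng.normal(size=400),
                         rng.normal(size=400)])
loadings, explained, scores = pca(items, 2)

# Hypothetical two-group data with shifted means on two predictors
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (200, 2))])
y = np.repeat([0, 1], 200)
w, threshold = fisher_discriminant(X, y)
accuracy = np.mean((X @ w > threshold).astype(int) == y)
print(explained, accuracy)
```

PCA picks out the two related items as a single dominant component, while the discriminant weights `w` indicate how strongly each predictor contributes to separating the two response groups.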

Her first cycle of data collection and analysis, focusing on respondents in their late teens, shows the same predictors of negative or hostile attitudes as Herek’s US studies. Most prominent are place of birth, socioeconomic background, religion, and whether or not the respondent personally knows someone whose sexual orientation differs from their own. Those negative responses, however, are consistently expressed with considerably lower intensity and frequency than Herek found.

More interesting is the distribution of triggers, as revealed by varying the question structures. Even in highly liberal response sets, where a majority of responses are positive, any residual hostility is concentrated in the same small number of trigger areas. Conversely, even in predominantly negative response sets there are surprising, and surprisingly consistent, positive exceptions in very specific areas. These sharply defined areas of cognitive dissonance are closely replicated across multiple cross-sectional samples, and are now the focus of the next round of analysis, using an expanded and more targeted sampling regime.