Beyond the skies

Felix Grant looks to the heavens and sees scientific software shining bright

In a forest in southern England, a small astronomical observatory is maintained by the sort of individual love and enthusiasm that too often dies out as we leave childhood. The New Forest Observatory[1] (NFO) is the baby of Greg Parker (researcher in multiple directions, consultant, professor of photonics and head of nanoscale systems integration at Southampton University, company CEO, erector of fingerposts on the road to quantum computing, indefatigable ideas generator…). And the NFO’s latest project is a ‘miniWASP’ imaging array, named after a spectacularly successful large-scale collaboration.

Professor Parker is quick to point out that the name miniWASP is imitative rather than literal: like the arrays used by the WASP (Wide Area Search for Planets) project[2], it aims multiple CCD (charge-coupled device) equipped refractors at the sky, but as a means to gather large amounts of ‘deep sky’ image data in a short time rather than for planet searching.

Nevertheless, there is more than a superficial resemblance. For a start, in an age of radio astronomy, both use the visible spectrum or, in the case of some of Parker’s near infrared work, wavelengths close to it. And planet searching is one of those scientific activities which still can be, and is, conducted by other motivated individual amateurs – if not at the mass production rate that an automated system can routinely maintain.

A SuperWASP array uses eight 120 megapixel refractor cameras, while miniWASP will sport four at 10 megapixels, and the disparity in available computing power is considerable. Both, however, rely on data analytic comparison of multiple captures to provide high-quality final images; Parker predicts that when miniWASP is complete he ‘will probably need to run four computers to keep [it] afloat’.

The New Forest Observatory presently gathers the multiple component exposures (at least 50 of them, each of about 10 minutes, for one final image result) sequentially over time.

The downloads are processed in Maxim DL software: output to RGB, checked for extraneous features such as passing aircraft and ‘stacked’ to produce one composite data array, then passed to Parker’s US collaborator Noel Carboni. Whereas SuperWASP has bespoke parallel image processing software, Carboni uses Adobe Photoshop to apply various techniques that counter light pollution and extract enhanced signal while eliminating as much noise as possible.
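The principle of stacking can be shown with a minimal sketch. It assumes the frames are already aligned and calibrated, represents each one as a plain list of pixel rows, and combines them with a per-pixel median; it is an illustration, not a description of what Maxim DL does internally. A median is one simple way to reject an aircraft trail present in only one frame. For independent random noise, averaging N frames improves the signal-to-noise ratio by roughly √N, so 50 frames give about a sevenfold gain (a median is somewhat less efficient than a mean, but more robust to outliers).

```python
from statistics import median


def median_stack(frames):
    """Combine aligned frames pixel by pixel using the median.

    Each frame is a list of rows of pixel values. A transient such as an
    aircraft trail that appears in only a minority of frames is rejected,
    because the median ignores extreme values.
    """
    if not frames:
        raise ValueError("need at least one frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != rows or any(len(row) != cols for row in frame):
            raise ValueError("all frames must have the same shape")
    return [
        [median(frame[y][x] for frame in frames) for x in range(cols)]
        for y in range(rows)
    ]


frames = [
    [[10, 12], [11, 13]],
    [[11, 12], [10, 14]],
    [[10, 250], [11, 13]],  # bright aircraft trail crossing one pixel
]
print(median_stack(frames))  # [[10, 12], [11, 13]]
```

The trail value of 250 vanishes from the result because it is outvoted by the other two frames at that pixel.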

SuperWASP processes its much larger captures using reference data that eliminate systematic artefacts deriving from the CCDs and optics in use, then examines them for small dips in star brightness. These dips flag possible transit events: the passage of a planet between the star and the observation point. Millions of stars are monitored in this way, across a broad field of view, which emphasises quantity over magnification.
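The idea of flagging a transit can be sketched on a single star’s light curve. This hypothetical helper takes the median brightness as a baseline and reports runs of consecutive measurements that fall a chosen fraction below it; a Jupiter-sized planet crossing a Sun-like star blocks roughly 1% of its light, hence the default depth. Production pipelines work on detrended data and search for periodic, box-shaped dips (box-fitting least squares, for example), so this illustrates the principle rather than SuperWASP’s actual method.

```python
from statistics import median


def find_dips(flux, depth=0.01, min_points=3):
    """Return (start, end) index pairs, end exclusive, for runs of at least
    `min_points` consecutive measurements that fall a fraction `depth` or
    more below the median brightness of the light curve."""
    baseline = median(flux)
    threshold = baseline * (1.0 - depth)
    dips = []
    start = None
    for i, value in enumerate(flux):
        if value <= threshold:
            if start is None:
                start = i  # a possible dip begins
        elif start is not None:
            if i - start >= min_points:  # long enough to keep
                dips.append((start, i))
            start = None
    # Close a dip that runs to the end of the data.
    if start is not None and len(flux) - start >= min_points:
        dips.append((start, len(flux)))
    return dips


# Synthetic light curve: a 1.5% dip lasting four points, plus a
# single-point glitch that is too short to count.
flux = [1.0] * 10 + [0.985] * 4 + [1.0] * 10 + [0.98] + [1.0] * 5
print(find_dips(flux))  # [(10, 14)]
```

Requiring several consecutive low points is a crude guard against single-measurement noise; a real search would also demand that the dip repeat at a regular period.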

From the two WASP sites (the Canaries in the northern hemisphere, South Africa in the southern), more than 6,500 stars per second can be monitored.

The galaxy classification screen used by GalaxyZoo’s distributed volunteers.

Such stories exemplify the truism that astronomy, these days, whether at the large institutional end of the scale or at the enhanced serious amateur grade, is predominantly a data processing field. In truth, of course, it always has been. Babylonian astronomers took the first steps toward understanding celestial mechanics by studying the relations between repeated measurements of position. It was analysis of discrepancies between Tycho Brahe’s observations and Copernican theory, not the act of observation itself, which led Kepler to deduce the ellipticity of planetary orbits. But with the advent of plentiful scientific computing power replacing human perspiration, the true balance between observation and analysis has been able to find its own level.

Not that observation is unimportant; analysis obviously depends upon data and, in an intensive computing age, on databases. Jumping up 20 orders of magnitude or so (give or take a bit, depending on how you calculate it!), GalaxyZoo[3] is a project that, through the organisational and communicative capabilities of computerised databases, brings distributed human analysis to bear on the crude initial classification of galaxy types. Volunteers anywhere sign in to the project’s website, receive brief instruction and a check on understanding, and are then invited to classify images in the database into types (elliptical, clockwise or anticlockwise spiral, merger…) by clicking buttons. Although it’s not explicitly said, the same images are presumably offered multiple times to different volunteers as a cross check.

The end result will be a first level refinement of the database, allowing more detailed analysis by computers or experts to begin at a more efficient level. Images that don’t fit into any of the available categories automatically sort into a smaller set for expert checking, and means are provided for the volunteer to flag up anything particularly unusual. This sort of human mediated layer in computerised data analysis, to exploit the superiority that we maintain over computers in certain pattern recognition areas, is being applied in several areas – error trapping in OCR-based transcription of printed texts to digital storage being one notable example.
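If images are indeed shown to several volunteers, their answers have to be reconciled before an image can be filed or passed on for expert checking. A minimal sketch, assuming a simple majority rule with hypothetical thresholds (at least five votes and 80% agreement; these are not GalaxyZoo’s published rules), returns the agreed label, or None to mean ‘send to an expert’.

```python
from collections import Counter
from typing import Optional


def consensus(votes, min_votes=5, min_agreement=0.8) -> Optional[str]:
    """Return the label most volunteers chose, or None if there are too few
    votes or too little agreement (None = route the image to an expert)."""
    if len(votes) < min_votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


print(consensus(["elliptical"] * 9 + ["merger"]))  # elliptical
print(consensus(["clockwise spiral"] * 3 + ["anticlockwise spiral"] * 3))  # None
```

Raising `min_agreement` sends more images to experts but makes the volunteer-classified set cleaner; the right balance depends on how much expert time is available.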

GalaxyZoo draws on the Sloan Digital Sky Survey[4] (SDSS), a multinational collaboration of institutions, facilities and funding bodies, which aims to build a complex multidimensional data model for a significant proportion of the sky. At the time of writing, the scanned area was pushing towards the 10,000 square degree mark.

SDSS is interested in everything from galaxy scale data (it is establishing the distance to a million of them) down to asteroids and, in between, hundreds of millions of stars classified by up to a hundred descriptors – including label, absolute brightness and position at a minimum, with spectra for a sparse-sampled subset.

Computation is intensive throughout the whole SDSS process, from initial conversion of raw data to analysis of the results. Multiple exposures from a 120-megapixel camera, plus feeds from spectrographs recording 600 objects per observation, generate a lot of raw data. Just the process of turning these data into single images, before they even start their onward journey to analysis, is enough to boggle the mind. The next generation of telescopes will expand the flow dramatically: the Large Synoptic Survey Telescope[5], for example, is expected to produce petabytes of moving picture output from complete sky scans, repeated every three days and built from 15 second exposures.

Having completed their passage through the computational mill, the data become an end product, released by SDSS in tranches, which must be organised for productive access by users.

One project seeking to facilitate this, supported by the UK National Grid Service[6] (NGS), involves transfer of SDSS data to Oracle databases. University of Portsmouth doctoral candidate Helen Xiang has demonstrated joint queries on a pair of heterogeneous terabyte databases – one (Microsoft SQL Server) at her own institution, the other (Oracle), which she transferred to Manchester for hosting by NGS.
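How Xiang’s joint queries were implemented is not described here, so the following is only a sketch of the general idea of querying across two separate databases. Two independent in-memory SQLite databases stand in for the SQL Server and Oracle sites, and the table and column names are hypothetical, not the SDSS schema. Rows are pulled from each site and combined by the client with a hash join, which is roughly what federation middleware must do when the engines cannot join each other’s tables directly.

```python
import sqlite3

# Two independent in-memory databases stand in for the two sites.
# Table and column names are hypothetical, not the SDSS schema.
photo = sqlite3.connect(":memory:")
photo.execute("CREATE TABLE photo (objid INTEGER PRIMARY KEY, ra_deg REAL, dec_deg REAL)")
photo.executemany(
    "INSERT INTO photo VALUES (?, ?, ?)",
    [(1, 150.1, 2.2), (2, 150.3, 2.4), (3, 151.0, 1.9)],
)

spec = sqlite3.connect(":memory:")
spec.execute("CREATE TABLE spec (objid INTEGER PRIMARY KEY, z REAL)")
spec.executemany("INSERT INTO spec VALUES (?, ?)", [(1, 0.10), (3, 0.25)])


def federated_join(photo_conn, spec_conn):
    """Join rows held in two separate databases on objid.

    The smaller table is pulled into a dictionary (the build side of a
    hash join); rows from the other site are then streamed past it.
    """
    redshift = dict(spec_conn.execute("SELECT objid, z FROM spec"))
    joined = []
    for objid, ra, dec in photo_conn.execute(
        "SELECT objid, ra_deg, dec_deg FROM photo ORDER BY objid"
    ):
        if objid in redshift:
            joined.append((objid, ra, dec, redshift[objid]))
    return joined


print(federated_join(photo, spec))
# [(1, 150.1, 2.2, 0.1), (3, 151.0, 1.9, 0.25)]
```

With terabyte tables, the expensive part is moving rows between sites, which is why a real system would push filters down to each database and ship only the smaller intermediate result.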

But ‘space research’ is not just astronomy. Space, after all, begins somewhere around 80 (according to NASA) or 100 (Fédération Aéronautique Internationale) kilometres above the Earth’s surface – just above our heads, relatively speaking, at little more than one hundredth of an earth radius. In terms of applied technology, the overwhelming majority of our attention is focused not much beyond that level.
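The ‘one hundredth of an earth radius’ figure is easily checked, taking the Earth’s mean radius as about 6,371 km:

```latex
\frac{80\ \text{km}}{6371\ \text{km}} \approx 0.013,
\qquad
\frac{100\ \text{km}}{6371\ \text{km}} \approx 0.016
```

The two definitions therefore put the edge of space at roughly 1.3% and 1.6% of an earth radius respectively.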

Communications, geopositioning, weapons platforms, scientific study, espionage and surveillance all operate very close by – in an orbital, largely commercial and political sphere, well above the 100km point but still closely wrapped about our atmosphere. One politically sensitive growth area in this sphere at present, having rapidly become strategic for military, commercial and scientific purposes over the past decade or so, is geopositioning. The existing US GPS and Russian GLONASS systems are due to be joined in 2013 by the new and improved 30-satellite European system, Galileo.

Visualisations in a Maple document of the earth’s magnetosphere; the 3D visualisation uses transparency and a cut plane through nVizx (montage components drawn from Cheb-Terrab et al[8], supplied with permission from Maplesoft).

While the existing systems provide ample proof of concept and feasibility, establishing the infrastructure for Galileo is a huge commercial and political undertaking – especially as it is designed to go beyond the quality and extent of the existing systems. One exemplar component, evaluation of receiving station technologies, is tackled through a software platform (the Galileo Receiver Analysis and Design Application, or GRANADA). Developed in Matlab, Simulink and related tools by Madrid-based high technology company Deimos Space to run on a single Windows PC, GRANADA faithfully simulates the complete Galileo signal-processing chain to provide a test bench for all stages of the process.

As with other celestial bodies, geocentric phenomena extend considerably and progressively beyond our own orbital vicinity: the dubiously named magnetosphere (dubious because it bears little resemblance to a sphere) reaches about 15 earth radii in the sunward direction and at least a couple of hundred downwind. Studies of such layers around the Earth are of particular importance domestically, and those around other bodies interest everyone from macro scale cosmologists down to the more localised concerns of exobiologists (of whom more later). Modelling of complex interactive fluid structures such as the magnetosphere is a prime example of science dependent upon computing for its existence, never mind progress. A comment by Büchner et al[7] that ‘…interaction of a highly collisionless plasma with a strongly collision-dominated plasma requires an extension of the usual fluid plasma equations…’ is a disarmingly offhand understatement.

Constructing the necessary mathematical representations, conducting the volume of calculation which follows from them, feeding back results for checking against observation and thence into revised structures, not to mention redundancy-based error checking of every stage, would be ruinously time- and resource-hungry without cheap and plentiful machine labour to conduct it at high speed. Nor is handing traditional methods over to faster machine processing the only benefit: information technologies also allow the models to be stripped down. This, in turn, brings further efficiency gains and wider access, alongside new opportunities for exploratory investigation.

Writing with specific reference to use of Maple in magnetosphere modelling[8], Edgardo Cheb-Terrab and co-authors make the more general point that the ‘…working landscape is changing with the advent of interactive mathematical systems [which capture] the analytic modelling, driven by a large body of mathematical algorithms, as well as numerical experiments, supported with an efficient computational engine’. Their illustrative application, a Maple document enhanced by nVizx 3D visualisation and presented to an environmental informatics audience, neatly demonstrates both the sociotechnological point and the relevance of space research to terrestrial concerns. ‘Instead of using empirical data…’, they assert, their ‘…equation-based approach, derived directly from first principles and a fundamental understanding of the magnetosphere, can provide adequate understanding and prediction capabilities without resorting to classical brute force finite element and finite difference simulation models’. The model that they put forward is intended to serve as an armature onto which other aspects can be added as required for particular comparisons with gathered data.

The magnetosphere appears to be crucial in maintenance of the very particular circumstances required for human (and all other terrestrial) life. Without it, our atmosphere and water would have been stripped away long ago – as, it would seem, were those of weakly magnetised Mars. Not that terrestrial life is the only conceivable option: there are many theoretical models for ways in which life and intelligence might arise on bases other than the carbon platform we know and love, including some which would be viable in hard vacuum. Data from the Cassini probe and the Hubble telescope are eagerly scrutinised for signs that Saturn’s moon Titan may offer a viable alternative basis.

Growing understanding of extremophile archaea in terrestrial habitats has expanded acceptance of what is, and may be, possible. Nevertheless, most exobiologists bet their funding money on searches for our own pattern: carbon, oxygen, water, all at so-called ‘Goldilocks zone’ temperatures.

Planet searching, with which I opened, is obviously an important part of such hunts. So is spectral classification of stars. In the more immediate future and locality, attention focuses on the first planetary exploration target: Mars, where hypotheses have their best short-term prospect of physical validation. If life exists on HIP 74995 d, the terrestrial planet lurking at the outer habitable zone limit around the star Gliese 581, more than 20 light years away, no human being now alive will ever see it. On Mars, however, it has a chance of being found – and I could see it within the hour.

Since Mars is visitable, but only at great cost, common-sense economics dictates that maximum pay-off be squeezed from every trip. Despatching machines, which can stay there without incurring the cost of return or the overhead of life support, makes sense. So does processing data locally, cutting out two-way transmission delays in routine or urgent observation, decision and response cycles. In the medium term, this means gradually building up for Mars a small analogue of the integrated surface and orbital data communications structure that already exists on and around Earth.
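The cost of those two-way delays is simple to estimate by dividing distance by the speed of light. The sketch below uses commonly quoted round figures for the closest and farthest Earth-Mars separations, about 54.6 and 401 million km, which should be treated as approximations.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by definition of the metre

# Approximate extremes of the Earth-Mars distance (assumed round figures)
CLOSEST_KM = 54.6e6
FARTHEST_KM = 401e6


def light_time_minutes(distance_km: float) -> float:
    """One-way signal travel time, in minutes, over distance_km."""
    return distance_km / SPEED_OF_LIGHT_KM_S / 60.0


for label, distance in (("closest", CLOSEST_KM), ("farthest", FARTHEST_KM)):
    one_way = light_time_minutes(distance)
    print(f"{label}: one-way {one_way:.1f} min, round trip {2 * one_way:.1f} min")
# closest: one-way 3.0 min, round trip 6.1 min
# farthest: one-way 22.3 min, round trip 44.6 min
```

Even at closest approach, a question-and-answer exchange with a machine on Mars takes several minutes, and at the greatest separation it approaches three quarters of an hour, which is why local decision-making pays.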

Mars will, in other words, become an automated and connected scientific computing platform. A lot of work has gone into development of viable, delay-tolerant and transmission-efficient communications as part of a projected ‘interplanetary internet’. The data pipelines between the earth’s data networks and a fledgling Martian equivalent would be laser-based optical links, which require highly stable positioning and aiming mechanisms. Once again, as with the Galileo receivers, a software test bench approach has been used to facilitate NASA’s development of the necessary inertial reference units – by Applied Technology Associates (ATA) this time, though once again with MathWorks tools.
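Delay tolerance rests largely on store-and-forward: a relay keeps data until a link to the next hop is available, rather than assuming a continuous end-to-end path. The sketch below is a hypothetical, much simplified relay node in that spirit (the general architecture is described in RFC 4838); the class, its methods and the per-window capacity are illustrative assumptions, not part of any real protocol stack.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DtnNode:
    """Hypothetical store-and-forward relay (e.g. a Mars orbiter).

    Bundles are held in a queue until a contact window with the next hop
    opens, instead of being dropped when no end-to-end path exists.
    """
    name: str
    queue: deque = field(default_factory=deque)

    def receive(self, bundle: str) -> None:
        """Take custody of a bundle and store it until it can be forwarded."""
        self.queue.append(bundle)

    def contact(self, next_hop: "DtnNode", capacity: int) -> int:
        """Forward up to `capacity` bundles during one contact window.

        Returns the number of bundles actually sent; the rest stay queued
        for the next window.
        """
        sent = 0
        while self.queue and sent < capacity:
            next_hop.receive(self.queue.popleft())
            sent += 1
        return sent


orbiter = DtnNode("mars-orbiter")
earth = DtnNode("earth-station")
for i in range(5):
    orbiter.receive(f"image-{i}")

print(orbiter.contact(earth, capacity=3))  # 3
print(orbiter.contact(earth, capacity=3))  # 2
print(list(earth.queue))  # ['image-0', 'image-1', 'image-2', 'image-3', 'image-4']
```

Nothing is lost when the first window is too short; the remaining bundles simply wait for the next pass.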

Humans will probably follow machines to Mars, and eventually to some other local destinations (the asteroids, perhaps the Jovian moons), but as distance increases, the edge will always lie with robots. A distributed machine-based research entity developed for Mars will become an exportable prototype for all future space exploration – so successes and mistakes now are exciting in more than a temporary sense. By the time an equivalent of Professor Parker’s observatory (run, of course, by a robot version of Professor Parker) gives us the first deep sky view from somewhere out on the orbit of Pluto, whether or not life has been found on the way, the solar system will be a fully wired (or, rather, wireless) data neighbourhood with its roots in the work being done today.


1. Parker, G. and N. Carboni. The New Forest Observatory. www.newforestobservatory.com

2. SuperWASP Home Page. www.superwasp.org

3. GalaxyZoo. www.galaxyzoo.org

4. Sloan Digital Sky Survey. www.sdss.org

5. Large Synoptic Survey Telescope. www.lsst.org

6. UK National Grid Service. www.grid-support.ac.uk 

7. Büchner, J., C.T. Dum, and M. Scholer. Space plasma simulation. Lecture Notes in Physics. Berlin; New York: Springer, 2003. ISBN 3540006982

8. Cheb-Terrab, E.S., J. Cooper, and B.W. Wilson. Modelling the magnetosphere with Maple. In XIX International Conference on Informatics for Environmental Protection. Brno, Czech Republic, 2005.


Adept Science Splinex nVizx info@adeptscience.com

Adobe Consulting PhotoShop T: +44 0 20 7317 4100

Applied Technology Associates Sensor development and manufacture ContactATA@aptec.com

Deimos Space GRANADA www.deimos-space.com/ES/contacto.asp

Diffraction Ltd Maxim DL www.cyanogen.com/company/contact_main.htm

Maplesoft Maple info@maplesoft.com

MathWorks Matlab, Simulink info@mathworks.co.uk

