Parallel efforts create a virtual sky
The universe is being digitised at an unprecedented rate, a fact that presents opportunities and challenges in seemingly equal measure. Take the Sloan Digital Sky Survey, a project which, in the process of mapping and imaging one quarter of the entire sky and determining positions and absolute brightnesses of more than 100 million celestial objects, will ultimately gather dozens of terabytes of data in different formats and at different wavelengths. This is but one of many prolific sources of astronomical data.
Generally speaking, data from these sources remain unconnected. There is widespread recognition that the full potential of these data, as a result, is seriously underexploited. Not only that, the size of individual data sets has also exploded, particularly those delivered by large facilities (such as the ESO's Very Large Telescope). This presents particular concerns for astronomers without access to the computing capabilities necessary to explore such data sets. It seems that astronomy can now contribute its own particular chapter to the data deluge saga.
The obvious solution to the astronomers' situation - to make data in all the archives conform to the same format, then collect existing and future data together in that same format, and finally make all of them available to astronomers on different machines worldwide - would seem so ridiculously ambitious as to be laughable.
But that is exactly what an international team of teams of astronomers and computer scientists has decided to do. If it comes off, the result - an international 'virtual observatory' - will be a tribute to academic collaboration and cooperation.
A global entity of this nature is in fact a long way off, but work on regional virtual observatories has already begun. A US effort led by Johns Hopkins University and California Institute of Technology aims to create a National Virtual Observatory (NVO) by 2010. It has already received a five-year grant of $10m (a sixth of the total anticipated costs) to get it off the ground.
Now Europe's astronomers are involved in a similar thrust, following an award of €4m over three years from the European Commission to build a European equivalent of the NVO - the Astronomical Virtual Observatory (AVO). The idea is that the AVO, like its US cousin, will enable astronomers to gain instant access to data from both ground- and space-based telescopes that are observing across the entire wavelength range - from gamma rays, through visible light, to radio waves - and then to combine these data seamlessly using a common interface from any machine, thereby enabling remote mining of multiwavelength data archives.
There are six partner organisations currently working towards the AVO. These include the European Space Agency (ESA), research centres, consortia and observatories in France and the UK, and the European Southern Observatory (ESO) in Germany, which is leading the project. The AVO consortium has quickly formed a close alliance with the US NVO, and each team has representatives on the other's committees.
Both are also in communication with other, similar initiatives brewing elsewhere, specifically in India, Australia, and Japan. The ultimate aim, all parties agree, is a truly global virtual observatory.
Peter Quinn at the ESO, who is heading up the AVO project, explained its significance: 'AVO and the international VO effort aim to change the way that astronomical research is done. This will be achieved by a number of methods. Firstly, by enabling the interoperation of multi-wavelength data from space and ground archives to form a picture of the universe in a manner unbiased by a particular spectral window. Secondly, by connecting distributed archive sites to powerful computing resources to enable the mining of large survey databases like the Sloan Digital Sky Survey. And thirdly, by introducing the astronomical community to Grid technologies to deal with the dataflow... and to mine the interoperating resources.'
The scale of the project is breathtaking. It faces two main challenges: to make the data interoperable in the first place; and then to store and deliver the data and their associated functionalities. As far as interoperability is concerned, a consensus on the nature of the solution has already been reached, at least in part: 'We have chosen XML as the obvious candidate technology for interoperability at the ASCII data level,' revealed Quinn. 'This should be sufficient to cover the needs of astronomical catalogues, observation logs, data quality descriptions and general reporting. It still remains to be assessed if we need to make additional technology choices to fully cover binary data.'
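To make the idea of XML as a common exchange layer for catalogue data concrete, here is a minimal Python sketch. The element and attribute names (catalogue, field, row, v) and the sample values are invented for illustration; they are assumptions, not the format the AVO adopted. The point is only that a self-describing text format lets any machine recover column names, units and values without prior knowledge of the archive that produced them.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalogue document. The element and attribute names are
# illustrative assumptions, not a real observatory schema.
CATALOGUE_XML = """
<catalogue survey="example-optical">
  <field name="ra" unit="deg"/>
  <field name="dec" unit="deg"/>
  <field name="mag" unit="mag"/>
  <row><v>10.6847</v><v>41.2690</v><v>3.4</v></row>
  <row><v>83.8221</v><v>-5.3911</v><v>4.0</v></row>
</catalogue>
"""


def parse_catalogue(xml_text):
    """Return (survey name, {column: unit}, list of row dicts)."""
    root = ET.fromstring(xml_text.strip())
    fields = root.findall("field")
    names = [f.get("name") for f in fields]
    units = {f.get("name"): f.get("unit") for f in fields}
    rows = [
        # Pair each value with its column name, in declaration order.
        {name: float(v.text) for name, v in zip(names, row.findall("v"))}
        for row in root.findall("row")
    ]
    return root.get("survey"), units, rows
```

Because the column descriptions travel with the data, a client written for one archive could read a catalogue from another, provided both agree on the vocabulary. Agreeing on that vocabulary is the hard part, and it is what the standards effort is for.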
For the second challenge, the main items on the agenda concern network bandwidth, scalable storage and compute power and reusable Grid middleware - the glue that sticks the system together and makes it function consistently. Grid computing, observed Quinn, will be key to resolving these issues.
Enter AstroGrid, itself a UK consortium of six university groups and a government lab, and also one of the six AVO partners. The AstroGrid project was recently awarded around 2m by the UK government's Particle Physics and Astronomy Research Council (PPARC). It is dedicated to creating a distributed computing network, based on next-generation Grid technology, that will manage the data processing, storage and delivery tasks associated with accessing and mining vast astronomical archives. It is both independent of and inextricably tied up with the AVO initiative.
Professor Andy Lawrence of Edinburgh University, head of the AstroGrid team, explained: 'As a member of the AVO, some of our work contributes to this project and we have named people who spend a proportion of their time on AVO work. But we have a different overall timetable, which is somewhat less relaxed than that of the AVO.'
The key difference is that AVO is embarking on a three-year, phase-A study, whose outcome at the end of that period will be a thoroughly researched and documented prototype - a 'proposal system' for building the final virtual observatory, rather than a functional product. AVO at this stage is geared to getting the science, the model and the community absolutely right.
In contrast, the impetus for AstroGrid is to get something tangible up and running. 'We are working on a three-year timescale for everything, which means that by the end of it we intend to have something built that actually works. It will help astronomical science very early on and it will also help to trial longer-term aims of virtual observatory initiatives,' Lawrence pointed out.
In fact the AstroGrid team needs to produce results at this accelerated pace if the AVO is to have a means of testing its own premises at the end of its three-year period, since AstroGrid will offer a model (although not necessarily the final model) for the engine room underneath the entire AVO structure. Not surprisingly the timetables of these parallel endeavours have been carefully constructed to complement one another.
The AVO has started by working with the NVO in the US to put together new software standards for the interoperation of astronomical catalogue data - named the VOTable. 'This will allow, for the first time, catalogue data from major international observatories to be connected,' said Quinn.
The AVO is also contributing to the international effort to draw up an initial roadmap that will lead to the global observatory. This must enable agreement on common standards and lay down the timings for early demonstrations of new VO scientific capabilities. An international VO conference in Garching in June this year is expected to move much of this effort forward.
Meanwhile the AstroGrid consortium has opted for a high-speed, commercial-style approach to designing the structure for the network. Researchers at different locations are presently defining the scientific problems, creating use-cases ('stories' that set out different ways in which the system might be used) and formulating the architecture. This is based on a formal, software project planning approach to the job, using the 'unified process' method together with the Unified Modeling Language (UML) to abstract and derive the architecture, based on the scientific problems and use-cases identified. The process is a circular one, involving multiple modifications of use-cases and software modules as the understanding of requirements progresses.
Lawrence expects this part of the process to take a full year, of which six months remain. There are also a number of important software issues to be debated. For example, AstroGrid members are still not fully decided between Unix/Java-based technology and Microsoft technology for the development platform - although Lawrence privately believes it will be the Unix/Java combination that prevails. In addition, a conversation underway in the wider computing community, concerning future routes for Internet development, will have implications for AstroGrid. On one side of the conversation are the Grid computing people, who are addressing issues of how to share CPU power across a network. AstroGrid needs this approach to enable authorisation and authentication of data and scientists' access to data, for example.
And on the other side there is the Web services community, who are optimising the delivery of services and data over the Web and are keen adopters of XML technologies.
'Web services are also very relevant to the idea of a virtual observatory,' said Lawrence. 'Web services software will be able to automatically compare, say, infrared with X-ray data from different locations, and make it appear that all data are sitting on the astronomer's machine.' Lawrence and others hope that the two strands will find a way through the technological challenges and come together. He is optimistic about this possibility: 'Lots of people want there to be a Grid version of Web services. Large organisations like IBM and Oracle are interested too.'
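The kind of comparison Lawrence describes, lining up infrared and X-ray sources held at different sites, comes down to a positional cross-match once both catalogues are in hand. The Python sketch below is one simple way to do it, under stated assumptions: the record layout (id, ra, dec in degrees), the function names and the brute-force nearest-neighbour search are all illustrative choices, not AstroGrid's design. A real service would also have to fetch the data over the network and cope with catalogues far too large for a double loop.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cross_match(cat_a, cat_b, radius_deg):
    """For each source in cat_a, find the nearest cat_b source within radius_deg.

    Returns a list of (id_a, id_b, separation_deg) tuples. Brute force:
    every source in cat_a is compared against every source in cat_b.
    """
    matches = []
    for a in cat_a:
        best = None
        for b in cat_b:
            sep = angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (b, sep)
        if best is not None:
            matches.append((a["id"], best[0]["id"], best[1]))
    return matches
```

Called with two lists of source records and a search radius of a couple of arcseconds (2 / 3600 degrees), the function returns the pairs that plausibly refer to the same object on the sky.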
For the AstroGrid consortium, the race is on - first testing of early prototype modules will begin within the next six months and will be ongoing after that. D-Day on the researchers' calendars is sometime in autumn 2004, the time by which they have committed to releasing a working system. 'It's a scary commitment!' admitted Lawrence, 'but with some caveats. Inevitably it won't have all the desired functionality at that stage.'
Meanwhile, for the AVO the devil is in the detail: 'At the end of three years we expect to demonstrate 'complex' capabilities involving all archives within the AVO consortium and utilizing elements of Grid technology,' described Quinn. 'The fully operational VO will be planned as phase B.'
So it will be a long time - earliest estimates suggest around six years - before we see an international VO. But the result could be profound. Routine observations will not even need telescopes, and cross-wavelength, cross-location data comparisons are expected to provide entirely new insights into the universe.