Grid powers up at ISIS

Getting CPU-intensive computations to run faster used to mean buying a new computer. Not any more, report Kenneth Shankland and Tom Griffin.

The ISIS Facility at the UK's Rutherford Appleton Laboratory in Oxfordshire is the world's most powerful source of pulsed neutrons and muons for condensed matter research. With more than 20 instruments surrounding the neutron source, and a second neutron source (Target Station 2) under construction, ISIS supports an international community of around 1,600 scientists from disciplines including physics, chemistry, materials science, geology, engineering, and biology.

Computational methods have always played a significant role in analysing data collected at the facility, and the continual rise in computer power has allowed increasingly complex modelling and analysis operations to be brought to bear on problems as diverse as the structure of high-Tc superconductors and H-atom vibrations in paracetamol. However, for many analyses, the prospect of 'increased accuracy' or 'more results, faster' remains alluring, and additional computer power is always welcome. As many of the analyses performed are 'coarse grained', dividing naturally into large, independent pieces of work (e.g. Monte Carlo simulations or parametric scans), they are ideally suited to a distributed computing system. One such system, Grid MP from United Devices, has been in use at ISIS for about 12 months.

The aim of such a 'grid' system is to harness 'spare' CPU cycles on existing desktop PCs at the facility and use them to perform useful scientific calculations, such as materials simulations, instrument simulations, and crystal-structure determinations. It is widely acknowledged that the CPU of the average desktop PC is barely used (around 10 per cent) during a typical day of running office software, such as word processors and mail clients. Summed over all the desktop PCs in the facility, there is therefore a vast reserve of CPU power waiting to be harnessed.

In the Grid MP system, there are two key hardware components: the server computers and the client computers that are registered with the server. The servers comprise two dedicated dual-processor Linux PCs which act as the focus for the splitting, scheduling, distribution, and tracking of the compute jobs. They also monitor the status of all the client computers that constitute the raw CPU power of the grid system. These client computers are standard desktop PCs located within the facility, running a small piece of United Devices software called an agent, which communicates with the server. The agent has the capability to execute computer programs on the clients in such a way that the people sitting at the client computers are not aware that these programs are in fact running. The net result is that large compute jobs (consisting of a series of smaller, discrete compute jobs) can be executed much faster than normal, without resort to purchasing additional hardware.

Rendering a high-resolution ray-traced image on a single PC, such as this one of a muonium ion implanted in C60, can take hours or days. Rendering the image as a series of 'strips' on a grid reduces the time almost in proportion to the number of CPUs.

Of course, some work is required to enable applications to run in the grid environment. A customised C++ submit program is written, which takes care of packaging the application along with the necessary dependencies (e.g. input files, DLLs) into a series of discrete work units ready for distribution. A complementary program takes care of retrieving results from the server and then collating them into the final desired answer. Once adapted, applications are tested on a subset of the grid before being deployed on the full system.
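The packaging and collation steps described above can be illustrated with a minimal Python sketch. It is not the ISIS submit program, which is written in C++ against the Grid MP interfaces; those interfaces are not reproduced here, and the names WorkUnit, package_work_units and collate are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class WorkUnit:
    """One discrete piece of a larger compute job (hypothetical structure)."""
    unit_id: int
    executable: str
    input_files: Tuple[str, ...]


def package_work_units(executable: str,
                       shared_files: Sequence[str],
                       per_unit_inputs: Sequence[str]) -> List[WorkUnit]:
    """Bundle the application, its shared dependencies (e.g. DLLs) and one
    unit-specific input file into each work unit."""
    return [
        WorkUnit(i, executable, tuple(shared_files) + (unit_input,))
        for i, unit_input in enumerate(per_unit_inputs)
    ]


def collate(results: Dict[int, float],
            expected_units: int,
            combine: Callable[[List[float]], float]) -> float:
    """Combine per-unit results, in unit order, into the final answer.
    Refuses to produce an answer while any unit is still outstanding."""
    missing = [i for i in range(expected_units) if i not in results]
    if missing:
        raise ValueError(f"results missing for work units: {missing}")
    return combine([results[i] for i in range(expected_units)])
```

In this sketch, a job with three input files becomes three work units that share the same executable and DLL, and collate(results, 3, sum) returns the combined value only once all three results are back.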

A wide range of applications has been installed on the system, ranging from computational chemistry codes to image-generation packages. Perhaps one of the most significant applications for ISIS is that of instrument design. Target Station 2, with its enhanced flux of long-wavelength neutrons, is an excellent platform on which to build instruments for the study of large-scale structures in soft condensed matter, biomolecular sciences, and advanced materials. The Monte Carlo instrument-simulation codes now installed on the grid (VITESS and McStas) allow the scientists to create a computer model of their instrument and to investigate its performance as a function of various critical parameters, such as neutron guide length or guide curvature. In order to obtain good simulations, many virtual neutrons need to be 'fired' through the instrument. Fortunately, the nature of the problem is such that if n neutrons are required for a particular simulation, they can be split into x packets of n/x neutrons each, one for each of x client computers. The performance gains can be spectacular: recent simulations of the HET instrument using VITESS utilised some six months' worth of CPU time in only 48 hours, running under Grid MP with around 100 clients. A lack of available compute power therefore no longer needs to limit investigations into instrument performance.
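The packet-splitting idea and the quoted speed-up can be checked with a short Python sketch. The function names are hypothetical, and the figure of roughly 4,380 hours for six months (182.5 days) is an assumption made here for the arithmetic, not a number from the simulations themselves.

```python
from typing import List


def split_neutrons(n: int, x: int) -> List[int]:
    """Split n virtual neutrons into x packets of about n/x each.
    Any remainder is spread one neutron at a time over the first packets,
    so the packet sizes always sum to exactly n."""
    if n < 0 or x <= 0:
        raise ValueError("need n >= 0 and x > 0")
    base, extra = divmod(n, x)
    return [base + 1 if i < extra else base for i in range(x)]


def speedup(cpu_hours: float, wall_hours: float) -> float:
    """Ratio of total CPU time consumed to elapsed wall-clock time."""
    return cpu_hours / wall_hours


packets = split_neutrons(1_000_003, 100)   # 100 clients, uneven total
six_months_hours = 182.5 * 24              # assumed: about 4,380 hours
print(len(packets), sum(packets))
print(speedup(six_months_hours, 48))
```

Under that assumption the speed-up is about 91, which is consistent with around 100 clients each contributing most, though not all, of their CPU time over the 48 hours.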

Planned for Target Station 2, WISH (Wide angle In a Single Histogram) is a modern neutron diffraction instrument that includes a very large array of detectors around the sample. Of particular import in its design has been simulating the performance of the guide that transports neutrons from source to sample.

Power of this magnitude can also transform the way in which certain problems are approached. Molecular dynamics (MD) is a well-established technique for simulating the atomic structure of materials on a computer and is increasingly being exploited to study structural phase transitions induced by pressure (P) and/or temperature (T). The ability to map out P/T space is advantageous both prior to experiments (highlighting those regions to be studied) and during the subsequent data analysis. The material under study is first simulated using MD at some nominal pressure and temperature. The system can then be perturbed in either P or T, and new MD runs started to provide new simulations at these discrete P/T points. By grid-enabling the MD application used at ISIS, P/T space can be traversed more quickly and efficiently from the initial simulation. With the power available, P and T steps finer than those previously tractable can now be employed, allowing more accurate mapping of phase boundaries. Alternatively, it is possible to include a third search parameter, such as the chemical composition of the material under investigation.
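A parametric scan of this kind amounts to enumerating a set of independent run conditions, each of which could become one grid job. The Python sketch below shows one way to generate them; the function names, the value ranges and the optional composition axis are illustrative assumptions and are not taken from the MD application used at ISIS.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def linspace(start: float, stop: float, count: int) -> List[float]:
    """Return count evenly spaced values from start to stop inclusive."""
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


def scan_points(pressures: Sequence[float],
                temperatures: Sequence[float],
                compositions: Sequence[Optional[float]] = (None,)
                ) -> List[Tuple[float, float, Optional[float]]]:
    """Every (P, T, composition) combination; each one is an independent
    MD run that could be dispatched as a separate work unit."""
    return list(product(pressures, temperatures, compositions))


# Illustrative ranges only: 11 pressures by 5 temperatures gives 55 runs.
points = scan_points(linspace(0.0, 10.0, 11), linspace(100.0, 500.0, 5))
print(len(points))
```

Halving the step in both P and T roughly quadruples the number of runs, and adding a third parameter multiplies it again, which is why finer scans only become practical once many clients are available.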

Mapping of P/T space using MD calculations can provide valuable insights both before and after a neutron diffraction experiment.

Another problem particularly well-suited for distribution on a grid is determining a crystal structure from powder-diffraction data using simulated annealing (SA). Solving complex molecular structures in this way, from either X-ray or neutron powder-diffraction data, requires numerous SA trials, as no one single trial is guaranteed to reach the global minimum corresponding to the correct crystal structure. In particularly challenging cases, the success rate in finding the global minimum can drop to only one or two per cent of the total number of SA trials performed, and as each trial may take an hour or more to run, the whole process can become very time-consuming. Fortunately, each SA trial, which starts from a different random point, can be run independently and by utilising the grid, solution times can be brought down to acceptable levels.
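A rough Python estimate shows why low success rates make the grid attractive. It assumes that trials succeed independently with a fixed probability p, so that the chance of at least one success in N trials is 1 - (1 - p)^N. The 95 per cent target, the one-hour trial time and the 100 clients are illustrative choices, and the function names are hypothetical.

```python
import math


def trials_needed(success_rate: float, confidence: float) -> int:
    """Smallest N for which 1 - (1 - p)**N >= confidence, assuming
    independent trials with the same success probability p."""
    if not (0.0 < success_rate < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("success_rate and confidence must lie in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - success_rate))


def wall_clock_hours(trials: int, hours_per_trial: float, clients: int) -> float:
    """Elapsed time if trials run in rounds of one trial per client."""
    return math.ceil(trials / clients) * hours_per_trial


n = trials_needed(0.01, 0.95)          # one per cent success rate
print(n)                               # 299
print(wall_clock_hours(n, 1.0, 1))     # 299.0 hours on a single PC
print(wall_clock_hours(n, 1.0, 100))   # 3.0 hours on 100 clients
```

With a two per cent success rate the same target needs 149 trials, which would fit into two rounds on 100 clients.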

One of the most interesting aspects of running a grid at a scientific facility is that sociological issues prove to be just as important as technological ones. While many at the facility have embraced the power of the grid, others have found it hard to accept that the grid jobs will not interfere with normal operation of their PCs, despite ample evidence to the contrary. Others worry about security - e.g. will the grid provide an easy entry-route for viruses, or will their own data be secure on the grid? All these questions can be answered, and concerns assuaged, but nevertheless it pays to introduce new technology gradually and to provide plenty of help and support both for users of the system and for those contributing their PC power to the grid. It is particularly important to assist the latter, as it is they who will ensure that the power of the grid increases with time, as they upgrade their PCs to the latest models.

Kenneth Shankland and Tom Griffin work in the Data Analysis Group at the ISIS facility of the Rutherford Appleton Laboratory.