Drilling for fuel
When the cost of a bad decision can run to hundreds of millions of pounds, oil and gas companies are under significant pressure to get their drilling decisions right. Seismic surveys remain the principal tool of the prospecting geophysicist, and while the basics of the technique have remained relatively unchanged over the last couple of decades, the practice of interpreting the resultant data has been, and continues to be, revolutionised by breakthroughs in HPC.
Exploratory geophysical surveying entails creating a seismic wave in the crust of the earth, either through an explosive detonation, an air blast, or an underwater spark, and then recording the sound waves from some distance away using an array of geophones when on land, or hydrophones trailing from a boat when at sea. The raw seismic data is often many terabytes in size. An average survey could consist of hundreds of thousands of shots, i.e. hundreds of thousands of waves created and recorded.
Before the drilling takes place, HPC has a key role to play in two aspects of the modern exploratory survey: data processing and image analysis. The first entails interpretation of acoustic data into a 3D map of the acoustic impedance of the Earth’s subsurface. Depending on the area surveyed, this data processing can take months, or up to a year. When complete, interfaces between various rock types can be resolved. Once the acoustic impedance data has been processed into a 3D matrix, the analysis can begin. As late as the 1990s, this step would have entailed printed charts, which experienced analysts would study for many months before picking a target. Skilled analysts are still vital but their work can be accelerated and made more reliable by use of computerised tools.
Speed is of the essence – floating data centres
Oil producers have understandably large quotas to fill, and drilling schedules must be met to ensure that new wells are opened on time. Add to this the cost per day of running a fleet of seismic survey vessels and it’s apparent that a rapid data turnaround is desirable in order to generate a return on exploration investment quickly. Texan reservoir services company Geotrace recently filled an unusual order made by Norwegian marine geophysics company Wavefield Inseis. The client had bid on a large 3D survey off the coast of Libya, and a quick turnaround was an important factor on account of the high oil costs pushing drilling rigs to run at full capacity. Wavefield realised that, rather than acquiring data, writing it to tapes, and bringing the tapes to an onshore processing centre, the data processing could be carried out more effectively as the data was collected, on board the seismic survey vessel. Geotrace were employed to install a data centre on board the Endeavour – the flagship of Wavefield’s fleet, still under construction as the contract was awarded.
Matt Gaskamp, data centre operations manager at Geotrace, says that getting the number of CPUs required onto the vessel proved to be a challenge. His team estimated that 1,500 to 2,000 cores would be required to complete the processing within the turnaround time specified by Wavefield, but the footprint allocated to the data centre had already been decided: four racks, and four 16A cabinets to work with. After software compatibility checks, Geotrace chose a solution based on Dell’s blade servers. The blade form factor was necessary to fit the number of cores required for the project into that space, and Dell products were chosen for the versatility of their power options. Low-voltage processors were used to stay within the power constraints.
The system was built at Geotrace’s Texas headquarters before being shipped out, and has been in operation since summer 2008. Gaskamp cites the fact that the boat has no dedicated data centre administrator as a slight drawback, but the system has been designed to be remotely supportable from London.
Accelerating processing – the Kaleidoscope project
When it comes to accessing the Earth’s remaining hydrocarbon reserves profitably, oil and gas companies are examining ever more complex geologies. The Gulf of Mexico is thought to contain 37bn barrels of undiscovered, conventionally recoverable oil, but these reserves are located underneath thick deposits of salt. These salt layers necessitate the use of computationally intensive reverse-time migration (RTM) algorithms. All seismic processing involves an inversion of the acoustic wave equation, but there is no mathematical model allowing this to be done directly. Processing is therefore iterative, with RTM being the most advanced form. Professor Jose M Cela, director of the Barcelona Supercomputing Centre’s (BSC) Computer Applications in Science and Engineering department (CASE), explains why the technique is necessary: ‘There are many places on the planet where oil is already extracted and at which they have not used RTM. Within normal geologies, there are many more-simple algorithms able to give an image of reasonable quality. However, when dealing with a geology in which salt is a major constituent of the terrain, RTM offers a valuable advantage. The salt is like a mirror for other algorithms – you cannot see the geological structures that are underneath. RTM is the only algorithm allowing one to see the geological structures beneath salt.’
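To give a feel for where the computing cost comes from, the following is a minimal sketch of the basic building block that wave-equation methods repeat many times: explicit finite-difference time-stepping of the acoustic wave equation, reduced here to one dimension. RTM, broadly, propagates a source wavefield forward in time and the recorded wavefield backward in time and then correlates the two; this sketch covers only the forward propagation. The function name, grid sizes and velocity are illustrative assumptions and are not taken from the Kaleidoscope project.

```python
def propagate(n_points, n_steps, velocity, dx, dt, source_index):
    """Second-order finite-difference stepping of u_tt = c^2 * u_xx in 1D.

    Returns the wavefield after n_steps. All parameters are illustrative.
    """
    courant = velocity * dt / dx
    if courant > 1.0:
        raise ValueError("unstable: velocity * dt / dx must not exceed 1")
    c2 = courant * courant

    prev = [0.0] * n_points          # wavefield at time step t - 1
    curr = [0.0] * n_points          # wavefield at time step t
    curr[source_index] = 1.0         # unit spike injected at the first step (the "shot")

    for _ in range(n_steps):
        nxt = [0.0] * n_points       # end points stay at zero (fixed boundaries)
        for i in range(1, n_points - 1):
            laplacian = curr[i - 1] - 2.0 * curr[i] + curr[i + 1]
            nxt[i] = 2.0 * curr[i] - prev[i] + c2 * laplacian
        prev, curr = curr, nxt
    return curr


# Hypothetical usage: 201 grid points, 10 m spacing, 2,000 m/s, 4 ms steps.
field = propagate(n_points=201, n_steps=50, velocity=2000.0,
                  dx=10.0, dt=0.004, source_index=100)
```

Every time step touches every grid point, so the cost grows with the number of points multiplied by the number of steps, and a 3D survey repeats this for every shot.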
An example of the kind of images produced by seismic data coupled with analysis software. Changes in the acoustic impedance of the geological subsurface are resolved as colour changes, and filters can be applied to highlight features. Image courtesy of ffA.
A project called Kaleidoscope, carried out principally by Cela’s team at the BSC and the Spanish oil company Repsol, has found novel ways of accelerating RTM for seismic processing, achieving an acceleration of two orders of magnitude over conventional clusters. One order of magnitude acceleration was achieved through the use of novel hardware, and a second was achieved through the optimisation of RTM algorithms for that hardware.
The hardware selected by the team was the Cell processor, jointly developed by IBM, Sony and Toshiba, and first used in Sony’s PlayStation 3 games console. Each Cell consists of a power processing element (PPE) and eight accelerator cores (synergistic processing units, or SPUs). These SPUs are specialised processors with a simplified instruction set compared to a conventional processor, and all eight are controlled by the PPE. The peak performance of the Cell processor is 250GFlops. Professor Cela estimates that a single computational node in a cluster built on conventional architecture might be able to run an RTM algorithm at a rate in the order of 10GFlops, and simply using the Cell processor pushes this rate up to 100GFlops per computational node. The trade-off for this increased performance is that the programming becomes more complicated: the SPUs cannot access data held in the program’s general memory, only their own local memory, so the programmer must arrange for information to be transferred explicitly from general memory to local memory – and this is a complicated process.
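The programming burden described above can be illustrated with a small sketch. It is not Cell SDK code: plain Python lists stand in for main memory and for an SPU's local memory, and list slicing stands in for the explicit transfers. The 256KB figure is the size of an SPU's local store; the choice to spend half of it on data, the 4-byte sample size and all function names are assumptions made for illustration.

```python
LOCAL_STORE_BYTES = 256 * 1024   # local store of one Cell SPU
BYTES_PER_VALUE = 4              # assumption: single-precision samples


def tile_ranges(n_values, tile_len):
    """Yield (start, stop) index pairs that cover n_values in tiles."""
    for start in range(0, n_values, tile_len):
        yield start, min(start + tile_len, n_values)


def process_in_tiles(main_memory, kernel, budget_bytes=LOCAL_STORE_BYTES // 2):
    """Apply kernel to every value while holding only one tile locally.

    budget_bytes is the (assumed) share of local memory reserved for data;
    the rest would be needed for code and output buffers.
    """
    tile_len = budget_bytes // BYTES_PER_VALUE
    result = [None] * len(main_memory)
    for start, stop in tile_ranges(len(main_memory), tile_len):
        local = main_memory[start:stop]        # "transfer in": explicit copy
        local = [kernel(v) for v in local]     # compute sees only the local copy
        result[start:stop] = local             # "transfer out"
    return result
```

On a conventional processor the loop over tiles is unnecessary because the hardware cache moves data automatically; on the Cell, the programmer writes and schedules the equivalent of these copies by hand.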
The second step of the project entailed careful optimisation of the software. A consideration of the numbers involved will demonstrate the ways in which this was achieved: a typical computational domain may consist of 1,000 nodes in each of the x, y, and z directions, leading to 10^9 points in the simulation. For each seismic shot, the simulation has 5,000 to 10,000 time steps, meaning that several terabytes of data are required for each seismic shot alone. Additionally, there may be up to 100,000 shots in each survey. A computational node capable of processing this data therefore requires a capacity of at least one terabyte, far beyond the capabilities of current commercial PCs. When dealing with these quantities of data, Cela says: ‘It’s mandatory that we simplify the algorithm in order to reduce the amount of I/O, and we use a compression technique that allows us to save disk space. Alongside this we reduce the number of shots needed to generate an image of good quality.’ In this way, overall single-node speeds were increased to the teraflop scale for RTM.
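The scale of these numbers can be checked with a back-of-the-envelope calculation. The sketch below assumes one single-precision (4-byte) value per grid point and introduces a hypothetical store_every parameter for how often a snapshot of the wavefield is kept; neither assumption comes from the project, and the article does not say whether its 'several terabytes' reflects subsampling, compression, or both.

```python
TB = 10**12  # decimal terabyte


def snapshot_bytes(nx, ny, nz, bytes_per_value=4):
    """Size of one stored wavefield snapshot, one value per grid point."""
    return nx * ny * nz * bytes_per_value


def shot_bytes(n_steps, store_every=1, grid=(1000, 1000, 1000)):
    """Storage for one shot if every store_every-th time step is kept."""
    stored_snapshots = n_steps // store_every
    return stored_snapshots * snapshot_bytes(*grid)


one_snapshot = snapshot_bytes(1000, 1000, 1000)        # 4 * 10^9 bytes, i.e. 4GB
every_step = shot_bytes(5_000) / TB                    # 20.0 TB if all 5,000 steps are kept
every_tenth = shot_bytes(5_000, store_every=10) / TB   # 2.0 TB if one step in ten is kept
```

Under these assumptions a single shot occupies between a few terabytes and a few tens of terabytes, which is why reducing I/O and compressing the stored data matter as much as raw arithmetic speed.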
Despite the impressive increase in speeds, industry is still hungry for more power. ‘Limitations still exist with respect to the velocity of the algorithm; if you want to simulate very huge geographical areas with this technique, you will require a very huge supercomputer,’ says Cela. Repsol has already implemented a 120TFlop machine for daily production, which will be devoted to RTM. ‘Using other technologies, it would not be possible to execute the algorithm in the proper time,’ says Cela. ‘Now they can.’
Aberdeen-based company Foster Findlay Associates (ffA) produces an array of processing tools for use on conventional workstations. The tools are not dissimilar to those used in 2D image processing, and analysts use them to help visualise seismic data. Typically, a 3D image will be run through the equivalent of a series of edge-preserving image filters in order to clean noisy data, and features will be subsequently picked out and highlighted, depending on the requirements of the user, by way of further algorithms. The algorithms used are processor-intensive; they use large tensors, compute eigenvalues and eigenvectors, and some of the methods are iterative. Stephen Purves, technical director at ffA, describes the company’s analysis software: ‘For years it was a black box processing environment, consisting of lots of processing modules. A huge array of different processors takes a volume of data in, processes it, and dumps the results out to a disk.’ Processing times scale with the volume of data, and while processing, the user could be left waiting for a couple of hours. The company is moving towards real-time visualisation offerings – interactive tools with less of a ‘black box’ aspect. This real-time visualisation requires an HPC element, and the company has had good results through the use of the Nvidia Cuda platform for GPU processing.
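What 'edge-preserving' means in practice can be shown with a small sketch. It uses a median filter, a common textbook edge-preserving filter, purely as a stand-in: the article does not say which filters ffA uses, and the one-dimensional signal and function name here are invented for illustration.

```python
from statistics import mean, median


def neighbourhood_filter(samples, radius, reducer):
    """Replace each sample by reducer(...) of its local neighbourhood."""
    out = []
    for i in range(len(samples)):
        lo = max(0, i - radius)
        hi = min(len(samples), i + radius + 1)
        out.append(reducer(samples[lo:hi]))
    return out


# A noise spike at index 2, then a genuine step (an "interface") at index 6.
noisy_step = [0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0]

median_out = neighbourhood_filter(noisy_step, 1, median)
mean_out = neighbourhood_filter(noisy_step, 1, mean)
```

With these values the median filter removes the spike and leaves the step sharp, while the mean filter smears both the spike and the step across neighbouring samples.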
The Geowave Endeavour, flagship of Wavefield’s marine geophysics fleet, boasts an onboard data centre to process seismic results in the field. Image courtesy of Geotrace.
‘A lot of our algorithms are already embarrassingly parallel,’ says Purves, meaning that problems can be split into parallel workloads with minimal extra effort. ‘We can process over some local neighbourhood, rather than point-by-point, so each of the points in the dataset is pretty much independent and treated as such by the algorithms.’ This parallelism made the problem well suited to implementation on the Cuda platform.
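The independence Purves describes can be sketched as follows. Each output point depends only on a small neighbourhood of the input, so the output can be cut into chunks that are computed separately and simply concatenated. The smoothing kernel, chunk count and function names are assumptions for illustration; Python threads are used only to show that the chunks do not interact, not to demonstrate a speed-up, and on a GPU each output point would typically be given its own thread.

```python
from concurrent.futures import ThreadPoolExecutor


def smooth_chunk(samples, start, stop, radius):
    """Filter outputs start..stop-1, reading up to `radius` neighbours each side."""
    n = len(samples)
    out = []
    for i in range(start, stop):
        window = samples[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def smooth_parallel(samples, radius=1, n_chunks=4):
    """Split the output range into chunks and filter each one independently."""
    n = len(samples)
    bounds = [(k * n // n_chunks, (k + 1) * n // n_chunks) for k in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # map() returns results in submission order, so concatenation is safe.
        parts = pool.map(lambda b: smooth_chunk(samples, b[0], b[1], radius), bounds)
        return [value for part in parts for value in part]
```

Because no chunk writes anything another chunk reads, the result is identical to filtering the whole dataset in one pass, whatever the number of chunks.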
Purves says that adapting processing modules to the Cuda API has been relatively easy. Although the company’s developers had been working in C++ rather than the adapted C language used in Cuda, Purves observes: ‘When you get down to the nitty-gritty of a computational kernel, it’s often very C-like anyway. You’re thinking carefully about data structures and memory usage, and you’re not really using objects and polymorphism, which are the more advanced features of C++. With any programming you have to think about memory management, but [with Cuda] you’re thinking about it in very high-level terms. A C call of “give me a buffer this size please” is sufficient. You’re not having to deal at all with the nitty-gritty down at hardware level; a lot of it is hidden, and it is very easy.’
Despite the ease of implementation, the decision to move towards GPU processing was not trivial; Purves cites the company’s impetus for the move: ‘We’ve been keeping an eye on [GPU processing] for a couple of years, and we had a proper think about it about two years ago: “Do we want to do something with general-purpose GPU?” It’s purely the existence of Cuda, along with its maturity, that has encouraged us to make the decision. Before that, it just didn’t balance up; considering the amount of time we’d have had to spend porting, dealing with shaders, and thinking about low-level stuff on the graphics card, it wouldn’t have been cost-effective.’
A third of ffA’s process modules are now ‘Cuda enabled’ for acceleration on a suitable GPU, and the company plans to extend this support to all of its compatible modules. Processing a 5GB volume using dual Intel Xeon 3.33GHz processors and 16GB of DDR2 RAM takes 44 minutes. Adding a Quadro FX card (a commercial graphics accelerator) to the system drops this time to 13 minutes 28 seconds, and using a Tesla card (a specialised GPGPU) instead of the Quadro reduces it to 9 minutes 32 seconds. A system containing three Tesla cards performs the process in 3 minutes 38 seconds – a speed-up of 12x over the conventional system. Because the graphics cards are capable of outputting 3D images for visualisation, this does not need to be built in separately.
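The speed-ups implied by these timings can be worked out directly. The sketch below uses only the figures quoted above; the scaling-efficiency calculation for three cards against one is an added illustration, and the variable names are arbitrary.

```python
def seconds(minutes, secs=0):
    """Convert a minutes-and-seconds timing to seconds."""
    return minutes * 60 + secs


baseline = seconds(44)  # dual Xeon system, no GPU
timings = {
    "Quadro FX": seconds(13, 28),
    "one Tesla": seconds(9, 32),
    "three Teslas": seconds(3, 38),
}

# Speed-up of each configuration relative to the CPU-only baseline.
speedups = {name: baseline / t for name, t in timings.items()}

# How close three cards come to being three times faster than one card.
scaling_efficiency = (timings["one Tesla"] / timings["three Teslas"]) / 3
```

Rounded to one decimal place, the speed-ups are 3.3x for the Quadro FX, 4.6x for one Tesla and 12.1x for three Teslas, and going from one Tesla to three delivers about 87 per cent of an ideal threefold gain.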
Wherever the oil and gas industry ends up in the future, it is certain to remain a well-funded source of demand for the highest-performance solutions that computer science can provide. This is an industry in which costs, turnovers, and, ultimately, profits run to hundreds of billions of pounds annually. Developers of useful HPC will always find a willing customer here.