HPC APPLICATIONS: CERN
A universe of data
The massive YE+1 end-cap was lowered into the CMS cavern. This is a very precise process as the crane must lower the end-cap through minimal clearance without tilt or sway.
Paul Schreier visits CERN to learn about the computing resources needed for the world's largest ever scientific experiment
‘Listen…there’s a storm!’ Although it was raining lightly as I entered the museum at CERN with my host and guide, Dr Bernd Panzer-Steindel, computing fabric manager in the IT department, that’s not what he was referring to. We had just walked by a cosmic-ray detector that gives an audio and video indication of incident radiation, and for a brief moment it sounded like a popcorn popper. Panzer-Steindel’s ear, and in fact his entire profession, is oriented towards helping others learn more about the basic particles of our universe. He and his team, who supply the computing resources for the thousands of scientists at this world-renowned nuclear research facility and make data available to thousands more around the globe, are making a major contribution towards the success of the LHC (Large Hadron Collider) project, which will start running at full capacity in coming months. As he says, ‘it doesn’t do you any good to have this fantastic experiment if you can’t collect and store the data for later analysis.’
CERN has embarked on the warm-up, debug and calibration phases of the LHC, the world’s largest scientific experiment. Its scope is massive in terms of not only its physical size – a ring 27km long circling underground across the Swiss/French border near Geneva – but also its computing resources. In fact, despite the enormous computing facility located at CERN, the project requires more resources than it is feasible to house there, so data is being distributed to hundreds of sites around the globe, allowing thousands of physicists to take part in the experiments around the clock.
The LHC at work
In the LHC, two beams of particles travel in opposite directions in separate tubes. Protons circulate in the LHC for 20 minutes before reaching their maximum speed: 99.99 per cent of the speed of light. They are guided by thousands of superconducting magnets of various types and sizes, some as long as 15m. Just prior to collision, a special type of magnet ‘squeezes’ the particles closer together to increase the chance of collision. To avoid colliding with gas molecules in the accelerator, the particle beams travel in an ultra-high vacuum as empty as interplanetary space. The internal pressure of the LHC is 10 × 10⁻¹³ atmospheres, ten times less than the pressure on the moon. When the two beams collide, they generate temperatures more than 100,000 times hotter than the sun, but concentrated in a minuscule space. By contrast, the cryogenic distribution system that circulates superfluid helium around the accelerator ring keeps the LHC at -271.3°C (1.9K) – colder even than outer space.
Dr Bernd Panzer-Steindel, computing fabric manager at CERN’s IT department, stands on a piece of live artwork where various elements light up as they detect incoming cosmic particles. (Photo: Paul Schreier)
The beams inside the LHC collide at four locations along the ring, corresponding to the four main experiments being set up. Two of them – ATLAS (A Toroidal LHC ApparatuS) and CMS (Compact Muon Solenoid) – are designed first of all to search for the Higgs boson and particles that could make up dark matter. These two experiments record similar sets of measurements but use radically different technical solutions and designs; if they provide similar results about the Higgs boson, scientists will have much greater confidence in the findings. Third is ALICE (A Large Ion Collider Experiment), which collides lead ions to recreate the conditions just after the Big Bang and will allow physicists to study a state of matter known as quark-gluon plasma. Quarks are bound together by gluons, and the bond is so strong that isolated quarks have never been found. Collisions will generate temperatures so high that physicists hope that the protons and neutrons will ‘melt’ and free the quarks from their bonds with the gluons. Finally, LHCb (Large Hadron Collider beauty) specialises in investigating the slight difference between matter and antimatter by studying a type of particle called the ‘beauty quark’ or ‘b quark’.
The sensing and control tasks needed to bring the LHC into operation and keep it performing correctly are carried out largely by local embedded computers. For instance, local control systems trigger only when an event of interest occurs. These systems are also needed to help reduce noise, and in the LHC, some unusual noise sources arise. Here ‘noise’ is defined as any physical effect that has nothing to do with the desired physics. One is gravity from the moon, which changes the length of the ring by 100 microns, a distance that can have a great effect when you’re keeping particles travelling near the speed of light on track. Another effect is the weather, which can change the weight of the nearby Jura mountains and thus has an effect on the ring’s configuration. Yet another noise source at first stumped the physicists. They found stray EM fields entering the system at semiregular intervals. It wasn’t until there was a railroad strike that they determined the source was the high-speed TGV trains passing nearby.
Petabytes of data
Now consider the massive amount of data that comes from these experiments. Every second, 600 million particle collisions are measured, and scientists pick out the thousand or so that are interesting. The electronic ‘photo’ of each event requires 1 to 2MB of storage. When running at full capacity, the LHC will run between 150 and 200 days per year, leading to a rough range of between 10 and 20 petabytes (1 petabyte = 1 million gigabytes) of data per year. This demanded a new approach to data storage, management, sharing and analysis – tasks handled by the LHC Computing Grid (LCG) project.
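The yearly volume can be checked with a back-of-envelope calculation. The sketch below uses only the figures quoted above (about a thousand selected events per second, 1 to 2MB each, 150 to 200 running days); the function name and the `duty_cycle` parameter are illustrative assumptions, and the default assumes recording around the clock on every running day.

```python
# Back-of-envelope estimate of yearly raw data volume.
# Inputs are the figures quoted in the article; duty_cycle is an assumption.

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_MB = 10**6
BYTES_PER_PB = 10**15  # 1 petabyte = 1 million gigabytes


def yearly_volume_pb(events_per_second, event_size_mb, running_days, duty_cycle=1.0):
    """Return the data volume in petabytes recorded over one running year."""
    bytes_per_second = events_per_second * event_size_mb * BYTES_PER_MB
    recording_seconds = running_days * SECONDS_PER_DAY * duty_cycle
    return bytes_per_second * recording_seconds / BYTES_PER_PB


low = yearly_volume_pb(1000, 1, 150)   # small events, short year
high = yearly_volume_pb(1000, 2, 200)  # large events, long year
print(f"{low:.1f} to {high:.1f} PB per year")
```

With continuous recording the low end comes out near 13 petabytes and the high end near 35, so the quoted range of 10 to 20 petabytes is the right order of magnitude and presumably allows for time when the detectors are not recording; lowering `duty_cycle` shows the effect.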
In CERN’s own IT centre there are roughly 2,000 computing nodes with 15,000 cores. Even when the computer cluster had just 400 cores, it delivered almost 20TFlops of power, putting it in the Top 500 worldwide. Also in the centre are 1,000 servers that can store 7,000 TB, along with six tape-storage ‘silos’ (three with 6,000 tapes and three with 10,000 tapes, with several more scheduled for next year) with a capacity of 20,000 TB. In addition, another 6,000 PCs sitting on desks throughout CERN run Windows XP/Vista and Microsoft Office applications. The PCs in the computing centre all run Linux, and CERN has standardised on a variant of Red Hat Enterprise Linux. In cooperation with Fermi National Accelerator Laboratory, CERN has added a few customised features to Red Hat, such as AFS (Andrew File System) support and security and performance tuning. This version has been renamed Scientific Linux, and it is likewise open source and available to the public.
The PCs in the computing centre not only handle data storage but also provide the infrastructure for the 8,000 scientists working at CERN as well as administrative staff. Each year, a third of these are swapped out for new units. The machines being ‘retired’ are based on Intel’s dual-core Xeon processor, and those PCs entering service today use dual-socket, quad-core Intel Harpertown processors. In addition, while the old machines are of the desktop variety, the new ones are either rackmounted or blades in a chassis. The various PCs and storage media are interconnected with standard 1Gb Ethernet rather than high-speed variants such as InfiniBand. Explains Panzer-Steindel: ‘Our focus is on overall data throughput in the computing fabric, so latency is not an issue. We have to analyse as many “photos” as possible in a short time, but because these “photos” are independent of each other, we can use a “trivial” coarse-grain parallel approach for this problem.’
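The ‘trivial’ coarse-grain parallelism Panzer-Steindel describes can be illustrated in a few lines. This is a minimal sketch, not CERN’s software: the event layout, `analyse_event` and `analyse_all` are hypothetical, and a thread pool stands in for what would in practice be separate batch jobs on separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse_event(event):
    """Hypothetical per-event analysis: sum the energy deposits of one event."""
    return sum(event["deposits"])


def analyse_all(events, workers=4):
    """Analyse independent events in parallel; no worker needs another's result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so they line up with the events.
        return list(pool.map(analyse_event, events))


# Eight made-up events, each with three made-up energy deposits.
events = [{"id": i, "deposits": [i, i + 1, i + 2]} for i in range(8)]
totals = analyse_all(events)
print(totals)
```

Because no event depends on any other, the workers never exchange messages while they run, which is why network latency matters far less here than aggregate throughput.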
Electricity and cooling requirements are putting an upper limit on the amount of computing resources that can be installed at CERN.
Because thousands of scientists want to run their programs, and all need access to the massive amounts of data being stored, the task of scheduling the jobs for best usage becomes quite complex. At CERN, an average of 70,000 jobs are scheduled each day, with daily peaks reaching 100,000 jobs. To allocate the workload across the computing cluster, Platform LSF software from Platform Computing matches the jobs being submitted to the available resources and keeps those resources running close to maximum capacity. LSF also balances job priorities and handles different user requests, from jobs that run quickly to simulations that might need to run for weeks. Further, because file-server resources are limited, the storage-management software faces its own scheduling problem when many users request access to many different datasets, and here again LSF manages the scheduling.
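To give a feel for what matching jobs to resources involves, here is a toy dispatcher. It is emphatically not how Platform LSF works internally; the `Job` fields, the priority convention (lower number means more urgent) and the `dispatch` function are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                      # lower number = more urgent (assumption)
    name: str = field(compare=False)
    cores: int = field(compare=False)


def dispatch(jobs, free_cores):
    """Start jobs in priority order while enough cores remain.

    Returns (started, waiting) as lists of job names.
    """
    queue = list(jobs)
    heapq.heapify(queue)
    started, waiting = [], []
    while queue:
        job = heapq.heappop(queue)
        if job.cores <= free_cores:
            free_cores -= job.cores
            started.append(job.name)
        else:
            # Too big for what is left; smaller jobs behind it may still fit.
            waiting.append(job.name)
    return started, waiting


jobs = [
    Job(3, "simulation", 8),
    Job(1, "calibration", 2),
    Job(4, "user-analysis", 4),
    Job(2, "reconstruction", 4),
]
started, waiting = dispatch(jobs, free_cores=10)
print("started:", started)
print("waiting:", waiting)
```

Letting a small, low-priority job start when a large one cannot fit keeps the cores busy, which is the ‘close to maximum capacity’ goal described above; a production scheduler also has to weigh fairness, run-time limits and data location.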
Why tape storage and not disks?
Given the advances in disk capacities, why does CERN still rely on tape backup of the scientific data? It’s a simple matter of TCO – total cost of ownership. Panzer-Steindel estimates that the cost of the media alone is roughly the same for both, at $0.20/GB, but total system cost is $0.30/GB for tape and $1.00/GB for disk – and, in a project of this size and with strict budgets, that difference is huge. CERN already has 15PB on tape, and the data gathering hasn’t even begun in earnest. In addition, a tape in a robot needs no continuous electricity or cooling, but spinning disks do.
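The scale of that difference is easy to work out. The sketch below applies the two system costs quoted above to the 10 to 20 petabytes per year mentioned earlier in the article; the function and constant names are illustrative, and the calculation ignores everything except the per-gigabyte figures.

```python
GB_PER_PB = 10**6  # 1 petabyte = 1 million gigabytes

TAPE_SYSTEM_USD_PER_GB = 0.30
DISK_SYSTEM_USD_PER_GB = 1.00


def storage_cost_usd(petabytes, usd_per_gb):
    """Total system cost in US dollars of storing the given volume."""
    return petabytes * GB_PER_PB * usd_per_gb


for volume_pb in (10, 20):
    tape = storage_cost_usd(volume_pb, TAPE_SYSTEM_USD_PER_GB)
    disk = storage_cost_usd(volume_pb, DISK_SYSTEM_USD_PER_GB)
    print(f"{volume_pb} PB: tape ${tape / 1e6:.0f}m, disk ${disk / 1e6:.0f}m, "
          f"difference ${(disk - tape) / 1e6:.0f}m")
```

On these figures, each year of data costs several million dollars more to keep on disk than on tape.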
He has examined all sorts of alternatives, including USB disks, which alone cost about the same as tape storage – but you must also take into account how they are powered: either each runs from a wall wart, which would require special power strips to accommodate thousands of them, or the building-services team would have to install a separate 12V power bus throughout the facility. ‘We certainly follow market trends,’ he adds, ‘but they must make sense for us.’
CERN is using tape-storage installations from IBM and Sun (StorageTek), which have been stress-tested successfully over the past 18 months. They handle the required data rates of about 3GB/s easily and are already running at these rates in production.
In this regard, Panzer-Steindel makes an interesting distinction: ‘Despite the size of our facility and the massive processing we do, we are not involved in “high performance computing” in the traditional sense; rather, we are involved in “high throughput computing”.’ Data flows through the computing system constantly at more than 5GB/s, which adds up to about 500 TB of data movement per day. ‘We purchase products close to the consumer market where prices are very aggressive,’ he adds. ‘We have an annual IT budget for hardware of $25m. Every six months we send out tenders, and we review the three most promising bids. These come from both major familiar suppliers and unknowns, but in the end, it’s the price/performance ratio that must be right.’
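The link between the sustained rate and the daily total is simple arithmetic, sketched here with decimal units (1 TB = 1,000 GB). The function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
GB_PER_TB = 1_000


def tb_per_day(gb_per_second):
    """Daily data movement in terabytes for a sustained rate in GB/s."""
    return gb_per_second * SECONDS_PER_DAY / GB_PER_TB


def gb_per_second_for(tb_each_day):
    """Sustained rate in GB/s needed to move the given terabytes per day."""
    return tb_each_day * GB_PER_TB / SECONDS_PER_DAY


print(f"5 GB/s sustained moves {tb_per_day(5):.0f} TB per day")
print(f"500 TB per day needs {gb_per_second_for(500):.2f} GB/s sustained")
```

A flat 5GB/s gives 432 TB per day, so a daily total of around 500 TB corresponds to an average rate somewhat above 5GB/s, consistent with the ‘more than 5GB/s’ quoted.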
This track is an example of simulated data modeled for the CMS detector on the Large Hadron Collider (LHC) at CERN. (Copyright CERN)
When you look at the racks in CERN’s computing centre, one striking thing is that each is only perhaps a third or a quarter full. That is because the building’s air cooling can remove only a limited amount of heat per square metre, which caps how much equipment each rack can hold.
While the new machines being put into service have lower power consumption with higher performance, the demands on computing power are growing so rapidly that there is a need for a new computing centre, which is in the planning stages, as well as for support from other computing centres around the world. With the delivery of systems at the end of 2009, CERN will have no extra power capacity; the existing electrical distribution system can’t handle another megawatt. There is a 2.5MW limit on the existing computer centre, and the limit comes entirely from the installed cooling infrastructure. The entire CERN facility uses 180MW when the accelerator and the experiments are running.
The LHC Computing Grid
Because of these limits, CERN cannot cope on its own with all the data coming from the experiments. Thus, it has set up the LCG (LHC Computing Grid) Project to build and maintain a data-storage and analysis infrastructure for the entire high-energy physics community that works with the LHC. It will give roughly 15,000 scientists in some 500 research institutes and universities worldwide access to experimental data. Further, data must be available over the 15-year estimated lifetime of the LHC. The analysis of the data, including comparison with theoretical simulations, requires of the order of 100,000 CPUs at 2006 measures of processing power.
CERN chose a data grid model because it provides several key benefits. For one, the significant costs of maintaining and upgrading the necessary resources for such a computing challenge are more easily handled in a distributed environment. Also, there are fewer single points of failure.
A distributed system also presents significant challenges, which include ensuring adequate levels of network bandwidth, maintaining coherence of software versions at various locations, coping with heterogeneous hardware, managing and protecting data so it is not lost or corrupted, and providing accounting mechanisms so that different groups have fair access.
CERN has designated itself the Tier0 computing facility, and it sends all the data to Tier1 facilities, of which there are roughly a dozen, such as Fermilab and the Grid Computing Center in Karlsruhe, Germany. A requirement for the Tier0 and all Tier1 facilities is that they must be available 24 hours a day, seven days a week. A full copy of all the experimental data is spread across all the Tier1 facilities.
Tier1 centres make subsets of data available to Tier2 centres, each consisting of one or several collaborating computing facilities that can store sufficient data and provide adequate computing power for specific analysis tasks. There are approximately 200 to 300 Tier2 centres.
Finally, individual scientists can access even smaller subsets of data from particular experiments through Tier3 computing resources, which can consist of local clusters in a university computer centre or even individual PCs. The Tier3 centres are just now being set up, so there are no hard numbers on users yet. But a rough guess at the number of computing nodes that will be used from Tier0 through Tier3 is of the order of 30,000.
Block diagram showing the role of CERN as the Tier0 facility in the LCG computing grid.
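The tiered flow of data can be modelled with a very small data structure. The sketch below is an illustration only: the site names, the block labels and the round-robin placement are assumptions, chosen to show how a full copy can be spread across Tier1 sites while a Tier2 site receives just the subset it asks for.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    tier: int
    datasets: set = field(default_factory=set)


def spread_across_tier1(blocks, tier1_sites):
    """Round-robin placement (an assumption): each block lands on one Tier1 site."""
    for index, block in enumerate(blocks):
        tier1_sites[index % len(tier1_sites)].datasets.add(block)


def serve_subset(parent, child, wanted):
    """Copy to the child only those wanted blocks that the parent actually holds."""
    child.datasets |= parent.datasets & set(wanted)


# Hypothetical sites and data blocks.
tier1 = [Site("T1-A", 1), Site("T1-B", 1), Site("T1-C", 1)]
blocks = [f"block-{n}" for n in range(7)]
spread_across_tier1(blocks, tier1)

tier2 = Site("T2-X", 2)
serve_subset(tier1[0], tier2, ["block-0", "block-1", "block-3"])

print({site.name: sorted(site.datasets) for site in tier1})
print(tier2.name, sorted(tier2.datasets))
```

Taken together, the Tier1 sites hold every block, yet no single site has to store them all; the Tier2 site ends up with only the requested blocks that its parent holds.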
Data analysis: the fundamentals don’t change
The first data records from particle accelerators were photographs of particle traces. In those early days, workers sat at desks and used rulers and protractors to measure the length of traces and their curvatures. From this information, scientists could deduce what had happened. Today, data analysis follows the same basic principles except the photos are digitised and software can make the measurements.
Most of the post-experimental analysis deals with statistics and visualisation and uses custom software written at CERN. One example is ROOT, an open-source analysis program. It is being used in all major high-energy and nuclear physics laboratories around the world to monitor, store and analyse data, and people in other sciences as well as the medical and financial industries use it. The ROOT system provides a set of object-oriented frameworks to efficiently handle and analyse large amounts of data. Because the data is defined as a set of objects, specialised storage methods can give direct access to the separate attributes of the selected objects without having to touch the bulk of the data.
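The benefit of reaching one attribute without touching the rest can be shown with a toy column-wise store in plain Python. This illustrates the general idea only; it is not ROOT’s API or file format, and the class, attribute names and values are made up.

```python
from collections import defaultdict


class ColumnStore:
    """Toy column-wise store: one list per attribute rather than one record per event."""

    def __init__(self):
        self.columns = defaultdict(list)
        self.reads = defaultdict(int)  # number of values read from each column

    def append(self, **attributes):
        """Add one event; each attribute value goes to its own column."""
        for name, value in attributes.items():
            self.columns[name].append(value)

    def read(self, name):
        """Return one attribute for all events, touching only that column."""
        self.reads[name] += len(self.columns[name])
        return list(self.columns[name])


store = ColumnStore()
store.append(energy=12.5, charge=-1, hits=40)
store.append(energy=48.0, charge=+1, hits=95)
store.append(energy=7.25, charge=-1, hits=22)

energies = store.read("energy")
mean_energy = sum(energies) / len(energies)
print(f"mean energy: {mean_energy:.2f}")
print("columns read:", dict(store.reads))
```

Computing the mean energy reads three values from a single column; the charge and hit counts are never loaded, which is what makes this layout attractive when each event carries many attributes and an analysis needs only a few.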
In contrast, commercial scientific and engineering packages are used in the design of the LHC and in running experiments. According to Panzer-Steindel, the packages used for the large electrical, electronic and mechanical infrastructure of the accelerator and experiments include Ansys, AutoCAD, Cadence, LabVIEW, Matlab, Mathematica, Opera, PVSS, CATIA, StarCD, and Saber – among many others.
For instance, a beam that travels off course in the ring can cause catastrophic damage. To prevent particles from straying, more than 100 collimators are installed. Each uses blocks of graphite or other heavy materials to absorb energetic particles that move out of the nominal beam core. In a standard configuration, a PXI instrument chassis from National Instruments controls up to 15 stepper motors on three different collimators through a 20-minute motion profile to accurately and synchronously align the graphite blocks, and a second chassis checks the real-time positioning of the same collimators. For a given collimator, both PXI chassis run LabVIEW Real-Time on the controller and LabVIEW FPGA on the reconfigurable I/O devices in the peripheral slots. CERN uses this hardware and software to create a custom motion controller for approximately 600 stepper motors with millisecond synchronisation over the 27km of the LHC.