Big Data needs networks
The day before the official opening of the ISC’14 conference in Leipzig in June 2014, delegates assembled for a full day of discussion on how international cooperation could foster the technological developments needed for the next generation of high-performance computing.
At the seminar, the man primarily responsible for driving forward the US exascale programme, Bill Harrod, proposed that ‘the greatest area of such cooperation is system software’. As programme manager for Advanced Scientific Computing Research at the US Department of Energy, he outlined how cooperation could help in developing novel operating systems; software tools for performance monitoring (particularly for energy efficient computation); and system management software that will cope with hardware failures, both in processor nodes and in memory and storage.
At the same event, Mike Dewar, chief technical officer of the Numerical Algorithms Group (NAG), reminded his US and Japanese colleagues that, unlike their countries, Europe was not a single nation: the ‘European Union is a collection of nation states’. EU countries differed in their interests in, and expectations of, HPC, and this diversity made it harder for the EU to collaborate internationally in the way that individual countries could.
However, Europe offers several models for international collaboration on scientific projects. Cern, the European Laboratory for Particle Physics, is well known, and Prace, the Partnership for Advanced Computing in Europe, is now well established. Less well known, perhaps, is an infrastructure project that, amongst other things, underpins the international cooperation facilitated by Prace.
Although this particular project is not directly a supercomputing organisation, European science and supercomputing would be very different without it, so the model it offers is worth some study. It is not a loose collaboration of academic researchers, but equally it is not an intergovernmental organisation with all the bureaucratic machinery that that entails. It owns property, and operates a service for scientists, but it also conducts research itself.
Dante (Delivery of Advanced Network Technology to Europe) was founded in 1993 to coordinate the development of data communication networks for European research and education, working with the individual countries’ own National Research and Education Networks (NRENs). Dante builds and operates the high-speed networks that connect the NRENs to each other and to the rest of the world, enabling scientists, academics, innovators and students to collaborate across dedicated networks, regardless of where they are.
While Cern and other international scientific collaborations tend to be run by intergovernmental bodies, set up only after lengthy treaty negotiations and with government delegations monitoring every step of their operation, Dante is distinctive in being a limited liability company, owned by its shareholders and run by a management team responsible to a board of directors. The shareholders are the NRENs rather than government representatives. (Where a government department holds shares instead of the country’s NREN, this tends to be because the NREN’s legal status does not permit it to own shares in another organisation.)
For historical reasons, Dante is headquartered in Cambridge but, disappointingly perhaps, it is run not from a wisteria-covered medieval college building but from a very modern office block to the south of the city, far from the picturesque historic centre. (Europa Science Ltd, the publishers of Scientific Computing World, occupies rather dowdier and much less grand offices just around the corner.)
Dante’s main project is Géant, which serves about 50 million users across Europe, reaches over 100 countries worldwide – and, according to Dante, is the most advanced international network of its type. Given the volumes of data that modern research generates, the infrastructure is largely fibre optic cable and in September 2014, Dante, together with its main contractor, the commercial company Infinera, announced that they had successfully demonstrated a single-card terabit super-channel on an active segment of Géant’s production network between Budapest in Hungary and Bratislava in the Slovak Republic.
One of the challenges network operators face is the time it takes to deploy capacity using multiple line-cards, and the burden of managing hundreds or thousands of fibre connections. The demonstration showed, for the first time, that a single photonic integrated circuit can deliver more than a terabit of coherent super-channel capacity from a single line-card with a single fibre connector.
Terabit networking is a significant achievement in its own right, but it also demonstrates that Dante is more than just a service provider: research is part of its brief. According to Dr John Chevers, Dante’s chief business development officer, one of the organisation’s strengths is that the people working on the research side can have their work informed by the needs and characteristics of the infrastructure it provides.
Over the years, Dante’s work programme has been organised as a series of ‘projects’, and the fourth consecutive Géant project is now drawing to a close. Chevers hopes that, in future, Dante will be able to plan its work over a longer timescale than in the past, opening up the opportunity to plan the network on a more ambitious scale. It is hoped that next April, in conjunction with the European Commission’s Horizon 2020 research programme, a seven-year initiative will become possible, taking the organisation into ‘an exciting new phase’.
The ‘Europe’ that Dante serves is much bigger than the EU, however – stretching from Israel to Norway. This presents challenges of geography and national sensibilities. Some countries already have a lot of fibre in the ground, whereas others have less capacity in their fibre network. Yet Dante has been successful, Chevers said, because it understands the research and education sectors and tailors its services to their needs. ‘We are providing scale and quality and global reach that is not available commercially,’ he said.
Apart from its flexible governance structure as a limited company rather than an unwieldy intergovernmental organisation, another key to Dante’s success is its financial policy. It is funded partly by the European Commission, as part of the Commission’s ‘European Research Area’ policy, and partly by subscriptions from the NRENs. But NRENs are not charged in the way that a commercial service provider would structure its fees, Chevers explained. They pay not for the volume of traffic they send down the fibres but for the capacity available to them. This reflects the fact that scientific usage of a communications network differs from commercial usage. Scientific data flows tend to be ‘bursty’: peak utilisation is much higher, relative to the average, than on commercial networks. A commercial provider would want to sell every last Mbit on the link, whereas Dante has to install capacity with enough headroom to cope with, for example, intermittent but very large data transmissions from the Large Hadron Collider at Cern.
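The difference between bursty research traffic and steadier commercial traffic can be made concrete with a peak-to-average calculation. The sketch below is purely illustrative: the hourly traffic samples, the 20 per cent headroom and the function names are assumptions made for this example, not Dante figures or Dante’s actual billing method.

```python
# Illustrative sketch: why bursty research traffic is provisioned (and
# charged) by capacity rather than by volume. All numbers are invented.

def peak_to_average(samples_gbps):
    """Ratio of the busiest interval to the mean load."""
    mean = sum(samples_gbps) / len(samples_gbps)
    return max(samples_gbps) / mean


def capacity_needed(samples_gbps, headroom=0.2):
    """Capacity that carries the peak plus an assumed safety margin."""
    return max(samples_gbps) * (1 + headroom)


def mean_utilisation(samples_gbps, capacity_gbps):
    """Fraction of the provisioned capacity used on average."""
    return sum(samples_gbps) / len(samples_gbps) / capacity_gbps


# Hypothetical hourly averages in Gbit/s.
commercial = [6, 7, 8, 7, 6, 7, 8, 7]    # steady load
research = [1, 1, 2, 1, 40, 1, 1, 1]     # one large burst, e.g. a detector data run

for name, samples in [("commercial", commercial), ("research", research)]:
    cap = capacity_needed(samples)
    print(f"{name:10s} peak/avg={peak_to_average(samples):.2f} "
          f"capacity={cap:.1f} Gbit/s "
          f"mean utilisation={mean_utilisation(samples, cap):.1%}")
```

With these made-up numbers the research link needs five times the capacity of the commercial one while carrying a lower average load, so it sits mostly idle; charging by volume would bear little relation to the cost of keeping that headroom available.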
Given the complexity of the usage patterns, it is quite difficult to rank Dante’s users, according to Chevers, apart from the fact that the high-energy physics community is far and away the biggest. But at SC14 in New Orleans, the organisation demonstrated how astronomy can benefit from the power of networking. This collaboration goes well beyond European boundaries, because the project is the Square Kilometre Array telescope. The array is being built in the southern hemisphere, where the view of the Milky Way is best and radio interference least, with cores in South Africa and Australia and the African part of the instrument extending across the borders of several southern African countries. ‘They will have networks within the instrument and international networks to transfer results in a computed form globally, so we can expect a lot of data. Dante is engaged as a consultant on the networking,’ he explained. Much of the raw data in radio astronomy is in fact noise, so the data will be processed locally before it is sent over the network (see The future of HPC in Australia page 10), ‘but even after local processing, there will be enormous amounts of data.’
In commercial high-performance computing, a lot of effort is currently going into bringing the compute to where the data is (see Why storage is as important as computation page 20). Thus software is now available that will allow ‘compute at a distance’ – in the cloud, for example – and that offers users remote visualisation tools so that they can see the results of their computations locally without having to transfer the data back to their own machines (see Tools for efficient computing page 24). Partly this is a result of poor bandwidth in some commercial networks and partly it is a function of the cost of data transmission in the commercial model. From Dante’s point of view, according to Chevers: ‘In Europe today, it shouldn’t be the network that is the constraint.’
Prace illustrates the diversity of Dante’s users, because it operates a dedicated point-to-point ‘private’ network, so to speak. ‘We provide the capacity, but how they use it is up to them’, Chevers continued. Prace has its own telecoms switch in Frankfurt and Géant provides 10Gbps links from supercomputer sites to that switch. This ‘star’ formation allows Prace to connect up and share capacity. ‘At the moment, we’re in discussions to refresh that architecture and re-jig it for the current and future needs of Prace. It’s not static and it will evolve as usage patterns evolve. We are a provider of capacity, but we also provide expertise – that the average Telco could not provide – on large supercomputing projects using networks. Prace have very good network specialists and we provide as much or as little support as they need.’
Dante has its own points of presence, about 20 of them in Europe, where its optical equipment and routers are located. ‘Where possible we have leased our own fibre. In some cases, where fibre infrastructure is sparse we lease capacity from the Telcos.’
Its research work has led to several developments that make the lives of researchers using the network much easier. The eduGAIN service was developed within the Géant project and enables identity federations around the world to connect, simplifying access to content, services and resources for the global research and education community. ‘If you have a digital identity in London and want to access the Barcelona Supercomputer, for example, then this will allow the two to talk,’ Chevers said.
A second tool to aid international access to research networks, funded by the Géant project, is eduroam (education roaming) which allows students, researchers, and staff from participating institutions to obtain internet connectivity across campus and when visiting other participating institutions by simply opening their laptop. The eduroam system is based on the principle that the user’s authentication is done within the user’s home institution, while the authorisation decision to allow access to the network resources is controlled by the visited network. ‘You don’t need a separate password for each location you visit, but your details are still secure – your credentials are recognised around the world.’
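The split between authentication at the home institution and authorisation by the visited network can be sketched as a toy model. In the real service, the user’s credentials typically travel inside an encrypted EAP tunnel through a hierarchy of RADIUS servers that route each request by the realm part of the login (the part after the @); the sketch below omits all of that. The class names, institutions, users and passwords are hypothetical.

```python
# Toy model of the eduroam principle: the home institution authenticates,
# the visited network authorises. Names and credentials are hypothetical;
# real deployments use 802.1X/EAP and a hierarchy of RADIUS proxies.
from dataclasses import dataclass, field


@dataclass
class HomeInstitution:
    realm: str
    accounts: dict = field(default_factory=dict)  # user -> password (toy only)

    def authenticate(self, user: str, secret: str) -> bool:
        """Only the home institution checks the user's credentials."""
        return self.accounts.get(user) == secret


@dataclass
class VisitedNetwork:
    name: str
    federation: dict  # realm -> HomeInstitution; stands in for proxy routing
    blocked_realms: set = field(default_factory=set)

    def admit(self, identity: str, secret: str) -> bool:
        user, _, realm = identity.partition("@")
        home = self.federation.get(realm)
        # Authorisation: a local policy decision made by the visited network.
        if home is None or realm in self.blocked_realms:
            return False
        # Authentication: the secret is forwarded unchanged to the home institution.
        return home.authenticate(user, secret)


uni_a = HomeInstitution("uni-a.example", {"alice": "correct-horse"})
uni_b = HomeInstitution("uni-b.example", {"bob": "battery-staple"})
campus = VisitedNetwork("campus-c", {i.realm: i for i in (uni_a, uni_b)})

print(campus.admit("alice@uni-a.example", "correct-horse"))  # True
print(campus.admit("alice@uni-a.example", "wrong"))          # False
print(campus.admit("carol@unknown.example", "anything"))     # False
```

The visited network never holds a password database of its own; it only needs to know where to send a request and whether its local policy admits users from that realm.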
Dante and Géant therefore are ‘not just about the network – big fat pipes – but we are looking at every technology that facilitates research and the transfer of data,’ according to Chevers. The Géant model is successful, he believes. ‘It’s been adopted in other regions of the world – a successful European export!’ As one example, he cited RedCLARA – Cooperación Latino Americana de Redes Avanzadas (Latin American Cooperation of Advanced Networks) – which develops and operates the only Latin-American network for regional interconnection intended to foster collaboration in research, innovation and education across Latin America by means of advanced telecommunications networks.
It is part of Dante’s role not only to manage research and education networking projects serving Europe, but also to advise and assist networks in the Mediterranean, sub-Saharan Africa and central Asia. Dante has helped with the Central Asia Research and Education Network (CAREN), which has been operational since July 2010 and links scientists and students in Kazakhstan, Kyrgyzstan, Tajikistan and Turkmenistan, with Uzbekistan a candidate country.
It also coordinates the ORIENTplus project, which connects the Chinese NRENs CERNET (China Education and Research Network) and CSTNET (China Science and Technology Network) with the 50 million users of the Géant network, via super-fast connectivity between Beijing and London. In January 2013, the link capacity quadrupled to 10Gbps to meet traffic growth. One critical point about this link is that it goes via Siberia, whereas virtually all commercial links go via the USA, a route about double the distance. It is, in Chevers’ view, another example of how Dante can provide a service of higher quality than that offered by commercial providers.
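Route length matters because light in optical fibre travels at only about two-thirds of its speed in a vacuum, roughly 200,000 km per second, so every extra kilometre adds delay. The sketch below compares round-trip times and the bandwidth-delay product (the amount of data that must be in flight to keep a 10 Gbit/s link full) for two routes. The route lengths are hypothetical round numbers chosen to reflect the ‘about double the distance’ comparison, not measured Géant or commercial paths, and real latency also includes equipment and queuing delays.

```python
# Back-of-the-envelope latency comparison for two Beijing-London routes.
# Route lengths are hypothetical; only propagation delay is modelled.
FIBRE_KM_PER_S = 200_000  # approx. speed of light in glass (about 2/3 of c)


def round_trip_ms(route_km: float) -> float:
    """Propagation round-trip time in milliseconds."""
    return 2 * route_km / FIBRE_KM_PER_S * 1000


def bdp_megabytes(link_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: data in flight needed to fill the link."""
    bits_in_flight = link_gbps * 1e9 * rtt_ms / 1000
    return bits_in_flight / 8 / 1e6


routes = {"via Siberia": 10_000, "via USA": 20_000}  # km, illustrative only
for name, km in routes.items():
    rtt = round_trip_ms(km)
    print(f"{name:12s} RTT ~{rtt:.0f} ms, "
          f"BDP at 10 Gbit/s ~{bdp_megabytes(10, rtt):.0f} MB")
```

In this simple model, halving the route halves the round-trip time and with it the amount of data a single transfer must keep in flight to use the whole link, which is one reason a shorter path helps large genomic or physics data exchanges.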
The life sciences have become a very significant user of the ORIENTplus network, with massive amounts of genomic data going back and forth between the European Bioinformatics Institute (EBI) at Hinxton, not far from Cambridge, and the Beijing Genomics Institute (BGI).
But it remains the case that Cern’s LHC is the biggest user and the main driver for network development. Chevers pointed out that when the LHC is switched on again after its current outage, it will generate two and a half times as much data as before: ‘And we’re ready for that.’