Does the future of HPC lie in the cloud?
Wolfgang Gentzsch is certain, both in his article on page 26 of this issue of Scientific Computing World and in his talk to the ISC Cloud conference in Heidelberg at the end of September: the cloud is the way to entice all those scientists and engineers who are still wedded to their workstations (about 95 per cent of them) to embrace the advantages of high-performance computing.
Michael Resch, who runs the Stuttgart supercomputing centre, worries about the obstacles – and they come in the form of lawyers, taxmen, accountants, and politicians, he told the conference.
So much promise. But just as literal clouds are high in the sky, so ascending to ‘the cloud’ can be hard for scientists and engineers.
The independent software vendors, who make the programs that scientists and engineers would like to use in the cloud, are not sure how they can license their software for such an environment. As Felix Wolfheimer of CST remarked, in a moment of candour: ‘There is a lot of fear in the sales department about opening up the licensing model’ so that software licences will be flexible enough for use in the cloud.
And Brian Sparks from Mellanox summed up the security concerns succinctly and directly: if celebrities can have nude photos of themselves uploaded into the cloud without really realising that it was happening – until the security of the cloud is hacked and the pictures displayed to any voyeur with a web browser – what commercial company is going to trust its intellectual property to the cloud?
Moving to the cloud
Plenty, according to David Pellerin, business development principal for high performance cloud computing at Amazon Web Services. He told the meeting that the pharmaceutical company Bristol-Myers Squibb had loaded data from clinical trials on to AWS. ‘This is sensitive data – security and compliance are important,’ he assured his audience. A second instance from the life sciences was Illumina, which had created ‘BaseSpace’ in the cloud into which human genome sequences were being uploaded, as part of the general trend towards personalised medicine. Pfizer and Novartis were both moving to the cloud, he continued, with Pfizer running molecular modelling problems on AWS – an area about which pharmaceutical companies are usually extremely sensitive, since their molecular models betray information about which compounds they are researching and which new chemical entities they think might be suitable candidate drugs.
The move to carry out scientific computing in the cloud is not confined to the life sciences and pharmaceutical industries; there are examples from the oil and gas sector as well. Pellerin cited Stochastic Simulation, an Australian company that provides specialist modelling software and services to the oil and gas industry. The company, headquartered in Perth, Western Australia, has developed robust and extremely fast software for simulating reservoirs to estimate how much oil or gas can be extracted. Much of the work is done in the cloud, according to Pellerin.
However, Thomas Goepel from HP pointed out that commercial perceptions of data security are not the sole factor, nor even the overriding one. Different national laws and attitudes could constrain the use of the cloud, he said. One US company had carried out genomic analyses, using individuals’ data, on the public cloud in the USA relatively uncontroversially, but it became a huge issue when the company moved to do something similar using data from people in China and Europe. It had to build new solutions and create a cloud within China to hold the Chinese data, for example. Goepel told the meeting that although the cloud was attractive to small to medium-sized companies that could not afford a cluster, he felt that high-performance computing was more reluctant to go into the cloud than enterprise computing.
But according to Pellerin, at Amazon Web Services, cloud providers offer added value to their customers: ‘We are not offering infrastructure-as-a-service. All cloud companies offer services layered on top, for example remote desktops in the cloud.’ One example is the unconventional use that has grown up for Amazon’s AppStream. This was originally intended to support streaming games, but now, according to Pellerin, some independent software vendors (ISVs) are looking at the technology as a way of enabling visualisation of the results of scientific computations being carried out on the cloud.
His point was echoed by Wim Slagter from Ansys, who set out seven pointers for best practice in the cloud. The first rule, he suggested, was ‘don’t move data more than you have to’. The starting files for a computerised engineering simulation may be quite small, he said – in the region of 20MB, perhaps – but the results could be far more than 2GB and may take too long to download over some connections. This meant that remote visualisation was necessary so that the results could be viewed without downloading them. This led on to the requirement to have a full remote desktop which, in turn, was dependent on a low-latency (100ms or less) network connection.
Both network communications and the data storage had to be secure, Slagter continued, and this meant that ‘data in motion’ might have to be encrypted. In an echo of Michael Resch’s concerns about legal responsibilities, Slagter remarked that a cloud environment meant at least three actors had both practical and legal responsibility in keeping data private and secure: the cloud provider itself was responsible for the physical security of the building where the servers were located as well as the security protocols used; the ISV had responsibility for the security of the application that was being run; and the customer had to have a set of security policies and procedures governing who had access to the portal into the cloud and who was licensed, within the customer’s own company, to use the application software and access the data. At the same time, there had to be effective end-user access for both job and data management.
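A rough calculation shows why the ‘don’t move data’ rule matters. The sketch below takes the file sizes Slagter quoted (about 20MB in, 2GB or more out) and estimates idealised transfer times; the link speeds are hypothetical values chosen purely for illustration and were not cited at the conference, and the estimate ignores latency, protocol overhead, and congestion.

```python
MB = 10**6  # decimal megabyte, in bytes
GB = 10**9  # decimal gigabyte, in bytes


def transfer_seconds(size_bytes, bandwidth_mbit_s):
    """Idealised transfer time in seconds.

    Ignores latency, protocol overhead and congestion, so real
    transfers will take longer than this lower bound.
    """
    return size_bytes * 8 / (bandwidth_mbit_s * 1e6)


INPUT_SIZE = 20 * MB   # starting files, figure from the talk
RESULT_SIZE = 2 * GB   # results, lower bound from the talk

# Hypothetical link speeds in Mbit/s (assumptions, not from the talk).
ASSUMED_LINKS_MBIT_S = [10, 100, 1000]

if __name__ == "__main__":
    for link in ASSUMED_LINKS_MBIT_S:
        up = transfer_seconds(INPUT_SIZE, link)
        down = transfer_seconds(RESULT_SIZE, link)
        print(f"{link:>5} Mbit/s: upload {up:8.1f} s, download {down:8.1f} s")
```

Whatever the link speed, the results take 100 times longer to move than the inputs under these assumptions, which is the case for viewing them where they sit.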
His point was reiterated later by Felix Wolfheimer of CST: ‘Who is at fault if a set-up fails or leaks data?’ he asked. ‘There are three people involved: the customer; the ISV; and the cloud provider.’ The complexity of the cloud arrangement could mean, he remarked, that no one knows what bugs lie inside the software stacks. If the software is open source, then the bugs will be identified fairly quickly, but the task remains of ensuring that the patches are implemented.
The problem of software licences
Slagter also confronted the challenges that cloud computing presented for software licensing. The traditional model was steady-state usage of the software, he reminded his audience, and was usually met by the outright purchase of the software (often with an annual support and maintenance/upgrade contract). For burst capacity, short-term access to the software could be met by ‘leasing’ rather than purchasing. But one of the advantages of the cloud to end-users was that they could use it to accommodate fluctuating demand for compute resources, and there a pay-per-use software model might have to be applied. He stressed that end-users also want to be able to ‘re-use’ their on-premises software licences for the cloud – they want to extend the use of the software (without having to buy a separate licence) for use within either a private or a public cloud.
But the ISC Cloud conference heard that one of the issues holding back further adoption of the cloud was the sheer number of different licence options – and some of this diversity was on display during a special discussion session with representatives of some of the smaller ISVs. NICE, perhaps best known for its EnginFrame Grid portal, has been working on distributed computing since 1996, according to the company’s Karsten Gaier. In a manner similar to Ansys, NICE offers licences under a purchase model or a yearly rental and, to accommodate cloud use, it now offers monthly, daily, and even hourly rental options.
Tomi Huttunen, director of R&D for the Finnish acoustic simulation software company Kuava, told the meeting that his company offered a licence on the basis of a small monthly fee but it also factored in the CPU time used. Kuava offers ‘cloud credits’ based on the size of the job and length of time that it takes to run. ‘How to build software for hybrid clouds [a mixed use of the in-company datacentre and a public cloud] is a challenge for a small software company,’ he said.
According to Felix Wolfheimer, CST’s licensing model is more traditional. It too offers perpetual licences with an annual fee for support, and it will lease its software annually, over three months, one month or a week, for cloud set-ups. Acceleration options for high-performance computing are licensed separately using tokens. The company is known for its 3D electromagnetic simulation software and counts organisations such as Cern among its customers. It offers discounts to academic users. Although, as he frankly confessed to the meeting, the prospect of making licensing more flexible to take account of cloud usage was generating fear in the sales departments, CST is trying to gain experience with on-demand or pay-per-use arrangements, albeit in a test environment with a select few of its customers. He pointed out that the logistics of administering the contract and sending out the licensing files complicated the process of moving to finer granularity of licensing for periods as short as a day or two.
Make the cloud easier to use
‘The cloud is ready for high-performance computing (HPC). Enthusiasm is large, but progress is slow,’ Gaier said. Why should this be so since, as HP’s Goepel pointed out, the cloud ought to be attractive to small and medium-sized enterprises that cannot afford their own supercomputing clusters?
Other companies have been launched specifically to bring the advantages of cloud computing to scientists and engineers. SimScale is a relatively recent spin-out from the Technical University in Munich. Founded in 2010 as an engineering consultancy with a focus on numerical simulation, SimScale has now developed its browser-based CAE platform. According to its managing director, David Heiny, it sees itself as a native cloud provider. ‘Our target customers are people completely new to simulation,’ he said.
Part of the reason for the relatively slow uptake of cloud computing by science and engineering companies, according to Oliver Tennert, director of technology management at Transtec, is that it is only comparatively recently that such simulation tools and other advances – for example, those that make remote visualisation possible – have become available.
Across much of high-performance computing, uptake depends on ISVs developing end-user application software that engineers and scientists find easy to use without having to turn to specialists in parallel programming, and Tennert’s analysis seemed to extend this point to technical computing in the cloud. Intel’s Stephan Gillich endorsed the point that ease of use was an issue across the whole of simulation and not unique to the cloud. Thomas Goepel, from HP, remarked that HPC was still not the ‘killer app’ for innovation and economic growth because of its complexity, a shortage of skills, limited scalability of existing software and high cost of hardware. ‘When we looked at the cloud,’ he said, ‘these issues were the same.’
Despite David Pellerin’s assurances that cloud providers were giving their customers added value, ease of use featured as a major issue in widespread adoption of the cloud. Wim Slagter, from Ansys, pointed out that small and medium-sized companies perceive different barriers from those recognised by the big companies – and ease of use was foremost among them.
It was therefore a good call by the conference organisers to dedicate a specific session to the concerns and interests of the ISVs. If this analysis holds good, then the advent of ISVs that are interested in reaching out to end-user scientists and engineers through the cloud, such as the ones featured in that session, can only accelerate the uptake of the cloud.
At a fundamental level, there are two clouds. Michael Feldman of Intersect360 Research pointed out that his company’s data indicated that only one third of cloud-related spending in high-performance computing is in the public cloud; some two thirds is in private clouds. His point was echoed by Transtec’s Tennert: ‘We help customers build up their own private cloud and have not made the leap to public clouds. Moving customers into the public cloud means that the customers have to trust that their data will be safe.’ High-performance computing constitutes a vital part of the data chain for many companies, he continued, which is why there is an issue for them in letting the data off their premises. Gaier, from NICE, pointed out that there is no standard process to access the cloud through company firewalls, so connectivity is itself difficult. In the words of Ansys’s Wim Slagter: ‘One size does not fit all,’ when it comes to cloud deployment.
However, SimScale’s Heiny thought the issue was more about perception than reality: ‘What we see is the diffuse fear of users about putting the data up there – the fear of the risk.’ But even though he thinks ‘technically, everything is there for security,’ he conceded that ‘supercritical data will never be transferred to the cloud.’