Cloud bursting is revolutionising research at universities. A lot of universities are now engaging in cloud bursting and regularly take advantage of the public cloud infrastructure that is widely available from large companies like Amazon, Google and Microsoft, writes Mahesh Pancholi, Research Computing Specialist at OCF.
The concept of cloud bursting essentially came out of spare capacity that Amazon had on its massive server farms whilst running its websites. These server farms were built to meet particularly high demands at times like Christmas and Black Friday, but the rest of the time they sat idle, so the idea was created to sell that spare capacity.
This has since grown into a whole business known as infrastructure as a service (IaaS). Instead of having to buy your own kit and run your own services, you can rent time on someone else’s servers and use their data centre resources. There is no longer a need to worry about power costs, data centre space or system administrators’ fees, as you pay a subscription to the IaaS company, which does it all for you.
Cloud bursting in universities
The uptake of the public cloud in universities has already happened, particularly in the provision of core IT services. By using Office 365 rather than an in-house email server, a university is utilising capacity in the cloud. Instead of having a rack of servers and system administrators to run its email service, it receives a full service from the public cloud for all of the university’s users. That’s probably where the biggest uptake started, and since then there has been the realisation that, at some point in the future, these cloud services are likely to be cheaper to run than buying your own equipment and running it all in-house.
In general, there has certainly been enthusiasm to move towards cloud services and out of that came the OpenStack revolution, which is seen by many as the best of both worlds. You get a ‘cloud-like’ service with the ability to provision whatever type of server you want as a virtual machine, but with the advantage of it being onsite, giving you the control, privacy and data sovereignty.
For example, many organisations prefer not to put HR data on the cloud, but if you have OpenStack onsite, you have a flexible compute platform where the HR workload can sit idle for most of the month and then, for the five days it has to work hard, burst out to the rest of the infrastructure, helping everything run more efficiently and quickly at that crucial time of the month. Providers of research computing infrastructure have been keen to take advantage of the flexibility and security OpenStack provides. Projects such as eMedLab and CLIMB (Cloud Infrastructure for Microbial Bioinformatics) are two very successful examples, showing that the private cloud has an established use for many universities. But what about the public cloud?
Usually, the groups who provide research computing infrastructure at universities lead the way in the uptake of new technology, but because the public cloud has been so widely available, everyone has been trying cloud bursting at the same time. There are large enterprise organisations that have moved vast swathes of their infrastructure out to the public cloud because it’s much cheaper than holding onto a data centre and its staff, and it means they can grow and shrink that infrastructure as demand requires. At universities, core IT is leading the way with the public cloud, while research computing is on a par or slightly behind in its uptake.
Does it come at a price?
With cloud bursting, there can be attractive initial rates to run the servers, but on top of that there are additional costs which, unless you’re experienced at running IT or cloud infrastructures, might not be noticed at the outset. For example, there are costs around data egress: most companies will say it is free to put your data into the cloud, but there is then a monthly cost to store the data and a cost to access it. Essentially, you are paying for the bandwidth whenever you pull your data back out of the cloud.
When you conduct a scientific experiment that involves huge amounts of data and need to access terabytes of it continuously to run analyses, you’ll get charged every time you retrieve that data. In some instances, even running a search through your file structure counts as a data egress charge, as you are still accessing the data. Unless all the potential scenarios have been considered before cloud bursting is adopted, these sorts of costs can sneak up on you and it can become very expensive, very quickly.
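The way these charges add up can be sketched with a few lines of Python. This is only an illustration: the `CloudRates` class, the `monthly_data_cost` function and every price in it are hypothetical placeholders rather than any provider’s actual tariff, and real bills include further items such as per-request charges.

```python
from dataclasses import dataclass


@dataclass
class CloudRates:
    """Hypothetical per-unit prices; real rates vary by provider and contract."""
    storage_per_gb_month: float
    egress_per_gb: float
    ingress_per_gb: float = 0.0  # uploading is often advertised as free


def monthly_data_cost(stored_gb, retrieved_gb, uploaded_gb, rates):
    """Break one month's data charges down into storage, egress and ingress."""
    storage = stored_gb * rates.storage_per_gb_month
    egress = retrieved_gb * rates.egress_per_gb
    ingress = uploaded_gb * rates.ingress_per_gb
    return {
        "storage": storage,
        "egress": egress,
        "ingress": ingress,
        "total": storage + egress + ingress,
    }


# Placeholder prices, chosen only to make the arithmetic easy to follow.
rates = CloudRates(storage_per_gb_month=0.02, egress_per_gb=0.08)

# 10 TB held in the cloud, with the full dataset pulled back five times
# during the month to run analyses on-premises.
cost = monthly_data_cost(stored_gb=10_000, retrieved_gb=50_000,
                         uploaded_gb=0, rates=rates)
print(cost)
```

With these made-up numbers the retrieval charge is many times the storage charge, which is the pattern to watch for when an analysis reads the same data repeatedly.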
This has been recognised by both public cloud providers and the UK’s provider of digital solutions for UK education and research – JISC. There are ongoing efforts to provide special pricing agreements for Universities, waivers for certain charges, and even large amounts of credits to help show the utility of using public cloud services to enhance and expand the research capabilities for universities.
Public cloud providers are surveying the market and partnering with companies like OCF, chosen for their pedigree in providing solutions to the UK research computing community, in order to help universities take advantage of their products by integrating them with existing infrastructure such as HPC clusters.
This leads us to the real argument for the public cloud: you no longer have to build a system big enough for your largest workload. Black Friday is a good example. You don’t want to have to build a server farm big enough for Black Friday; you build one big enough for your ‘normal Wednesday’ and burst out to the public cloud to deal with the additional workload. If there is no such thing as a ‘normal Wednesday’ for your organisation (such as a consultancy with a very peaky workload), it makes even more sense for the whole workload to be in the cloud, so that you only pay for what you consume rather than having too much or, worse, too little compute available to you.
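The ‘normal Wednesday’ argument is simple arithmetic, sketched below in Python. The function names, the demand profile and the per-node prices are all assumptions made up for illustration; the sketch also ignores data charges and assumes cloud nodes are paid for only on the days they are used.

```python
def peak_sized_cost(demand, onprem_cost_per_node_day):
    """Own enough nodes for the busiest day and pay for them every day."""
    return max(demand) * onprem_cost_per_node_day * len(demand)


def burst_cost(demand, base_nodes, onprem_cost_per_node_day,
               cloud_cost_per_node_day):
    """Own `base_nodes` all month and rent cloud nodes only for the overflow."""
    owned = base_nodes * onprem_cost_per_node_day * len(demand)
    overflow_node_days = sum(max(0, d - base_nodes) for d in demand)
    return owned + overflow_node_days * cloud_cost_per_node_day


# Hypothetical 30-day month: 100 nodes on a normal day, 400 on one peak day.
demand = [100] * 29 + [400]

# Placeholder prices: a cloud node costs three times an owned node per day.
peak = peak_sized_cost(demand, onprem_cost_per_node_day=10.0)
burst = burst_cost(demand, base_nodes=100,
                   onprem_cost_per_node_day=10.0,
                   cloud_cost_per_node_day=30.0)
print(peak, burst)
```

Even with the cloud node priced well above the owned one, the bursting option comes out cheaper here because the peak is rare. A workload that peaks on most days would shift the balance, so the comparison needs to be run against a real demand profile.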
Bursting HPC platforms
Typically, HPC services or systems are bought with research council funding, and the best HPC system available at the time is purchased. But eighteen months later, its utilisation is close to 100 per cent and its compute capabilities are ageing, with no more money available for a new system. All is not lost, however, as it is becoming cost effective for research computing departments at universities to access cutting-edge technologies or burst beyond their current capacity by leveraging the public cloud.
By bursting out to the public cloud, a university can extend the life of its cluster and offer newer technologies and services through it. When it purchased its HPC system, a university might have bought the latest GPUs available, but eighteen months later NVIDIA has released a better performing GPU which can’t be purchased for lack of additional funding. Hardware manufacturers are recognising this advantage and bolstering it by releasing new hardware to public cloud providers before the rest of the industry. For example, NVIDIA’s latest Volta GPUs were available to use in the public cloud before they could be bought! By bursting into the public cloud, the university can offer the latest and greatest technologies as part of its research computing service for all its researchers.
HPC specialist integrators, like OCF, can help the research computing team make this process transparent to their users by designing a research computing platform with a single point of entry. Researchers use the on-premises cluster, and that cluster can burst out to the public cloud for additional resources or when priority jobs need to run. Through that single point of entry, researchers can also take up the non-HPC aspects of the cloud, like Hadoop clusters or AI workloads, without needing the research computing team to install a whole new set of software or hardware!
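As a rough sketch of what such a single point of entry might decide behind the scenes, the Python below routes a job to the on-premises cluster, the cloud or the queue. The `Job` class, the `place_job` function and the rules themselves are assumptions for illustration only; a production scheduler would apply far richer policies and this is not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int              # nodes the job asks for
    priority: bool = False  # flagged as urgent by the service's own policy


def place_job(job, free_onprem_nodes, total_onprem_nodes):
    """Decide where a job runs under a deliberately simple, assumed policy."""
    # Prefer hardware the university already owns.
    if job.nodes <= free_onprem_nodes:
        return "on-premises"
    # Burst when the job is urgent, or when it could never fit on the
    # local cluster however long it waited.
    if job.priority or job.nodes > total_onprem_nodes:
        return "cloud"
    # Otherwise wait for local nodes to become free.
    return "queued"


for job in [Job("small", 4), Job("urgent", 12, priority=True),
            Job("huge", 32), Job("routine", 12)]:
    print(job.name, place_job(job, free_onprem_nodes=8, total_onprem_nodes=16))
```

The researcher submits every job in the same way; only the placement changes, which is what keeps the bursting transparent to users.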
Another key consideration is how to manage the billing aspect of bursting into the cloud, which could become very expensive if not monitored closely. There are specific toolsets that have been designed to help with billing control and they are continuing to be developed to meet the needs of universities.
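The kind of check such billing tools perform can be sketched as follows. The `budget_status` function, its thresholds and the simple straight-line projection are assumptions for illustration, not a description of any specific billing toolset.

```python
def budget_status(spend_to_date, monthly_budget, day_of_month,
                  days_in_month=30):
    """Classify cloud spend as 'ok', 'warn' or 'stop' for the current month.

    The projection assumes spending continues at the average daily rate
    seen so far, which is a simplification.
    """
    projected = spend_to_date / day_of_month * days_in_month
    if spend_to_date >= monthly_budget:
        status = "stop"   # budget already used up: block further bursting
    elif projected > monthly_budget:
        status = "warn"   # on course to overspend: alert the administrators
    else:
        status = "ok"
    return status, projected


# A third of the way through the month with 600 of a 1,000 budget spent.
status, projected = budget_status(spend_to_date=600, monthly_budget=1000,
                                  day_of_month=10)
print(status, projected)
```

Run daily against the provider’s usage figures, a check like this lets the research computing team pause bursting before the bill becomes a surprise.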
Systems administrators who previously ran the university’s HPC service can now provide a complete research computing platform and have greater visibility of all their researchers’ needs, so they can continuously enhance the design of the service and help researchers carry out even better work. This kind of service is very attractive for universities. It removes the need for researchers to become ‘pseudo-IT systems administrators’ who have to learn how to run a server under their desk, so they can focus entirely on their research instead.
There is a noticeable increase in awareness of the benefits of public cloud bursting by universities, particularly in Research Computing. Whilst no-one is replacing their on-premises HPC system with the public cloud yet, it is recognised that bursting into the public cloud is incredibly useful for the provision of the latest technologies or extra capacity and expertise for researchers.
As part of their 10 to 20 year roadmaps, some universities are considering whether they will still be buying an on-premises cluster in 10 years’ time, or whether they will purchase a public cloud version of an HPC cluster instead. For HPC in the cloud to be fully accepted, there will need to be a shift in both culture and funding at universities and, most importantly, the costs of using the public cloud will need to be driven down further for wider adoption. The trends, however, suggest that this is not too far off.