Speeding up the storage stack
While storage volumes continue to increase dramatically, storage providers are trying to meet demand by increasing storage performance while introducing more efficient methods of managing data across large multi-petabyte storage platforms.
Molly Rector, chief marketing officer, executive vice president, product management and worldwide marketing at DDN, explained that choosing the right system for a particular workflow is critical to getting the most out of your storage technology.
‘We have a very technical pre-sales engineering team on site with the customers, just listening to their requirements’, said Rector.
‘There are a lot of subtleties there that would lead to one product or another but there are four or five key things’.
Rector explained that the key principles to consider are expansion and upgrade requirements, the scale of the data, the number of sites that need to be connected, and whether the system is going to be a read- or write-intensive environment.
‘In a lot of HPC centres, IO is not their bottleneck but in others it is. Then you go to things like object storage, which is great for low-cost, long-term storage of data but not great for connecting straight to a compute environment – it just doesn’t have the right performance attributes.
‘They each have their spot and it can be difficult for a customer to choose without a bit of consulting work’, concluded Rector.
These complex requirements mean that there is no one storage technology that is perfect for every situation or workflow. This led to two types of file system – Lustre and GPFS – gaining popularity but, recently, Object Storage has also started to see some use in scientific or HPC environments – thanks to its cost and ability to keep contextual metadata alongside experimental data.
Choosing the right storage technology can be a difficult business, but Cray has developed an end-to-end platform based on its Sonexion storage platform, which it believes can support the majority of workflows because of its inherent flexibility.
Barry Bolding, senior vice president and chief strategy officer at Cray, said: ‘Over the last 10 years or so Lustre is what the market has been demanding. The performance of Lustre is far better than GPFS for certain types of workloads. Now, GPFS can be better in certain situations as well but, for most of our customers, Lustre offers the highest performance.’
However, while Cray does favour Lustre, Bolding stressed that Cray’s philosophy is to provide a flexible system that users can customise to suit their needs. ‘If you are developing a storage product, you take a storage view of the world’ said Bolding.
‘Cray takes a systems view, and from a systems perspective you need to have different tiers of storage – and they have to be visible to the user – because that is the only way to make sure that you can get the best performance out of your tiered storage.’
Bolding noted that ‘we have a few customers who use GPFS’, and that it is possible to remove the Sonexion Lustre file system and replace it with a GPFS-based solution. ‘What drives us towards Lustre is market demand from what we see in the requirements from our customers,’ he concluded.
The use of Object storage in scientific computing
In recent years, Object storage technology has started to gain traction in scientific computing industries. While it cannot offer the high-throughput performance of a traditional file system, it changes the conventional storage paradigm, storing data as objects with associated metadata rather than files in a specific format. This provides considerable options for data analytics, managing data more efficiently and even re-using data long after an application – with an associated file format – has been removed from the system.
Matt Starr, CTO at Spectra Logic, said: ‘Lustre and GPFS are great for scratch file systems. They are great if I am going to bring in a large file set, pump it into a scheduler and that thing is going to be spitting out intermediate data results to the file system.’
Starr explained that while Object storage fills a role similar to a traditional file system, it is used best when contextualising data on a lower performance tier – such as archiving. ‘The object storage domain tends to work better for large data sets that are being stored together with collections of metadata – it gives you a common format and a common set of semantics that tie both the metadata and the data together.’
This can be particularly useful for scientific research as it can provide much more than the simple raw data. Starr gave an example of an oil and gas company that may use metadata from a sensor, alongside seismic analysis data to verify a data set or to repeat analysis at a later date.
Starr said: ‘I want to know all about that seismic information, not just “here is the raw data”. I want to know where it was captured; I want sensor information because I may be extrapolating “bad data”.’
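To make the idea concrete, the following is a minimal sketch of how an object store keeps data and its descriptive metadata together, loosely following the seismic example above. It is an illustrative toy held in memory, not the interface of any vendor's product: the class names, the content-derived object ID and the metadata fields (sensor, site, calibrated) are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the raw bytes plus the metadata that describes them."""
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory object store with a flat namespace (no directories)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        # Assumption for this sketch: the object ID is derived from the content.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def find(self, **criteria) -> list:
        # Return the IDs of objects whose metadata matches every criterion.
        return [
            object_id
            for object_id, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        ]


store = ObjectStore()
survey_id = store.put(
    b"raw seismic trace bytes",
    sensor="geophone-17", site="block-A", calibrated=True,
)
other_id = store.put(
    b"another seismic trace",
    sensor="geophone-03", site="block-B", calibrated=False,
)

# Find data captured by an uncalibrated sensor before re-running an analysis.
suspect = store.find(calibrated=False)
```

Because the sensor details travel with the data, a later analysis can filter out or re-check suspect captures without needing the application that originally wrote them.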
The use of object storage opens up easier pathways for verification of data, more efficient data management and even data analytics activities across large unstructured data reserves. However, DDN’s Molly Rector stressed that, today, most HPC users are only using object storage for archiving purposes because the performance is not comparable to a file system.
She explained that most users deploy object storage for two main reasons – cost and access to data across multiple data centres. ‘It’s the price of object storage versus having file system licences and higher performance disks; it is much cheaper to put the data into WOS [DDN’s Object storage platform]. The other reason is if you want to move data very easily, here object storage can be really beneficial.’
Accelerating IO performance
One of the biggest developments in storage in recent years has been to remove the bottleneck associated with moving data from storage to processor, otherwise known as IO (input/output operations). Many major manufacturers have some IO acceleration within their storage portfolio, including DDN’s Infinite Memory Engine (IME), Cray’s DataWarp and Spectra Logic’s Black Pearl. It should be noted that ‘Black Pearl’ is a storage platform that includes SSD to accelerate performance, rather than a specific IO acceleration appliance.
‘We fundamentally believe that as data sets get to be multi-petabyte, it is not feasible to try and maintain multiple different products that are not designed together’, said Rector.
‘From an operational perspective, it is just not scalable. IME is just another piece of that. There are bottlenecks and there are inefficiencies in how the data gets moved around. IME is all about accelerating applications and, more specifically, taking IO bottlenecks out of the workflow.’
Both Cray and DDN affirmed that they expect tiered storage to grow in the coming years with DDN adding that it would be a standard for larger HPC sites, such as those in the Top500, within one or two years.
Bolding also commented on the success of Cray’s DataWarp technology: ‘We are selling a lot of DataWarp to a set of customers with a demand for high-bandwidth storage. DataWarp is our newest storage product, but already it is our second highest in terms of customer demand.’
Bolding also stressed that, as part of Cray’s systems-based mentality, the Cray DataWarp technology is integrated with its proprietary Aries interconnect technology – something that no other storage provider can claim. ‘It is a fast pool and there is no way you can do that with IME or any other kind of platform. They do not sit on the fabric with the processors so they have limitations on the bandwidth.’
The future of storage
Along with the incessant growth of data volumes, and the use of object storage and IO acceleration, the next trend in the development of storage technologies will revolve around increasing intelligence of the storage stack so that data can be moved and archived more efficiently.
Starr said: ‘In the case of automated scientific systems that are doing massive parallel compute you do not want to lock up cores, to lock up resources in the cluster until you are completely ready.’
To alleviate this congestion of compute resources, Starr expects that ‘control systems will move farther up into the software stack so that the automated system starts to drive where data resides, rather than a policy driven system hidden in an appliance.’
Removing this complexity from the current policy-driven system would allow control systems to pre-empt specific workflows, checking the availability of files so that compute resources can be allocated as soon as the data is ready. This development should help to reduce queue times and avoid placing unnecessary load on the IO of the storage system.
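As a rough illustration of that idea, the sketch below shows a dispatcher that checks whether a job's input files are already staged before it commits any cores. It is a simplified model under stated assumptions rather than a description of any real scheduler: the Job structure, the dispatch function, the file names and the core counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int
    inputs: frozenset  # files that must be on the fast tier before the job runs


def dispatch(queue, staged_files, free_cores):
    """Start only jobs whose inputs are staged and that fit in the free cores.

    Jobs that are not ready stay in the queue without holding any cores.
    """
    started, waiting = [], []
    for job in queue:
        if job.inputs <= staged_files and job.cores <= free_cores:
            free_cores -= job.cores
            started.append(job.name)
        else:
            waiting.append(job.name)
    return started, waiting, free_cores


queue = [
    Job("climate-run", 64, frozenset({"grid.nc", "forcing.nc"})),
    Job("seismic-stack", 32, frozenset({"survey.segy"})),
    Job("post-process", 16, frozenset({"grid.nc"})),
]
staged = {"grid.nc", "survey.segy"}  # files already available on the fast tier

started, waiting, remaining = dispatch(queue, staged, free_cores=64)
```

In this example the first job is held back because one of its inputs has not yet been staged, so its 64 cores are never locked up while the data is in transit; the two jobs whose data is ready run instead.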
Bolding indicated that he expects a further increase in the use of SSDs, particularly for IO acceleration. ‘We are going to see an increase in the use of solid state devices; they will still be more expensive than spinning disk, so it will mainly be used for near-line storage – very fast storage pools that need to be very flexible’ he said.
‘Sometimes they need to act like a file system; sometimes they need to act as a buffer or a cache. If you want to put storage close to the compute then you need to make it as flexible as possible in order to have it improving the speed of computation’, said Bolding.
Ultimately, all of these improvements focus on increasing performance – the rate at which data can be stored and retrieved – while simultaneously reducing congestion on the storage network, so that data can be moved quickly and efficiently across the system.
‘Ultimately, we are looking at it from a systems perspective. How can you build a computing environment that has no limitations on bandwidth, storage, compute, or memory? That is the way that we look at the problem’, concluded Bolding.