Why storage is as important as computation
In August 2014, a ‘Task Force on High Performance Computing’ reported to the US Department of Energy that data-centric computing will be one of the most important requirements of HPC within the next 10 years. The report states: ‘The need to manage, analyse, and extract useful information from the tremendous amounts of data they [HPC systems] ingest and produce becomes commensurate and co-equal in importance to their computational power.’
The report continued: ‘This is the case across much of the government research enterprise, while the emerging confluence of big data and analytics capabilities with highly sophisticated modelling and simulation is promising to have a transformational effect on a number of major industries.’
Commercially, data-centric computing is already having an effect on enterprise markets not traditionally associated with HPC, and the companies that specialise in data storage for HPC are already experiencing demand from a much wider spectrum of customers.
According to Molly Rector, chief marketing officer for DataDirect Networks (DDN): ‘There has been a massive increase in the use of high-performance storage solutions for enterprise markets, with particular growth in areas such as life sciences, media and finance. We [DDN] have had 180 per cent growth in the financial services market in the first half of the year, twice as much for the first half of 2014 compared to the first half of 2013.’
She continued: ‘Cloud, media and entertainment, and enterprise business unit environments are now grappling with the same massive compute and data management demands that HPC technology was designed to address.’
Genomics and financial services are also increasingly large markets for storage companies. Rector said that genomics was probably the fastest-growing source of new customers, if not specifically among Fortune 500 companies.
Xyratex, now part of Seagate, has also seen growth in these enterprise markets. Ken Claffey, vice president of ClusterStor at Seagate, pointed out that the company had been engaging with customers from enterprise for some time, but that growth has increased dramatically. Claffey said: ‘Many of our customers in enterprise markets are looking for high-performance solutions that can deliver on price performance and scalability.’
Panasas has seen similar growth in enterprise markets, according to Geoffrey Noer, vice president of product management at Panasas. He said: ‘The classic environment where scale-out storage was used in the scientific arena was really in the university and the government environment for big supercomputers. Even there, with the scale that some of these systems are looking at, reliability and availability are receiving renewed attention. However, the bigger trend that is happening is the adoption of scale-out storage for technical computing in the enterprise markets. The adoption rate there has been at a serious pace over the last few years and that is continuing unabated.’
However, it is not just genomics, financial services, and media and entertainment companies that are benefiting from the use of large scale storage. Increasingly, Fortune 500 companies are looking to differentiate their products from competitors. Noer said: ‘A Fortune 100 company expects an enterprise level of reliability and availability. They are using it for mixed workloads, for many different types of applications with many different groups of users at the same time, and so they need a level of reliability and availability that is much higher than would be provided in a typical scratch-space environment.’
Another strong area of growth for storage companies has been oil and gas, as Noer highlighted: ‘The compute and storage associated with the search and optimisation of oil and gas extraction is an enormous market for scale-out NAS. For Panasas as a whole, we ended 2013 with 70 per cent of business coming from the enterprise.’
New technical computing markets are growing rapidly, and with that growth come increased storage needs. Some of these new markets now demand capacity and performance similar to those of traditional HPC. Noer said: ‘Life sciences used to be much more about storage capacity and less about storage performance but storage performance has become much more important in the last couple of years because of those trends and we are seeing the same thing in media.’
The growth has knock-on effects within the storage companies. Rector explained that DDN has been expanding its offices over recent years. DDN opened an office in Paris to focus on the development of the infinite memory engine (IME). Rector said: ‘It takes time, what DDN delivers is not trivial, we do a lot of workflow optimisation.’
Seagate, and Xyratex before it, have been engaging with enterprise markets through the ETP4HPC project. Just as the DoE report in the USA saw HPC as essential to future economic prosperity, so in the European Union this industry-led forum hopes to use HPC to stimulate growth in European industry and to encourage the adoption of HPC in enterprise markets. The ETP is funded through Horizon 2020, the EU’s research and innovation programme.
Xyratex has been involved with Horizon 2020 projects and was a founding member of ETP4HPC. Claffey said: ‘Horizon 2020 is really aimed at driving the adoption of HPC technology throughout enterprise markets. What we are working on with the Horizon 2020 project is really looking forward to the exascale class systems. Here you need to provide a level of power efficiency, performance, and reliability that can scale to meet the requirements of exascale computing.’
The pressures from growing markets in enterprise and traditional HPC are driving demand for extremely efficient compute and storage architectures. Price performance in particular is being driven by enterprise, while HPC needs to address fault tolerance and power efficiency to scale to petabytes and potentially exabytes in the near future.
High throughput applications are already pushing at the limit of what is achievable with today’s storage technology. A significant bottleneck can be the memory or cache used to move large data sets into compute architecture. Removing these bottlenecks has been an area of significant research for storage providers.
Molly Rector, CMO for DDN, said: ‘We are thinking about how you can use your compute more efficiently, even though we are a storage company.’ This holds for both the performance and the reliability of supercomputers and their associated storage as they grow in size on the road to exascale-class computing.
Rector said: ‘If you have a $100,000,000 supercomputer you want to keep it as busy as possible.’ This is part of an overall strategy at DDN that focuses on implementing technology to allow users to use their compute more effectively. Rector said: ‘It is too expensive to move the storage to the compute so we have been working on moving the compute to the data. Users need to be able to move big data sets into the memory.’
Move the compute to the data
One way in which DDN has been working on enabling its users to use compute more effectively is through the implementation of the infinite memory engine (IME), a burst buffer for big data HPC applications. The improvements are implemented through an intelligent software layer that provides a distributed NVRAM storage cache using patented algorithms to eliminate file-locking pressure.
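To make the burst-buffer idea concrete, here is a minimal sketch in Python of how a fast tier can absorb a burst of writes and drain them to slower storage later. It is not DDN's IME: the class name `BurstBuffer`, its methods and the in-memory dictionary standing in for the parallel file system are all assumptions made for illustration, and the sketch ignores distribution, persistence and locking entirely.

```python
from collections import deque


class BurstBuffer:
    """Toy burst buffer: absorb writes in a fast tier, drain them to a slow tier.

    Hypothetical illustration only; both tiers are plain in-memory containers.
    """

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.pending = deque()   # (name, data) pairs held in the fast tier
        self.backing_store = {}  # stands in for the slower parallel file system

    def write(self, name, data):
        """Accept a write into the fast tier, draining older writes if needed."""
        if len(data) > self.capacity_bytes:
            raise ValueError("write is larger than the whole buffer")
        # Free space by flushing the oldest buffered writes first (FIFO).
        while self.used_bytes + len(data) > self.capacity_bytes:
            self.drain(max_items=1)
        self.pending.append((name, data))
        self.used_bytes += len(data)

    def drain(self, max_items=None):
        """Flush up to max_items buffered writes; return how many were moved."""
        moved = 0
        while self.pending and (max_items is None or moved < max_items):
            name, data = self.pending.popleft()
            self.backing_store[name] = data
            self.used_bytes -= len(data)
            moved += 1
        return moved
```

In this sketch the application sees `write` return as soon as the data sits in the fast tier, while `drain` can run in the background when the compute nodes are busy calculating rather than writing.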
Panasas has been developing its own ways to increase the performance of its storage solutions. The release of ActiveStor 16 continues the development of the hybrid architecture, introduced in ActiveStor 14, which uses flash memory to provide low-latency storage for small files and metadata.
Noer said: ‘We are effectively tiering all of the small files and metadata to flash and that has a tremendous impact on the performance of real world applications.’
‘When you have all those small files and the file system metadata clogging up the hard drives you rarely get the sort of performance that you expect from your drive. By putting all that data onto flash you can greatly accelerate the user’s experience.’
The price of storage is another issue: not because it is too high, but because the storage needs of data-intensive markets grow so quickly that capacity is becoming a real concern for users who need to archive data for extended periods.
Panasas’ solution to this problem is to keep small files and metadata persistently in Flash SSD to achieve exceptional I/O performance that is not possible on hard disk. Larger files are broadly striped across SATA drives for cost effectiveness. This hybrid approach delivers optimum performance for a given cost with mixed file workloads, which is the case in almost all HPC and technical computing environments.
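The hybrid placement described above can be sketched as a simple policy function. This is an illustration under stated assumptions, not Panasas' implementation: the 64 KB small-file threshold, the 1 MB stripe unit, the round-robin layout and the names `place` and `Placement` are all hypothetical.

```python
from dataclasses import dataclass, field

SMALL_FILE_THRESHOLD = 64 * 1024  # bytes; hypothetical cut-off for "small"
STRIPE_UNIT = 1024 * 1024         # bytes; hypothetical stripe unit size


@dataclass
class Placement:
    tier: str                                    # "flash" or "sata"
    stripes: list = field(default_factory=list)  # (drive_index, offset, length)


def place(size_bytes, is_metadata, num_drives):
    """Send metadata and small files to flash; stripe large files across drives."""
    if is_metadata or size_bytes <= SMALL_FILE_THRESHOLD:
        return Placement(tier="flash")
    stripes = []
    offset = 0
    unit = 0
    while offset < size_bytes:
        length = min(STRIPE_UNIT, size_bytes - offset)
        # Round-robin: consecutive stripe units land on consecutive drives.
        stripes.append((unit % num_drives, offset, length))
        offset += length
        unit += 1
    return Placement(tier="sata", stripes=stripes)
```

Under these assumptions, a 4 KB file or any metadata object stays on flash, while a multi-gigabyte file is spread across the SATA drives so that many spindles serve it in parallel.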
DDN has implemented object storage through a pre-configured cloud storage platform, allowing users to archive unstructured data onto the cloud service. This not only reduces the cost associated with the performance tier of the storage solution as users won’t need as much capacity, but also allows DDN and its customers to focus on delivering the highest performance possible for the regularly accessed data. Rector said: ‘Object storage allows us to alleviate the pressure on performance storage.’
However, performance is only one aspect of the requirements of storage in an exascale computing environment. Although performance will always be a key requirement for many applications, fault-tolerance of large scale systems is key to using compute resources efficiently. Claffey said: ‘When you deal with that scale it is imperative that you can provide the kind of reliability and scalability that the industry requires. At Seagate, we can provide that because we have unique intellectual property at every stage of the storage stack.’
Seagate has a distinctive position in the HPC storage market; not only does it produce the large scale systems but it also produces the drives themselves. This provides another layer of expertise in dealing with problems and allows the company to address reliability from the hardware perspective as well as the software controlling the distribution of data across a high-performance storage system.
For Panasas, Noer said: ‘It’s much more than just the possibility of a drive failing. Actually, the more common scenario is that you will get one or more sectors that are unreadable and the rest of the drive is accessible normally. The unreadable sector error is what is actually most common and the larger the hard drive the higher the probability of this kind of error occurring.’
Noer continued: ‘This is a disaster mitigation scenario, there is a big difference between losing 10 petabytes of storage by having three hard drive failures, to getting a list of 10 files that you have to restore to bring your system completely back up to health and allowing your users to access almost all of the files normally while you are restoring those 10 files.’
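Noer's point that larger drives are more likely to hit an unreadable sector can be illustrated with a back-of-the-envelope calculation. The sketch below assumes an unrecoverable bit error rate of one in 10^15 bits read, independent errors, and decimal terabytes; none of these figures comes from Panasas, and real drives will differ.

```python
import math


def p_unreadable(capacity_tb, bit_error_rate=1e-15):
    """Probability of at least one unreadable bit when reading a whole drive once.

    Assumes independent bit errors at the given rate (an illustrative figure)
    and 1 TB = 10**12 bytes.
    """
    bits = capacity_tb * 1e12 * 8
    # 1 - (1 - rate)**bits, computed with log1p/expm1 to stay numerically stable.
    return -math.expm1(bits * math.log1p(-bit_error_rate))


if __name__ == "__main__":
    for tb in (1, 4, 8, 16):
        print(f"{tb:>2} TB drive: {p_unreadable(tb):.2%} chance of a read error")
```

Whatever the exact error rate, the probability rises with capacity, which is why protection at the level of sectors and files, rather than whole drives, matters more as drives grow.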
This level of data protection is a key aspect of storage technology at exascale. Storage solutions at this scale are used to run critical supercomputing applications often taking many hours to complete. Repeating application runs because of hardware failure must be kept to a minimum so that the HPC system can be used efficiently.
Noer explained that Panasas accomplished this using an intelligent per-file distributed RAID architecture that implements triple parity data protection. Noer said: ‘Part of what we accomplished with RAID6+ is actually triple parity protection, where we have, in addition to a RAID6 level of protection against two simultaneous device failures, we also have a third level of parity protection that prevents the system from ever having to rebuild in the first place, if you run into sector errors.’
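The principle behind parity protection can be shown with a much simpler scheme than the triple parity Noer describes. The sketch below uses single XOR parity, which can rebuild exactly one unreadable block in a stripe; it is not RAID6+ and not Panasas' algorithm, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute the single parity block for one stripe."""
    return xor_blocks(data_blocks)


def recover(blocks, parity):
    """Rebuild the one block marked None from the surviving blocks and parity."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    survivors = [block for block in blocks if block is not None]
    return xor_blocks(survivors + [parity])


if __name__ == "__main__":
    stripe = [b"abcd", b"efgh", b"ijkl"]
    parity = make_parity(stripe)
    damaged = [stripe[0], None, stripe[2]]  # middle block hit a sector error
    assert recover(damaged, parity) == stripe[1]
```

Because only the affected block is recomputed, a sector error need not trigger a full drive rebuild. Tolerating two or three simultaneous failures requires additional, independently computed parity blocks, which this sketch does not attempt.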
The increased demand for storage in both HPC and enterprise markets is having a positive impact on the technology, as Claffey highlights: ‘There is a symbiotic relationship between the needs of the enterprise markets and traditional HPC. Enterprise in particular helps to drive price performance and that feeds back into the traditional markets.
‘This provides a feedback loop that really generates a better technology, a better solution for everyone,’ Claffey concluded.