HPC: openness, chaos or a split?
At the US supercomputing show in Austin, Texas, Tom Wilkie found the supercomputing industry in an upbeat mood, but also changing commercially in ways that reflect the impact of new technologies.
High-performance computing is now too big for any one company. This was the basic message behind the flurry of announcements about new products and the winning of prestigious contracts at SC15, the US supercomputing conference in Austin, Texas, at the end of November.
The event is always an opportunity to take the pulse of the industry, and the commercial decisions now being made largely reflect technological choices about which processor line to follow and which architectures to specialise in. Other analyses in this series of reports from SC15 can be found in ‘Does the future lie with CPU+GPU or CPU+FPGA?’ and ‘China’s Long March continues’.
SC15 also marked the point at which it is unarguable that this is the age of acceleration. As Nvidia proudly announced at SC15, more than 100 accelerated systems are now on the list of the world’s 500 most powerful supercomputers, published at the beginning of the conference. Some 70 of them use Tesla GPUs – representing a compound annual growth of nearly 50 percent over the past five years. IBM, in turn, announced new offerings centred ‘on the tight integration of IBM’s Power processors with accelerators’.
The diversity of hardware that now has to be deployed to keep improving performance – and the sheer scale of the massively parallel machines being built and envisaged for the near future – means that there is no standard architecture and no standard way to program them. This diversity is too great for any one company to embrace all the available technologies by itself – let alone develop software to make programming them easy. So the industry is fragmenting as individual companies follow the commercial logic dictated by their technological choice.
To be successful in high-performance computing (HPC) today, it is no longer enough to sell good hardware: vendors need to develop an ‘ecosystem’ in which other hardware companies use their products and components; in which system administrators are familiar with their processors and architectures; and in which developers are trained and eager to write code both for the efficient use of the system and for end-user applications. No one company, not even Intel or IBM, can achieve all of this by itself anymore.
Two announcements on the eve of the conference crystallised these developments. Firstly, the Linux Foundation has set up an OpenHPC Collaborative Project to provide a new, open source framework to develop software for high-performance computing. Among the commercial members are Cray, Intel, Lenovo, Dell and HP (or Hewlett Packard Enterprise, as it is now called).
Secondly, IBM also made major announcements both about expansion of the work of its OpenPower network, and about a close tie-up between IBM and FPGA maker Xilinx. In an interview with Scientific Computing World at SC15, IBM’s Sumit Gupta was explicit about the need for many companies to contribute their specialisms and expertise to high-performance computing: ‘The key message is that it is an unrealistic expectation that one company can build the best technology across the board. Our core strategy is that we don’t believe one company can build the best accelerator, the best network, etc. Market dynamics are important to sustain the industry over the long term.’
With the establishment of the OpenHPC Project, it begins to look as if the high-performance computing industry is in danger of splitting between those working with Intel (which is a founder member of the OpenHPC Collaborative Project) and those favouring the IBM Power processor, given that IBM has already created the OpenPower network to develop its ecosystem. Although some companies (Penguin, for example) are members of both, Nvidia, which is partnering closely with IBM, was conspicuous by its absence from the list of founder members of the OpenHPC project.
But in an interview with Scientific Computing World at SC15, Barry Bolding, senior vice president and chief strategy officer at Cray, warned that ‘OpenHPC has to be vendor-neutral and architecture agnostic: you have to do something that is broad to succeed.’ He said that if it became ‘architecture-centric’ then it would be going down the same path as the OpenPower Foundation, which Bolding would regard as a retrograde step.
Bolding sees the most valuable role of the OpenHPC collaboration as working on problems that are common to many systems and many vendors – such as future operating systems, for example – while a company such as Cray would continue to develop its own technology in-house. An example would be Cray’s DataWarp storage accelerator, where working on this technology line would act as a differentiator for the company in the HPC marketplace.
Addressing software challenges
The OpenHPC Collaborative Project is aimed at software, which remains possibly the biggest challenge for HPC. It will provide a new, open source framework, consisting of upstream project components, tools, and interconnections to foster development of the software stack. The members of the project will provide an integrated and validated collection of HPC components that can be used to provide a full-featured reference HPC software stack available to developers, system administrators, and users.
Jim Zemlin, executive director of the Linux Foundation, pointed out that: ‘The use of open source software is central to HPC, but lack of a unified community across key stakeholders – academic institutions, workload management companies, software vendors, computing leaders – has caused duplication of effort and has increased the barrier to entry. OpenHPC will provide a neutral forum to develop an open source framework that satisfies a diverse set of cluster environment use-cases.’
Reducing costs is one of the primary goals of the project. By providing an open source framework, it is hoped that the overall expense of implementing and operating HPC installations will be reduced. This will be achieved by creating a stable environment for testing and validation, which will feature a build environment and source control; bug tracking; user and developer forums; collaboration tools; and validation.
Helping the developers
The result should be a robust and diverse open source software stack across a wide range of use cases. The OpenHPC stack will consist of a group of stable and compatible software components that are tested for optimal performance. Developers and end users will be able to use any or all of these components, depending on their performance needs, and may substitute their own preferred components to fit their own use cases.
Developers were also being catered for on the IBM side of the fence, with an announcement that IBM and fellow OpenPower members have extended their global network of physical centres and cloud-based services for no-charge access to Power-based infrastructure that also uses accelerators. During the week of SC15, a Power8-based accelerated computing cluster developed by IBM and the Texas Advanced Computing Center (TACC) began accepting requests for access from academic researchers and developers. Meanwhile, SuperVessel, a global cloud-based OpenPower resource launched in June, now provides GPU-accelerated computing as-a-service capabilities, giving users access to high-performance Nvidia Tesla GPUs to enable Caffe, Torch and Theano deep-learning frameworks to launch from the SuperVessel cloud.
Penguin Computing is a member of both the OpenPower and the OpenHPC networks because, according to Phil Pokorny, the company’s CTO: ‘We want to have the best of hardware from across the spectrum.’ He welcomed IBM’s decision to open up the Power architecture, reporting that in the past customers had felt that, ‘when the box was only available from IBM it was overpriced and over-engineered’. Now that it is opened up, he said, Penguin can offer other form factors, and be more nimble and closer to customer requirements.
Ultimately, Penguin’s customers are interested in solving end-user application problems and Pokorny pointed out that in high-performance computing there are a lot of very closely coupled codes. For this class of problems, he went on, even InfiniBand is not fast enough – and that was one of the reasons that Penguin found the Power technology interesting. It could offer two to three times as much memory bandwidth as Intel, and so would boost performance by attacking a real bottleneck, he said.
‘Intel needs the competition’, he continued, and ‘this is part of our motivation for exploring other technologies.’ It was a matter of ‘spreading our eggs’ rather than having them all in one basket. ‘This makes us a better partner for our customers,’ he said. He stressed that for an integrator such as Penguin, the unifying thread was the Linux operating system. ‘We specialise in Linux. We only have to train our support staff in one system – Linux.’
Convergence of HPC and Big Data
Gupta sees the OpenPower initiative as an addition to, not a change from, IBM’s existing Power server line, which is widely used in commercial and business computing by airlines, ATMs, and back office systems in major companies. OpenPower will allow people to explore the higher end of computing in a way that was not possible before, because it goes after sectors that have a lot of data – HPC; Big Data; and the Cloud.
In this sector, clients will not just be buying a Power system, he explained, but the promise of a whole ecosystem that is now being created, that will deliver applications that will allow them to achieve their objectives. In the past, he said, IBM could not pursue Big Data with its Unix product line: ‘But now we can because of the partnerships in OpenPower. It can’t just be IBM. Clients have to believe in the ecosystem.’
Cray’s Barry Bolding also sees convergence between HPC and Big Data. However, because Cray is tightly focused on the high end of supercomputing, he believes that the company can offer solutions that bring traditional simulation and analytics together. He cited oil and gas as an example: Ethernet used to be king in that sector, but he believes it will now fade out in favour of low-latency communications, and so the Ethernet cluster will be in decline.
Market chaos in HPC?
Bolding sees greater fragmentation in the industry than simply the Power/non-Power split. He cited the risk that customers who seek the security of a long-term relationship with their vendor may see recent developments almost as ‘market chaos’. IBM has off-loaded its x86 business to Lenovo; HP has split itself into two businesses; in October, Dell spent $67 billion on the largest-ever acquisition in the technology industry, buying EMC, the world’s largest provider of data storage systems. As a consequence of such developments, these companies’ focus on high-end computing (whether HPC or Big Data) may well be diluted.
Unsurprisingly for a senior Cray executive, Bolding sees such developments as reinforcing his own company’s standing in the market. As reported on Scientific Computing World’s website in June, Cray is only interested in customers who want high-end computing (and also, in this age of big data, in those who want advanced analytics). Those who have problems that could best be solved on commodity hardware should, in effect, look elsewhere. Because of this single-minded focus on the high end: ‘People can be certain about Cray,’ he concluded.
Cray and IBM are very different companies, of different sizes and with different interests; so not unnaturally, IBM sees things in a different light. Its philosophy, according to Gupta, is, so to speak, to have Power as the centre of ten million or so servers sold every year, but not necessarily to expect that all the ten million servers will be made by IBM. ‘We want to capture as much of the market as we can, so we need partners such as Penguin,’ he said.
In many ways, these divergences and different directions are a sign of a healthy and vigorous industry. Three years ago, there was an almost tangible air of depression at the US Supercomputing conference and exhibition. The turn-around started last year, with the announcement of the first procurements in the Coral project to build next-generation supercomputers at two of the US Department of Energy’s national laboratories (the third announcement followed some months later). This year, the upbeat mood has continued, with clear optimism about the future on all sides.