
A parallel universe


Cheaper clusters, multi-core chips, and ever more complex problems to solve mean that the era of desktop supercomputing is upon us. Even Microsoft is getting in on the act, as Tom Wilkie reports

For The MathWorks and its flagship product, Matlab, the future is parallel. The MathWorks aims to bring personal supercomputing to the desktop, according to Jim Tung, the company’s chief development officer. Towards the end of last year, the company announced version two of its Distributed Computing Toolbox, a significant milestone on the road to making Matlab a fully parallel program.

The main driver for these developments has been the availability of low-cost hardware over the past decade, Tung said. Nowadays, clusters of computers can deliver – cheaply – the processing power that at the beginning of the 1990s was attainable only with purpose-built supercomputers such as the Cray. And the price-tag at the time matched the complexity and sophistication of the hardware.

But if the price of hardware has come down dramatically, then the remaining obstacle to personal supercomputing for scientists and engineers has been the software. Programs originally developed to run on single processors do not easily lend themselves to running on many nodes. They need to be modified to take account of parallel processing – and, in some cases, need to be re-programmed every time they are run on a different number of nodes.
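The node-count problem can be sketched in a few lines. The example below is illustrative Python (not Matlab, the article's subject): a hypothetical study written once against a worker pool whose size is discovered at run time, so the same code runs unchanged on one core or many.

```python
# Illustrative sketch (Python, not Matlab): one independent unit of
# work mapped over a pool whose size is chosen at run time, so the
# calling code never hard-codes a processor or node count.
import os
from concurrent.futures import ThreadPoolExecutor

def simulate(seed):
    """Stand-in for one independent unit of scientific work."""
    return sum((seed * i) % 7 for i in range(1000))

def run_study(seeds, workers=None):
    # Discover the available parallelism instead of baking it in;
    # on a real cluster, a scheduler would play this role.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate, seeds))
```

Because `pool.map` preserves input order, `run_study` returns the same results whether it runs on two workers or twenty; only the wall-clock time changes.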

The MathWorks found that some of its users were customising Matlab to make it run on clusters. This did not involve altering the Matlab core code, but rather adding extensions and functionality to allow it to do parallel computing. Feedback from such users led the company to bring out its first Distributed Computing Toolbox in 2004. This first edition provided many benefits: programs were easy to parallelise, performance was faster and scaled with the number of processors, and changing hardware was not an issue. However, there were some drawbacks. Version one was restricted in which of the then-existing schedulers it could use; there was no inter-processor communication; and it was hard to administer.

According to Tung, a second driver bringing personal supercomputing to scientists and engineers has been the development of operating systems and schedulers that support parallel operation and so allow hardware clusters to be exploited. One of the most popular is LSF from Platform Computing.

For The MathWorks, this has meant that ‘platform support’ has become increasingly important – and complex – over the past few years. A key factor in version two of the Distributed Computing Toolbox is that a range of third-party schedulers can now be used in conjunction with Matlab. The toolbox has also adopted the MPI standard for processor communications and the company introduced a dynamic licensing arrangement – users do not need to buy a Matlab licence for every node in their cluster.

The MathWorks clearly sees even version two as but an intermediate step towards the fundamental and far-reaching goal of full parallel programming in Matlab. Tung says: ‘Supercomputing is already present from the point of view of cost and hardware, but a restructuring of Matlab will make it a commodity – not homebrewed, not handcrafted – the overhead in building applications was high, but will now come down.’

For the colossus of computing, Microsoft itself, the future is also parallel. The company has taken to heart predictions from Intel and other chip-makers that processor speed is unlikely to increase in the future – instead the route to faster computing will be to place several processors on one chip. Speed in future will be achieved from parallel processing.

So Microsoft is moving in the direction of parallel processing – and eyeing scientific computing as the place in which to learn how to do it effectively. Last year, the company recruited the head of Britain’s publicly funded e-Science programme, Professor Tony Hey, to become corporate vice president of technical computing. This summer, Professor Hey has been hitting the road to sing the praises of the new Windows Compute Cluster Server.

Microsoft is not aiming at the ultra-high performance computing market, but wants to put a cluster under the desk of small- to medium-sized companies and to bring supercomputing to the desktops of ordinary biologists, engineers and other scientists. The vision is remarkably similar to that of The MathWorks – desktop supercomputing. It is perhaps no surprise to learn that Microsoft has been cooperating with several vendors of scientific software, including Wolfram and The MathWorks, to customise their applications to run on the infrastructure provided by the new Windows cluster operating system.

Microsoft may even bring out technical and scientific software of its own, Professor Hey suggested, marking a departure from the company’s previous exclusive focus on the business and personal computing marketplaces.

But perhaps most significantly, Professor Hey believes that interoperability is an inevitable consequence of a parallel future. Microsoft can no longer expect to dominate every application – whether in science or business. ‘We have to co-exist and interoperate with the Linux and Open Source world,’ he says. He sees Microsoft’s task in scientific and technical computing as ‘engaging with the scientists trying to develop quality tools that biologists and engineers like and that can interoperate with their favourite software.’

The phrase ‘working with the community of scientists’ is a constant refrain. From his own experience, and observations of how engineers and research scientists operate, Professor Hey feels that there is a need for a cluster computing infrastructure that operates ‘out of the box’, so to speak. ‘We want to demystify parallel computing and make it easy to use so that it is simple for an engineer to plug in a box without having to have a Linux expert on hand to do the command line interface.’ Too many graduate students, he believes, have begun research into, say, geophysics or biology, and come out as Linux programmers. Because graduate students are cheap, they’ve been used to service their research group’s basic computing needs rather than getting on with the scientific research that the computing power ought to make affordable and easy: ‘Instead of doing science, they end up as Linux gurus’.

Virtually all areas of science and engineering are generating more data. ‘How do you save data so it can be reconstructed in 20 years’ time? There are petabyte surveys in many fields. Just managing it is a problem, let alone archiving it. It’s just what you don’t want to give to a graduate student.’ According to Professor Hey: ‘If we, Microsoft, can produce the infrastructure, that means a generation of graduate students is not re-inventing the wheel. And we can offer tools all the way along.’ And the commitment to interoperability means that choosing Microsoft’s infrastructure does not mean that scientists also have to choose the company’s application tools. ‘We need to give people a choice and convince them that Microsoft tools are better.’

Both The MathWorks and Wolfram have announced support for the Microsoft Compute Cluster Server. Wolfram was one of the first scientific software companies to take parallel computing seriously: gridMathematica was launched several years ago and has already gone through several updates and enhancements. Then, in November 2005, the company announced ‘personal gridMathematica’.

Roger Germundsson, Wolfram’s director of research and development, explains that gridMathematica had been widely used in the academic arena for some time. It is sold in packages of eight – one master plus seven ‘slave’ nodes – reflecting the original architecture of Mathematica, which was split between a user interface that talked, using MathLink, to a kernel that did not necessarily need to be on the same system as the interface.

Personal gridMathematica was aimed at a four-core system that is intermediate between the ‘standard’ and the grid versions of the software. The development took place in the knowledge that Intel and the other major manufacturers were looking to put multi-core architectures on their future chips. ‘I’ve had my eye on the four-processor chip development for some time,’ Germundsson says. ‘When Intel says this, it changes things.’

But the background and experience of likely users of gridMathematica was also a factor in the company’s approach to parallel computing. Germundsson noted that few potential users were likely to have access to massive clusters – most would be in the eight- to 16-node category. They would be people with little or no prior experience of parallel programming. But, he went on, now that more people are getting access to parallel computing technology to run real-life tasks, where they can split the computation to reflect the nature of the task, new and effective approaches to parallelism are opening up.

The problem of legacy software that was written for single-processor machines will gradually disappear ‘once you have easy-to-use tools and you can see it’s not complicated and you can see natural ways to split or organise the task.’ This will, he believes, take time, ‘but there is an opportunity now.’ He can already see many of Wolfram’s customers picking up on personal gridMathematica: ‘Our customers are very quick to get the best of everything. If there is parallel processing that is easy to use, then it makes sense to use it more.’

So will Mathematica be re-shaped into a fully parallel application? Here, Germundsson cautions that it would take time: ‘Mathematica is not intended for a few hundred people with access to supercomputers. Some operations already do run in parallel and over time the whole system will be parallel.’

The theme of connecting to and re-using existing applications and numerical libraries without having to deal with the complexities of parallel programming is very much at the core of what Interactive Supercomputing (ISC) is trying to do. In the middle of June the company announced the launch of a new version of Star-P, its parallel computing ‘platform’ software. According to Ilya Mirman, vice-president of marketing for ISC, ‘what we’re doing is putting power into the hands of desktop users without asking them to jump through a lot of hoops. We believe you should be able to plug in and re-use existing codes.’

Mirman is concerned at what he sees as a widening gap between what hardware is capable of and what software can extract from it. In the past, he said, many end users had to develop their own parallel applications software, both because doing so was difficult and because few vendors had parallelised their code. It used to be a niche area in which expensive machines were used to run the largest problems – but now, he believes, all software has to take advantage of parallel processors.

‘What’s really driving the need for supercomputers is that the problems are growing more complex than some years ago and faster than single cores can catch up,’ he says. The price of parallel systems is dropping to make them a cost effective tool – just in time for the engineers and scientists who need them.

He continues: ‘The right paradigm is to decouple the programming – the way the engineer writes the code – from how it is executed. End users want to focus on their science. We’re building a platform [Star-P], not an application. Because it’s an open platform people can plug in different codes – we will interface with all the application software. And it is a hardware-agnostic platform.’ He points out that there are two modes in which parallel applications can work – task parallelism and data parallelism (working with huge data sets). ‘We’re the only solution that enables both,’ he says.
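Mirman’s two modes can be illustrated in miniature. The Python sketch below is a hypothetical example, not Star-P’s API: task parallelism appears as two unrelated jobs submitted side by side, and data parallelism as one reduction split across chunks of a data set.

```python
# Sketch of the two modes, using Python threads for brevity.
from concurrent.futures import ThreadPoolExecutor

def task_parallel():
    # Task parallelism: two different computations run concurrently.
    with ThreadPoolExecutor() as pool:
        f1 = pool.submit(sum, range(1000))   # job 1
        f2 = pool.submit(max, range(1000))   # job 2
        return f1.result(), f2.result()

def data_parallel(data, chunks=4):
    # Data parallelism: the same reduction applied to slices of one
    # data set, then combined -- the classic split/apply/combine shape.
    size = (len(data) + chunks - 1) // chunks
    pieces = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(sum, pieces))
    return sum(partials)
```

The distinction matters because a scheduler can balance unrelated tasks freely, while data-parallel work lives or dies by how evenly the data set is split.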

The big applications, as far as ISC is concerned, are in life and earth sciences, aerospace, defence and intelligence work and, to a lesser degree, economics and financial applications. ‘Break-through power – that’s what we’re delivering,’ he concludes.

  • The High Performance Computing division within British systems integrator Compusys has launched its ‘Cluster in a box’ solution. There are three basic variants, all based on AMD Opteron Dual Core Processor technology: eight-node, 32-core; 16-node, 64-core; and 32-node, 128-core. Upgrade options include three processor speed upgrades and one memory upgrade.
  • Tyan Computer Corporation, the US-Taiwanese high-end motherboard and system maker, has introduced Typhoon PSC, a Personal Supercomputer range, that can accommodate up to eight processors and 64GB of memory, delivering up to 70 gigaflops performance in a unit no bigger than two desktop PCs.
  • Typhoon PSC is intended for users of compute-intensive applications in graphic rendering, data warehousing, financial and statistical analysis, scientific research, and the oil and gas industries. According to Jeff Smith, UK manager of Tyan Computer, it ‘brings supercomputing into the office’. The Typhoon PSC supports Linux and Microsoft’s Windows Compute Cluster Server 2003. Two models are available immediately, based on AMD Opteron 200 series processors or Pentium D processors. Each is configured as four cluster nodes with two processors each.
  • Microsoft has nominated the US National Center for Supercomputing Applications (NCSA) at the University of Illinois as the 11th Microsoft Institute for High-Performance Computing. NCSA’s new supercomputing system, Lincoln, which is based on Windows Compute Cluster Server 2003, is one of the most powerful supercomputers in the world, and its peak performance will approach 6.0 teraflops.
  • Appro, which provides high-performance enterprise computing servers, storage and high-end workstations, is to provide hardware for the Peloton supercomputing project at the US Lawrence Livermore National Laboratory.
  • The deployment will consist of three Appro 1U Quad XtremeServer Clusters with a total of 16,128 cores based on the latest dual-core AMD Opteron processors to meet the laboratory’s demand for 50-100 Teraflops simulations. The clusters will be used both for secret work on US nuclear weapon design and maintenance and for non-secret work.

  • Airbus, the European commercial aircraft manufacturer, has placed an order with Fujitsu Systems Europe for its SynfiniWay grid middleware to be used at the primary Airbus compute centres in France, Germany, Spain, and the United Kingdom. The SynfiniWay grid framework will be used as the common interface for heterogeneous HPC machines and data transfer. Virtualisation of these systems now gives Airbus greater flexibility in resource deployment, allowing a reduction in project times through meta-scheduling and task interleaving.

Intel predicts a parallel future

According to Intel, it is inevitable that the future of computing will be parallel. And if that is the view of the world’s most significant maker of microprocessors, then the rest of us had better listen and adjust accordingly.

The company expects that, by the end of this year, dual-core processors will represent the majority of the processors it ships. Dual-cores have two processors, sharing a common cache, manufactured on the same integrated-circuit die. And dual-cores are only the start – the company expects to move quickly to quad-core and then to many cores.

James Reinders, director of Intel Software Development Products, says: ‘This is driven strongly by power considerations. In the past, we have turned up the clock rate and used more power. The performance per watt did not improve, and that could not continue indefinitely. Now, with multiple cores, we can even turn down the clock rate, and thus the power consumption, while increasing performance. This is not arbitrary. It is not a fix. It is here to stay.’
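The trade-off Reinders describes can be put in rough numbers. The sketch below assumes the common first-order model that dynamic power scales roughly with the cube of clock frequency (P ∝ V²f, with voltage tracking frequency); the specific figures are illustrative assumptions, not Intel’s.

```python
# First-order model: dynamic power ~ cores * f^3, throughput ~ cores * f
# (assuming perfect parallel scaling). Both are idealisations.
def relative_power(freq_ratio, cores=1):
    return cores * freq_ratio ** 3

def relative_throughput(freq_ratio, cores=1):
    return cores * freq_ratio

# One core at full clock vs two cores each at 80% clock:
single = relative_throughput(1.0) / relative_power(1.0)
dual = relative_throughput(0.8, cores=2) / relative_power(0.8, cores=2)
# Two slower cores deliver 1.6x the throughput for roughly the same
# power (2 * 0.8^3 = 1.024x), so performance per watt rises ~56%.
```

Under this model, turning the clock down while adding cores really can raise both throughput and performance per watt at once, which is Reinders’ point.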

Reinders believes that the scientific community is well-placed to take advantage of these new technological developments, because many scientists and computational staff have had experience in parallel programming in the days when high-performance computing (HPC) was done on large, expensive machines. Now, however, multi-core architectures will become commodity computers.

In light of the transition to multi-core platforms over the coming few years, Intel is providing software development products to help all users create robust, optimised and scalable multi-threaded applications. These development tools include a maths library, for example.

Reinders highlighted scalability as one of three challenges presented by the new multi-core world. ‘If you have two or four cores, then your application does not need to be scalable or even that efficient, but you’ll still get some improvement. But you have to do some thinking to take advantage of 8 or 16 cores, and at 32 the program has to be scalable. There is plenty of experience from the HPC market showing that we can take advantage of machines with dozens of processors.’
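The scalability point can be made concrete with Amdahl’s law (not named in the article, but the standard model for it): if a fraction s of a program must run serially, the speedup on n cores is 1/(s + (1 − s)/n).

```python
def amdahl_speedup(serial_fraction, cores):
    # Speedup is capped by the serial fraction, however many cores you add.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

# A program that is 95% parallel gains nicely on a few cores but
# falls well short of ideal at 32:
for n in (2, 4, 8, 16, 32):
    print(n, round(amdahl_speedup(0.05, n), 2))
```

This prints speedups of roughly 1.9, 3.5, 5.9, 9.1 and 12.5 – far below the ideal 32× on 32 cores, which is why programs that look fine on a quad-core need real restructuring for larger machines.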

Not all these processors may look the same, he said, once one gets much above 16-core. There is interest in ‘pipeline computing’ both from the scientific community and from graphics developers. Scientific computing applications could therefore benefit, he suggested, as a by-product of consumer demand for sophisticated computer graphics.

But the second obstacle that needs to be overcome in this transition to a parallel world is ‘correctness’. Errors can arise in a parallel world that could never have happened in traditional sequential computing. One can get a ‘race condition’, where one part of the program computes a value and the other part consumes it. Sometimes it may happen that the data is not computed in time and so the program will intermittently fail. Race conditions are difficult to detect because the failure may only be intermittent. Another ‘correctness’ issue is that of deadlock conditions – where one part is waiting upon another and vice-versa. Such conditions are ‘a critical problem and we need new tools to cope with it,’ he said, pointing out that Intel has already developed suitable tools for C++ and Fortran.
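A minimal illustration of the race-condition hazard, in Python for brevity: four threads increment a shared counter. Here the read-modify-write is protected by a lock; without it, updates can interleave and be lost – only intermittently, which is exactly why such bugs are hard to detect.

```python
# Four threads share one counter; the lock makes the result deterministic.
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:          # remove this and the final count may fall short
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock held for each update, counter is reliably 400000.
```

Deadlock is the mirror-image failure: if two threads each hold one lock and wait for the other’s, neither can ever proceed – which is why the tooling Reinders mentions matters.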

The final issue is that of ease of programming and maintainability. ‘Intel is a big fan of standards. We are not looking to new programming languages – the majority of people need to know that they can continue with Fortran or C++.’ MPI and OpenMP are important here – one of the big uncertainties is whether everyone is using the same procedures for message passing, and so Intel has produced its own MPI library, based on Argonne’s MPI-2 implementation, as a de facto standard.
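The flavour of MPI’s explicit message passing can be sketched with a queue between two Python threads. This is an analogy only, not the MPI API: real MPI spans separate processes and machines.

```python
# Analogy: a queue between threads standing in for MPI send/recv
# between processes on different nodes.
import queue
import threading

channel = queue.Queue()

def worker():
    # 'Rank 1': compute a partial result and send it back.
    partial = sum(x * x for x in range(100))
    channel.put(partial)      # analogous to MPI_Send

t = threading.Thread(target=worker)
t.start()
result = channel.get()        # analogous to MPI_Recv (blocks until data arrives)
t.join()
```

The essential idea carries over: no shared memory is assumed, and all coordination happens through explicit messages – which is what lets the same program run across a whole cluster.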

Reinders welcomes the introduction of the Microsoft Compute Cluster: ‘An indicator that this is going mainstream. My group at Intel is there to get the best performance out of these machines. Microsoft is saying: “There is a market out there for these machines.”’

‘Parallelism is coming to the masses and that’s good. I expect to see a lot of novel ideas pop up. There’s lots of work for us all. It’s tremendously exciting to see parallelism become the normal course of computing.’