HPC colonises the desktop
Desktop HPC, personal supercomputers, personal workstations – whatever you call them, there's no standard definition of what they are. But most suppliers generally agree on what they aren't: standard office PCs. Simply having lots of cores just isn't enough of a boost for the applications where desktop HPC is generally applied: typically engineering design, analysis and visualisation. And as you'll see in this report, desktop HPC involves not only the familiar tower form factor, but also small racks that accommodate blades and are designed to sit beside a desk in an office environment.
HP refers to this class of system simply as a 'workstation', and Jeff Wood, director of worldwide marketing for workstations, notes some key differences from office PCs. In short, it must be able to expand its infrastructure to accommodate large amounts of memory, disk storage, powerful graphics for visualisation and possibly GPGPU (general-purpose graphics processing unit) cards. He also suggests that you not forget something as mundane as the power supply. Office PCs might have a 300W supply, but when you consider that a high-end graphics card alone might need 250W and the processor can easily need more than 100W, the problem is obvious. Thus, for instance, HP's Z800 system comes with a 1,100W supply. And while it can run on standard line voltage and without any special cooling – which makes it easy to place anywhere in an office environment – there is a water-cooling option in the event you want to keep fan noise to a minimum.
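The power-budget arithmetic Wood describes is easy to sketch. A minimal illustration follows; the graphics-card and processor wattages are the approximate figures quoted above, while the allowance for the rest of the system and the 20 per cent headroom are illustrative assumptions, not vendor specifications:

```python
# Rough power-budget check for a desktop HPC build.
# Card and CPU wattages are the approximate figures quoted in the text;
# the 'rest of system' allowance is an illustrative assumption.
components = {
    "high-end graphics card": 250,   # watts
    "processor": 100,
    "memory, disks, fans, board": 150,
}

total_draw = sum(components.values())  # 500W

def psu_ok(supply_watts, draw_watts, headroom=0.2):
    """True if the supply covers the draw with some safety headroom."""
    return supply_watts >= draw_watts * (1 + headroom)

print(total_draw)                # 500
print(psu_ok(300, total_draw))   # False - a typical office-PC supply falls short
print(psu_ok(1100, total_draw))  # True - e.g. the Z800's 1,100W supply
```

Even before adding a second graphics card or GPGPU, a 300W office-PC supply is clearly inadequate for this class of machine.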
In the mind of Transtec’s HPC specialist Oliver Tennert, HPC focuses on the application, not the system itself; HPC denotes the goal. He adds that you can’t simply take off-the-shelf components and build an HPC system; it requires special software and, in the case of a GPU card, also a compiler.
It's reasonable to assume that nobody will ever have the fastest computer of the day on their desk, comments Herb Schultz, Deep Computing manager at IBM. The question instead becomes whether a user can run computationally intensive applications within certain constraints – a budget, the office environment (low noise, low heat, line power) and easy-to-manage systems that are standardised and thus run commercial applications.
Who needs desktop HPC?
Which users are most likely to use desktop HPC? According to Ian Miller, senior VP of sales and marketing for Cray’s Productivity Solutions Group, such systems have become particularly affordable and are very attractive for, among others:
- an engineering team looking for its first cluster;
- scientists/engineers hitting the limits of PC performance;
- a software developer looking for a dedicated system for development and testing;
- a design/simulation project that is just starting but needs room to grow.
In this regard, Steve Conway, research VP for HPC at market research firm IDC, points out that users are moving into this class of system both top-down (where users want immediate access to considerable power rather than waiting to have their jobs scheduled on a cluster) and bottom-up (such as a small engineering-services firm where the customer says ‘here’s the part we want’, but the company can’t get the work done efficiently on a standard office PC). In addition, rather than classify desktop HPC by number of cores or other features, IDC categorises these systems as those selling in the range between $10k and $100k. According to IDC, sales for this segment in 2008 were $1.96bn, and despite the global recession that number should grow to $2.7bn by 2013.
'The low-end market has been underserved and users have been intimidated,' says Cray's Miller. Now it's no longer so scary, especially with tools such as Windows HPC Server 2008 and the Intel Cluster Ready (ICR) programme. Configuring a desktop HPC system isn't simply a matter of putting parts together; making the parts interact can be a daunting task. Even with a standard desktop PC users can have problems with a disk drive or its driver, and Miller adds that matters become considerably more complex when you have a whole bunch of 'moving parts'. With ICR, the burden of integration falls on the vendors; users are assured that hardware and software elements all work together, and that applications will scale and take advantage of advanced architectures.
Almost 1,000 cores in a tower
A recent article in Scientific Computing World ('A supercomputer chip for every man', Feb/Mar 2009) gives an extensive introduction to the remarkable performance boosts possible with GPGPUs (general-purpose graphics processing units), and Nvidia has packaged one of its processors in a card format that it supplies to various system integrators. Each Tesla card provides 240 computing cores and 1 teraflop of performance, and software written to run on GPGPUs has been shown to run as much as 100x faster than on a conventional processor.
A popular configuration among desktop HPC suppliers is one that combines dual quad-core Xeon 5500 Series (Nehalem) processors in a tower that can accommodate as many as four Tesla cards for a total of 960 cores and 4 teraflops of performance. Prices for such systems are approximately $12k depending on features such as disk space and memory.
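The headline figures for such a configuration are simple multiplication of the per-card numbers quoted above – a quick sketch:

```python
# Aggregate core count and peak performance for a four-Tesla tower,
# using the per-card figures quoted in the text.
CORES_PER_TESLA = 240
TFLOPS_PER_TESLA = 1.0
cards = 4

gpu_cores = cards * CORES_PER_TESLA     # 960 GPU cores
peak_tflops = cards * TFLOPS_PER_TESLA  # 4 teraflops peak

# Plus the host: two quad-core Nehalem Xeons.
cpu_cores = 2 * 4

print(gpu_cores, peak_tflops, cpu_cores)  # 960 4.0 8
```

Note that these are peak figures; as the suppliers below point out, real applications only approach them after substantial recoding for the GPU.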
One supplier in this category is Amax, which product manager James Huang notes was selected as one of Nvidia’s Tesla launch partners with the PSC-2n personal supercomputer. Its 1,200W power supply runs on 110 to 240V AC without any special cooling. In fact, notes Huang, because each Tesla card can consume as much as 200W, heat requirements mean that you can’t simply adapt any old chassis for such a system; these must be custom designed with adequate cooling for all system components.
Huang further explains that the education and research markets are showing great interest in Tesla systems, but that the biggest obstacle is on the software side. The cost savings with hardware are sometimes offset by software-development costs, where it might be necessary to recode 30 per cent of an existing program to take advantage of Tesla's parallel processing. He also adds that these desktop HPC systems are sometimes used as a proof of concept for GPGPU computing, later leading to a full-on cluster deployment.
Super Micro Computer 7046GT-TRF
Along similar lines, Super Micro Computer recently announced a 4-teraflop tower system based on dual Xeon 5500 processors with support for four Tesla cards along with three additional PCIe slots for high-bandwidth I/O. The box also features redundant 1,400W power supplies rated at 93 per cent efficiency. The 7046GT-TRF is housed in a 4U rack-mount convertible tower chassis that supports as many as 11 full-height, full-length expansion cards and eight hot-swap 3.5-inch SAS/SATA drives.
A third member of the club offering four Tesla cards with dual Nehalem processors in a tower is Colfax with its CXT5000. The vendor does note that you can use the four Tesla cards only under Linux; Windows requires a discrete graphics card and in that case you can add a maximum of three GPUs.
Further companies with a similarly configured tower include Velocity Micro with its ProMagix VSC240 and Transtec with its 1100R Nehalem supercomputer, which uses vertical rails to accommodate two server cards in a 1U format.
While four appears to be the maximum number of Tesla cards you can plug into a tower, according to the Nvidia website there are even more suppliers offering personal supercomputers with three Tesla cards. They start with some of the players already mentioned (Amax, Colfax and Velocity Micro) and then go on to include Microway (WhisperStation PSC) and Penguin Computing (Niveus HTX). Another member of this group is Silicon Mechanics with its HPCg A2401, which is based on a quad-core AMD Phenom II processor; note, though, that loading three GPU cards results in each PCIe x16 slot operating at x8 speeds – something worth examining no matter which vendor you are looking at.
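The x16-versus-x8 point is worth examining because halving the lane count halves the bandwidth available to feed each GPU. A back-of-the-envelope comparison, assuming PCIe 2.0 signalling at roughly 500MB/s per lane per direction (the generation current in these systems); real-world throughput is somewhat lower once protocol overhead is accounted for:

```python
# Approximate per-direction PCIe bandwidth by lane count.
# Assumes PCIe 2.0 at roughly 500 MB/s per lane; actual throughput
# is lower once protocol overhead is taken into account.
MB_PER_LANE = 500  # PCIe 2.0, per direction

def pcie_bandwidth_gb(lanes, mb_per_lane=MB_PER_LANE):
    """Nominal one-direction bandwidth in GB/s for a slot with this many lanes."""
    return lanes * mb_per_lane / 1000

print(pcie_bandwidth_gb(16))  # 8.0 GB/s for a full x16 slot
print(pcie_bandwidth_gb(8))   # 4.0 GB/s when the slot drops to x8
```

For applications that stream large datasets between host and GPU, that halving can matter more than the raw core count.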
Nvidia additionally points out that several major OEMs provide Tesla-based personal supercomputers including the HP Z800, the Dell Precision T5500 and T7500, as well as the Lenovo ThinkStation D20 and S20. Consider the Z800 in more detail. This tower configuration comes with two CPU sockets – eight cores in total with quad-core Nehalem processors – and it supports Intel Hyper-Threading Technology, whose thread-level parallelism on each processor gives you essentially 16 logical cores. You can pack the system with up to 192GB of memory, 7.5TB of disk storage and Nvidia Quadro FX graphics, and the box can also accommodate two Tesla cards.
HP also points out that at this time the Z800 is the only workstation certified for Parallels Workstation Extreme, software that runs multiple isolated desktop environments simultaneously. You can run multiple operating systems – whether 32- or 64-bit XP, Vista, Windows 7 or even Linux – to support different applications all on the same box, and thus 'virtualise' 3D, visualisation and high-definition video programs at full speed. Running more desktops on each machine significantly reduces the total cost of ownership; an engineer could, for example, replace multiple workstations sitting on his desk, each running a different graphics-intensive design or analysis program – say CAD under Windows and a CFD code under Linux – with a single box.
Users often place a tower next to their desks, so why not put a small rack there? These systems use a ‘blade’ concept so you can populate them with the processing power you need or can afford. They’ve become quiet enough not to be a distraction in an office environment, they plug into standard wall sockets and weigh little enough that they can be safely placed on standard office floor systems. When on casters, they can be moved around for a quick change in the work location without the need for any special tools.
A recent offering in this vein comes from Cray, which has expanded its entry-level systems with the CX1-LC (where ‘LC’ refers to ‘light configuration’, supporting four nodes instead of the eight in the CX1). With a base price of less than $12k, it is available with the Xeon 5500 Series and supports from one to four blades for as many as 32 cores of Intel processing; other blades are available for visualisation and with GPGPUs. The 7U box is the same size as the CX1 (31 x 17.5 x 35.5cm) and thus is field upgradeable to a full 8-slot CX1. It rests on a simple deskside pedestal, its 800W power supply plugs into a standard outlet and the box complies with the ISO NR45 noise regulations for general office areas. The system can be easily expanded by connecting up to three chassis with an internal switch infrastructure. An interesting plus is a front-control panel with a touch-screen LCD that supports a configuration wizard and also provides information about local server nodes, the enclosure and modules.
Another example is the BoxClusterDSN. Measuring 568 x 434 x 640mm, this deskside unit provides four positions, each of which can be populated with two quad-core Xeon L5520 processors for a total of 32 cores; it also holds 384GB of memory and 16 hard disks. It is available in two variations, for either 110V or 220V AC. The four nodes are in a proprietary format, but the external enclosure is the same width as a standard rackmount, so it's possible to put this system into a server rack, if desired. It draws less than 15A of current, weighs less than 95lbs and has a specified noise level of 51.2dB(A).
Also suited for optional mounting in a 19-inch rack is SGI's CloudRack X2. It measures 24.4 inches (14U) high and, with a base of 17.6 x 41 inches, has a footprint of five square feet; it supplies nine slots for fanless and coverless 1U server trays installed vertically, as well as 2U available for networking equipment. It accommodates a maximum of 36 processors (216 cores). Running from 180-250V AC (no 110V AC), the CloudRack X2 is thermally optimised to operate safely in ambient temperatures up to 40°C (104°F). The vendor supplies a number of trays that take advantage of the latest processor options, including GPGPUs, and most CloudRack trays include support for up to either six or eight SATA or SAS hard drives.
To address this market, IBM provides the BladeCenter S chassis, a 7U rack-optimised chassis that accommodates 1 to 14 blades and plugs into a standard outlet. The BladeCenter S Office Enablement Kit is the ideal way to deploy it in an everyday office. The kit enables several office-friendly features, such as a built-in acoustical module, a front locking door and 4U (33 per cent of the chassis) left available for expansion.
'The market for computers selling for less than $50k is not nearly as big as people have portrayed it to be,' claims Herb Schultz of IBM. He believes that desktop supercomputing has not captured the imagination of users; instead, the model for the delivery of compute cycles is changing. Why buy a system and manage it when you might not always need the extra power? Where users previously had to purchase the service of many nodes for long periods, they can now 'rent' hundreds of processors for half a day – computing on demand has become fine-grained and inexpensive. Renting cycles in this way can also alleviate a number of problems: there are no worries about sufficient electricity; no system administration is necessary; and liability is reduced, thanks to an extra level of security. Data centres supply backup and recovery, so you don't have to worry about what happens if a key computer is stolen.