With the world of finance in meltdown, relying on HPC solutions to predict the future is vital, as Stephen Mounsey discovers
When IBM discusses computing in the financial services, it draws upon the image of the American Old West. A gunslinger’s life could depend on his ability to ‘shoot straight, shoot fast, and shoot often’, a situation that encouraged the rapid adoption of any technology perceived to give even a marginal advantage, no matter what the expense. The same paradigm has, until recently, existed among financial firms, with banks willing to invest large sums in both software and hardware in order to generate even a small advantage over their competition.
Times are tough for the financial services, and while the drive to upgrade is always going to stem from the need to remain competitive, it is urgently necessary for firms to reduce costs and increase effectiveness. As many companies offering HPC are finding, technologies that increase efficiency are likely to be highly sought after for the duration of the credit crisis and beyond.
Within the financial services, computational power is used for risk modelling, commodities pricing, algorithmic trading, and data-feed processing. HPC is most often applied to commodity pricing and risk analysis. These areas require the most complicated simulations, which draw on volatile data from billions of trades per day and integrate the probabilities of all possible outcomes – outcomes that become exponentially more numerous with every second of analysis. The performance of the HPC solution directly influences a firm’s ability to remain competitive.
Conventional grid-based solutions
Many financial players, be they large multi-national investment banks or smaller, independent hedge funds, make use of grid computing as a cost-effective way of achieving the computational power they need. Grid computing brings a large number of computers to bear on one difficult problem simultaneously. In the financial services, the computers used are usually the desktop machines of the bank’s staff, as these are within the network’s security and are very numerous. The gridding relies upon specialised middleware to divide up the problem, and to delegate small tasks to these constituent elements. Grid systems differ from a conventional cluster in that they tend to be more loosely networked, constituent nodes may vary in specification, and the nodes may also be geographically dispersed. While any given cluster or grid tends to be dedicated to a particular application, grids make use of general-purpose grid software libraries and middleware.
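The divide-and-delegate pattern described above can be sketched in a few lines of Python. This is a minimal illustration only, not how GridServer or any commercial middleware works: the function names are hypothetical, local threads stand in for networked desktop machines, and the per-node task is a placeholder calculation.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tasks(items, n_tasks):
    """Divide a list of work items into at most n_tasks roughly equal chunks."""
    n_tasks = max(1, min(n_tasks, len(items)))
    size, extra = divmod(len(items), n_tasks)
    chunks, start = [], 0
    for i in range(n_tasks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def value_chunk(chunk):
    """Hypothetical per-node task: a sum of squares stands in for real pricing."""
    return sum(x * x for x in chunk)


def run_on_grid(items, n_nodes):
    """Split the work, hand one chunk to each 'node', then combine the results."""
    chunks = split_into_tasks(items, n_nodes)
    # Threads are an assumption made for brevity; a real grid would ship
    # each chunk to a separate machine over the network.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partial_results = list(pool.map(value_chunk, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    print(run_on_grid(list(range(10)), n_nodes=3))
```

Real middleware adds the parts that make a grid hard in practice: scheduling around nodes of differing specification, coping with machines that drop out, and moving data securely between sites.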
Mitsubishi UFJ Securities makes use of one such middleware product: GridServer, from New York-based DataSynapse. Igor Hlivka, head of the quantitative analysis group at Mitsubishi UFJ, explains that the bank makes use of GridServer in order to co-ordinate the resources of more than 2,500 cores. Although off-the-shelf products exist, the level of control required is such that the group has written its own very high-level APIs using Mitsubishi’s own language, M#. These APIs are implemented as a level of control above the GridServer, in order to further streamline and optimise activities depending upon the specific need of the bank at any given moment. Nocturnal requirements, for example, differ greatly from intra-day requirements; overnight, the grid runs the process-heavy risk engine in order to check the bank’s position with respect to its assets, in preparation for the next day’s trading. The huge variations in the number and type of commodities in question make this a difficult task. Intra-day calculations lean towards pricing and forecasting, although around 800 cores remain devoted to the risk engine. Hlivka is very aware, however, that the grid’s constituent computers are in day-to-day use by personnel elsewhere in the bank, and so the gridding, a process he calls ‘desktop harvesting’, has to be very carefully optimised to ensure that the drain on resources is not noticed by individual users. Hlivka says: ‘There are political implications to this [desktop harvesting] as well; we would not wish to be accused of “stealing” computing power from the end users of the computers.’
Future of gridding
First Derivatives is a company based in Newry, Northern Ireland, which provides software and services to the capital markets sector, and has most of the top-tier investment banks as customers. The company works closely with KX Systems, a California-based firm providing the underlying high-performance database technology, on top of which First Derivatives’ software runs. Michael O’Neill, chief operating officer of First Derivatives, explains that ‘the very high-performance database systems can perform hundreds of thousands of times faster than a solution based upon Oracle or SQL Server.’ Most of the company’s data-handling applications can run happily on four CPUs, but O’Neill adds that some analysis requires complicated simulation, necessitating a ‘big grid deployment’.
The company has been part of a UK-wide experiment looking at the next generation of grid computing, a project known as NextGrid, alongside many other industrial players and stakeholders, including the School of Computer Sciences, Queen’s University, Belfast. The project succeeded in establishing new standards and specifications for the next generation of grid computing. O’Neill considered First Derivatives’ involvement a great success, citing the industry-academia crossover as especially useful to the company. He said that ‘in order to develop, financial simulations need to become faster; for us, advanced grid technology is an obvious way of achieving that.’
Impact of tougher times
The recent tightening of purse strings has resulted in a drive to more cost-effective solutions; firms still need to increase the competitiveness of their computerisation by reducing latency (shooting faster), by improving reliability (shooting straighter), and by increasing capacity (shooting more often). The systems in place are still being updated as often as they were five years ago, only now updates need to happen at a reduced cost, requiring fewer man-hours, and minimal additional infrastructure. Data centres necessarily represent a huge investment; a 13,000m² data centre was said to account for much of the £700m paid in September 2008 by Barclays for the assets of Lehman Brothers, with the stricken bank’s New York headquarters accounting for £500m of that sum. Furthermore, powering a data centre can account for up to 10 per cent of an institution’s IT budget, with market research firm Gartner predicting that this figure will rise to as much as 40 per cent by 2013. Space is also at a premium; a standard rack is able to hold around 42 discrete 1U-sized devices.
Banks are moving their data centres out of their city centre headquarters to locations where space is cheaper and electrical demands can be reduced. Some, such as Citibank in Frankfurt, are going as far as to publicise new installations as sustainable; they have turf roofs, are made of wood, and are powered in part by wind turbines. This added distance does, however, increase latency, and so it is not ideal for high-performance computing.
The corporate and investment banking arm of French bank BNP Paribas recently released news of a promising approach to cutting its electricity bills and increasing performance simultaneously. Around one teraflops of calculation for the global equities and commodity derivatives branch has been moved to graphics processing unit (GPU)-based systems (see HPC Projects, Feb/Mar 2009). The new system achieves a 15x overall speed-up on the simulations run upon it. The calculations are run on two 1U Nvidia Tesla components, each containing four GPUs, which are managed by an X86 server. These eight GPUs, drawing 2kW in total, have replaced a cluster of 500 conventional X86 CPUs, which drew around 25kW. A 100x increase in computing power per watt was achieved, rising to 190x when reduced cooling requirements are considered.
So how does GPU-based computing provide this performance? Monte Carlo methods are useful when dealing with a high degree of uncertainty in input values, making them suitable for risk analysis in business. They rely upon many iterations of random sampling to compute their results, and because each iteration is independent of the others, the work can be spread across many processors at once. Nvidia’s GPUs are based on its massively parallel Compute Unified Device Architecture (Cuda), which is well suited to this kind of highly repetitive workload. The GPUs each contain a random number generator which, taken on its own, is 100x faster than a standard X86 solution. When fed back into the Monte Carlo simulations, the speed-up is in the order of 25-50x.
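To make the idea concrete, here is a minimal Monte Carlo sketch in plain Python. It prices a European call option by sampling many random end-of-period prices and averaging the payoffs. The model (geometric Brownian motion), the parameter values and the function names are assumptions chosen for illustration; this is not the model run by BNP Paribas, and it runs on a CPU rather than a GPU. A closed-form Black-Scholes price is included only as a sanity check on the estimate.

```python
import math
import random


def mc_european_call(spot, strike, rate, vol, maturity, n_paths, seed=0):
    """Estimate a European call price by simulating n_paths terminal prices."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    payoff_sum = 0.0
    for _ in range(n_paths):
        # Each path needs one normal random draw and is independent of
        # every other path, which is what makes the method easy to parallelise.
        z = rng.gauss(0.0, 1.0)
        terminal = spot * math.exp(drift + diffusion * z)
        payoff_sum += max(terminal - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * maturity) * payoff_sum / n_paths


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form reference price, used here only to check the estimate."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def norm_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


if __name__ == "__main__":
    estimate = mc_european_call(100.0, 100.0, 0.05, 0.2, 1.0, n_paths=200_000)
    reference = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
    print(f"Monte Carlo: {estimate:.3f}  closed form: {reference:.3f}")
```

The loop body is the part a GPU would execute thousands of times in parallel, and almost all of its cost lies in generating random numbers and evaluating the exponential, which helps explain why fast random number generation matters so much to the overall speed-up.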
Responding to change
Sumit Gupta, senior product manager for Nvidia’s Tesla group, says: ‘When the markets move, traders need to be able to rapidly build and test a pricing model in order to make new financial products available.’ In order to be useful, the hardware has to be effective when used by a non-specialist. Cuda code is currently written in a slightly modified version of C, with libraries available for Java, Python, .Net and others. However, most day-to-day users of very high-level languages (VHLLs) within the financial services program by necessity rather than choice, and so optimising a program for a particular architecture eats into time that could be better spent elsewhere. Some developers of VHLLs, such as Mathematica, are beginning to build Cuda compatibility into their mathematical platforms, and stand-alone solutions for running common VHLL functions, particularly Matlab functions, on a GPU have been developed by Nvidia, AccelerEyes, and others.
Of leptons and options
Tech-X is a Colorado-based software company specialising in massively parallel solutions. The company markets its GPULib product, a library of Cuda-optimised mathematical functions, bound to a number of VHLLs, including Matlab and IDL. The company’s main business is within demanding applications such as high-energy physics, and GPULib was initially developed using funding from NASA. However, as Peter Mesener, vice president of space applications, says, a technology that uses low-cost hardware to solve complex systems of equations generates interest from many industries. The libraries are such that no knowledge of GPU programming or memory management is required in order to make use of the speeds offered by a GPU.
Tech-X is in the process of using GPULib to accelerate processes used by the Chicago Trading Company (CTC), a well-respected provider of pricing and liquidity data on all US derivative exchanges. CTC, like all market-making firms, makes money by accurately estimating the value of financial instruments or commodities, and by making trades when its valuations show a discrepancy across two products. An opportunity may be present for only a matter of seconds, and therefore data must be updated and analysed as quickly as possible. One particular financial model under development by CTC is required to solve a system of around 500k linear equations in 0.25 seconds; that’s a rate of two million linear equations per second, each with several hundred unknown variables. This project is currently at the evaluation stage, running on a standard PC and a low-cost gaming GPU, but this is sufficient to give a first glimpse of the achievable performance without significant investment. Eventually, a Tesla set-up, similar to BNP’s, will provide a more cost-effective solution than a comparable cluster would.
This ease and speed of testing, backtesting, and implementation is important to Igor Hlivka too, who is carrying out research into the benefits of moving some of Mitsubishi UFJ’s operations to GPU-based hardware. Hlivka cites his main motivation as the ‘superior cost-efficiency’ when compared to grid or cluster solutions, both in terms of time and hardware.
Hlivka’s existing grid-based hardware has trouble with exotic derivative products, basket derivatives and exchange derivatives; these assets require the most complex calculations, as the risks involved depend on a great many factors. The required multi-dimensional simulations can be difficult to accelerate using conventional means, as they scale non-linearly: doubling the number of cores available will not double the number of products the grid is able to process within a given time.
‘In contrast to this,’ Hlivka explains, ‘when using GPU-based hardware, processing time required for a simulation can be made to scale linearly with each new product added to it – a highly desirable characteristic.’
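The article does not say why Hlivka’s grid scales less than linearly, but one standard way to reason about this kind of behaviour is Amdahl’s law, which caps the speed-up of any job that contains a portion that cannot be parallelised. The sketch below is an illustration under that assumption only; the 90 per cent parallel fraction is a made-up figure, not a measurement from Mitsubishi UFJ.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Speed-up predicted by Amdahl's law for a job run on n_cores.

    parallel_fraction is the share of the run time (between 0 and 1)
    that can be spread across cores; the rest runs serially.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Hypothetical workload in which 90 per cent of the time parallelises.
    for cores in (500, 1000, 2000):
        print(cores, round(amdahl_speedup(0.9, cores), 2))
```

Under these assumed numbers, doubling the core count from 1,000 to 2,000 improves throughput by well under one per cent, because the serial portion dominates. Linear scaling of the kind Hlivka describes for GPUs corresponds to a workload in which the serial portion is negligible.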