Techniques once primarily used for science and engineering are now essential to financial institutions, where a small competitive edge can mean huge profits. Equally, having the most up-to-date and comprehensive information may help institutions identify extreme events such as those that led to the banking collapse of 2008.

For simple analysis and basic mathematical functions, tools such as Excel are well suited to analysing financial data sets. As data sets grow larger and the analysis more complex, however, more processing power is needed, and the finance industry is turning to sophisticated algorithms, low-latency programming, high-performance computing (HPC) and even machine learning to make sense of the huge amounts of data.

‘It’s easy to optimise a trivial three or four asset portfolio in something like Excel or VBA (Visual Basic for Applications), but anything beyond that and you need far more processing power; you need better optimisers; and you need multicore or parallel computing as well,’ said Samir Khan, senior application engineer at Maplesoft.
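To make Khan's 'trivial' case concrete, the sketch below solves the classic Markowitz minimum-variance problem for three assets in plain Python. The fully invested weights are w = Σ⁻¹1 / (1ᵀΣ⁻¹1), where Σ is the covariance matrix of asset returns. The covariance numbers and function names are hypothetical, short selling is allowed, and there are no other constraints; a realistic portfolio with hundreds of assets and trading constraints needs a proper quadratic-programming optimiser, which is Khan's point.

```python
from typing import List

Matrix = List[List[float]]

def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def min_variance_weights(cov: Matrix) -> List[float]:
    """Fully invested minimum-variance weights, w = S^-1 1 / (1' S^-1 1); shorting allowed."""
    z = solve(cov, [1.0] * len(cov))
    total = sum(z)
    return [zi / total for zi in z]

def portfolio_variance(w: List[float], cov: Matrix) -> float:
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))

# Hypothetical annual covariance matrix for three assets (illustrative numbers only).
cov = [
    [0.040, 0.006, 0.012],
    [0.006, 0.090, 0.018],
    [0.012, 0.018, 0.160],
]
w = min_variance_weights(cov)
print("weights:", [round(x, 4) for x in w])
print("portfolio variance:", round(portfolio_variance(w, cov), 5))
```

The optimal portfolio's variance is never higher than that of the least volatile single asset, since holding only that asset is itself a feasible portfolio.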

Computational tools such as Maple or Matlab are often used for ‘agile and rapid development of quantitative algorithms,’ said Steve Wilcockson, industry manager for financial services at MathWorks. ‘Industry-type black-box solutions fall over in terms of customisation, and with lower-level programming languages it can take too long to build algorithms; Matlab tends to sit somewhere in the middle of that spectrum. It is used across the financial services industry, anywhere really where you could find Excel, Python or C++.’

‘Maple is essentially a high-level programming language with advanced mathematical libraries and visualisation tools. There is functionality in Maple for pricing options, stochastic modelling, and developing your own applications and packages,’ said Khan.
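As an illustration of the option pricing and stochastic modelling Khan mentions (written here in Python rather than Maple), the sketch below prices a European call two ways: with the closed-form Black-Scholes formula, and by Monte Carlo simulation of geometric Brownian motion under the risk-neutral measure. The function names and inputs are illustrative assumptions; the simulated price should agree with the formula to within sampling error.

```python
import math
import random

def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call (no dividends)."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)

def monte_carlo_call(s0, k, r, sigma, t, n_paths=200_000, seed=42):
    """Estimate the same price by simulating terminal prices under geometric Brownian motion."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t  # risk-neutral drift of the log-price
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_t - k, 0.0)  # call payoff at expiry
    return math.exp(-r * t) * total / n_paths  # discounted average payoff

# Illustrative inputs: spot 100, strike 100, 5% rate, 20% volatility, one year.
exact = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"Black-Scholes: {exact:.4f}  Monte Carlo: {estimate:.4f}")
```

The closed form is exact but exists only for simple contracts; the simulation is slower but extends to payoffs and price dynamics with no formula.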

Software such as Maple and Matlab is used by banks, insurers and asset managers for a range of finance operations, from financial modelling to long-term forecasting and predicting risk scenarios for pensions. On the buy side, applications include asset allocation, asset management and, at smaller hedge funds, systematic trading, explained Wilcockson. Matlab is ‘used across all these applications in relatively equal measure.’

### Early developments

‘The roots of today’s computational finance software lie in the 1950s and 60s, when computers were used in portfolio optimisation,’ explained Khan. ‘The early 70s saw the development of modern option pricing. These models grew in sophistication, and with that came advanced solution methods and the associated demand for increasingly sophisticated software.’

According to Wilcockson: ‘Matlab has been used alongside computational finance methods for a long time, typically things with a heavy statistical and/or optimisation component: In the trading world, optimising particular parameters that might form the input to a trading model, for example, or perhaps optimising a cash flow for an insurance organisation, so that it can monitor its long term assets and liabilities.’

‘Both of these tools have a numeric engine and a symbolic math engine, and that combination is unique,’ said Khan. He continued: ‘It allows you to develop and validate, say, algorithms for new option pricing models. These models can then be converted into code in languages like C. At the very high level, you also have software engineers working for investment banks in the City of London, for example, writing C#, Java and C++. C# and Java are generally used for front-end and desktop applications, big enterprise systems and back-end trading platforms.’
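Validating a new pricing algorithm before it is ported to C often means checking it against identities it must satisfy. The hypothetical `crr_price` function below implements a Cox-Ross-Rubinstein binomial tree, a standard textbook method chosen here for illustration, and checks two such properties: European call and put prices must satisfy put-call parity, C - P = S0 - K exp(-rT), and an American put can never be worth less than its European counterpart. This is a sketch, not production pricing code.

```python
import math

def crr_price(s0, k, r, sigma, t, steps, is_call, american=False):
    """Price an option on a Cox-Ross-Rubinstein binomial tree."""
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral probability of an up move
    disc = math.exp(-r * dt)

    def payoff(s):
        return max(s - k, 0.0) if is_call else max(k - s, 0.0)

    # Option values at expiry; index j counts the number of up moves.
    values = [payoff(s0 * u ** j * d ** (steps - j)) for j in range(steps + 1)]
    # Step back through the tree, discounting risk-neutral expected values.
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:  # allow early exercise at every node
                cont = max(cont, payoff(s0 * u ** j * d ** (i - j)))
            values[j] = cont
    return values[0]

s0, k, r, sigma, t, n = 100.0, 100.0, 0.05, 0.2, 1.0, 500
call = crr_price(s0, k, r, sigma, t, n, is_call=True)
put = crr_price(s0, k, r, sigma, t, n, is_call=False)
am_put = crr_price(s0, k, r, sigma, t, n, is_call=False, american=True)

# Check 1: put-call parity holds on the tree, up to floating-point rounding.
parity_gap = (call - put) - (s0 - k * math.exp(-r * t))
# Check 2: the right to exercise early cannot reduce value.
print(f"call={call:.4f} put={put:.4f} american put={am_put:.4f} parity gap={parity_gap:.2e}")
```

As the number of steps grows, the European prices from the tree also converge towards the Black-Scholes values, which gives a third check.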

‘C# and Java have a number of high-level libraries which make it easy to build enterprise applications that, for example, connect to databases and trading platforms,’ explained Khan. However, these languages generally have too high a latency for what would be considered real-time analysis or real-time pricing.

‘It [Maple] is generally not used for real-time pricing, because it has latency issues, as do tools like Mathematica. A tool like Maple is generally used for building new pricing models, optimising that mathematical description, and then converting it to a different form like C or C#. Then you can take that code and use it in your real-time pricing platform,’ said Khan.

He continued: ‘This allows you to get very close to the metal on the computer: you can program right down to the processor registers, and that results in very fast, very efficient code with low latency.’

Low latency is important in applications such as algorithmic trading, in which a computer places orders automatically by executing pre-programmed trading instructions. ‘Algorithmic trading works on a timescale of microseconds, or even nanoseconds. Financial houses are vying for office spaces that are closer to the actual trading location, because of the amount of time it takes for the signals to travel down a fibre-optic line. Once you get down to that level, you generally cannot rely on high-level programming languages,’ said Khan.
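The physics behind the race for office space is simple arithmetic. Light in optical fibre travels at roughly c/n, where n is the refractive index of the glass. The sketch below assumes n = 1.47, a typical value for silica fibre, which gives about 4.9 microseconds of one-way delay per kilometre before any switching or processing overhead. The distances are hypothetical.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact, by definition)
FIBRE_INDEX = 1.47  # assumed refractive index of silica fibre; real cables vary slightly

def fibre_delay_us(distance_km: float, refractive_index: float = FIBRE_INDEX) -> float:
    """One-way propagation delay, in microseconds, over a fibre run of the given length."""
    speed_m_per_s = C_VACUUM_M_PER_S / refractive_index
    return distance_km * 1_000.0 / speed_m_per_s * 1e6

# Hypothetical distances between a trading office and an exchange's servers.
for km in (0.1, 1, 10, 100):
    one_way = fibre_delay_us(km)
    print(f"{km:>6} km: {one_way:8.2f} us one way, {2 * one_way:8.2f} us round trip")
```

On a microsecond timescale, moving a few kilometres closer to the exchange is worth tens of microseconds per round trip, which no amount of code optimisation can recover.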

Wilcockson agrees in part, but notes that MathWorks’ strategy is to get closer to production, including trading environments. ‘In 2013 we released the Matlab Production Server, a tool that runs Matlab instances directly in production systems, including C# and Java-oriented trading and risk management systems, at millisecond latencies. It’s also a very nice means to port analytics directly into “big data” environments. If you want faster microsecond response times, automatic code generation is another option. One investment bank did just this, reducing its time from concept to market for new microsecond FX trading algorithms from 3 months to 2 weeks.’

### Machine learning

‘Machine learning is one of the growth areas at the moment; there is a lot of interest. We see it being applied across financial services, increasingly in risk management, but also in asset management, and internal tasks like data management,’ said Wilcockson. As datasets get larger, it becomes increasingly difficult to find relevant information: ‘It may not find you that needle in the haystack, but maybe an area of interest where the needle might be located.’

Machine learning ‘sits concurrently with the world of big data, so datasets are getting ever larger, databases ever broader. We have seen the rise of the NoSQL database,’ he continued. A NoSQL database stores and retrieves data using models other than the tabular relations of relational databases, such as key-value pairs, documents or graphs. According to Wilcockson: ‘People are wanting to mine those datasets for relevant, useful business information. In those instances where you have significant amounts of data, machine learning offers a nice entry point to effectively mining big data.’

Machine learning has a long history: in 1959, Arthur Samuel defined it as a ‘field of study that gives computers the ability to learn without being explicitly programmed.’ Once a sufficiently comprehensive set of instructions has been developed, a machine-learning program can potentially be left to extract relevant information on its own. As more relevant data becomes available, the program also becomes more effective, because it continually tests its own model, validates it, and corrects discrepancies using the information available.

‘So the more data, and the more reliable data, you have, the more likely it is that the model will be useful. The example I would draw on here is fraud: it is a common theme in the financial services industry at the moment; you can barely pick up the Financial Times without seeing an article about rate rigging,’ said Wilcockson.

He went on to give the example of the Madoff fund and similar funds that were exposed as fraudulent. One of the positive aspects to come out of such scandals was that ‘it gave the world some candidate datasets that regulators and risk managers within, say, risk management firms or banks can use to potentially detect fraudulent trades or fraudulent funds.’ He continued: ‘There are a number of factors that can drive a machine-learning algorithm. One that we see [in this example] is the over-use of zero returns as opposed to negative returns’ by fraudulent funds.
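Wilcockson's example feature is easy to compute from a fund's reported return history. The sketch below, with a hypothetical function name and made-up monthly return series, counts the share of periods with zero returns and the share with negative returns; a fund that reports many flat months but almost no losing ones stands out. Real screening would combine many more features with proper statistical tests.

```python
from typing import Dict, List

def return_profile(returns: List[float], tol: float = 1e-12) -> Dict[str, float]:
    """Share of periods with (effectively) zero returns and share with negative returns."""
    n = len(returns)
    zeros = sum(1 for r in returns if abs(r) <= tol)
    negatives = sum(1 for r in returns if r < -tol)
    return {"zero_share": zeros / n, "negative_share": negatives / n}

# Hypothetical monthly returns for two funds (illustrative numbers only).
smooth_fund = [0.010, 0.0, 0.012, 0.0, 0.009, 0.0, 0.011, 0.010]
typical_fund = [0.020, -0.015, 0.010, -0.030, 0.025, 0.005, -0.010, 0.015]

for name, series in [("smooth", smooth_fund), ("typical", typical_fund)]:
    print(name, return_profile(series))
```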

These candidate datasets, which contain the key characteristics and trends associated with fraudulent investment funds, can then be used by a risk manager within a financial organisation to train a decision tree or another machine-learning method to identify candidate fraudulent funds. ‘In this particular area, there are more datasets coming on, maybe not on a daily basis, but on a regular basis, and that’s helping models learn to identify these issues,’ said Wilcockson. There are problems associated with this, such as false positives: ‘You will find funds that perhaps may seem fraudulent. You know it’s unlikely that it’s going to pinpoint the exact fraudulent fund, but it can help the CRO [chief risk officer], the risk managers, identify the likely candidates for further investigation,’ he said. These applications have led to an increase in the use of Matlab: ‘For legal cooperation, we have noticed an uptake in the law community in use of our tools,’ Wilcockson added.
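To show the idea in miniature, the sketch below trains the simplest possible decision tree, a single split (a 'decision stump') chosen by the Gini impurity criterion, on a hypothetical labelled set of funds described by two made-up features: the share of zero-return months and the share of negative months. The data, labels and function names are all illustrative assumptions. In practice a risk team would use a library implementation such as MATLAB's `fitctree` or scikit-learn's `DecisionTreeClassifier`, deeper trees and cross-validation, and would treat every flag as a candidate for investigation rather than a verdict, given the false positives Wilcockson mentions.

```python
from typing import List, Tuple

# (feature index, threshold, label if <= threshold, label if > threshold)
Stump = Tuple[int, float, int, int]

def gini(labels: List[int]) -> float:
    """Binary Gini impurity 1 - p^2 - (1 - p)^2, where p is the share of label 1."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels: List[int]) -> int:
    return int(2 * sum(labels) >= len(labels))  # ties go to 1 (flag for review)

def fit_stump(x: List[List[float]], y: List[int]) -> Stump:
    """Pick the single split minimising weighted Gini impurity (a depth-1 decision tree)."""
    best_score, best = float("inf"), None
    for f in range(len(x[0])):
        values = sorted({row[f] for row in x})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0  # midpoint between adjacent observed values
            left = [label for row, label in zip(x, y) if row[f] <= threshold]
            right = [label for row, label in zip(x, y) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best_score, best = score, (f, threshold, majority(left), majority(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best

def predict(stump: Stump, row: List[float]) -> int:
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label

# Hypothetical training set: [share of zero-return months, share of negative months],
# labelled 1 if the fund was later found to be fraudulent, else 0.
x = [[0.40, 0.02], [0.35, 0.05], [0.30, 0.00],
     [0.05, 0.35], [0.00, 0.40], [0.08, 0.30], [0.02, 0.25], [0.10, 0.38]]
y = [1, 1, 1, 0, 0, 0, 0, 0]

stump = fit_stump(x, y)
print("split:", stump)
print("flag new fund?", predict(stump, [0.33, 0.03]))
```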

### What went wrong in 2008?

Khan explained that financial services are a small but significant part of Maplesoft’s business. ‘It’s been relatively stable as well, even given the financial turmoil over the last five years.’ He went on to explain that ‘by and large, people working in the finance industry use our software to develop new pricing models that capture a wider range of effects. A lot of what precipitated the financial crisis in 2008 was the inappropriate use of risk models.’ The normal distribution forms the basis of classic option-pricing models and of value at risk (VaR), a technique that quantitative analysts adopted in the 1990s to estimate the maximum loss a portfolio could suffer over a given time horizon at a given confidence level. But the assumption that returns are normally distributed is not well founded.
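As a sketch of the calculation described above, the code below computes a one-day VaR at 99% confidence in two ways: the parametric (normal) method, VaR = -(μ + zσ)V, where z is the 1% quantile of the standard normal distribution and V the portfolio value, and a simple historical method that reads the loss directly off the sorted returns. The portfolio value and simulated returns are hypothetical, and the historical quantile uses one simple convention among several.

```python
import random
from statistics import NormalDist, mean, stdev

def normal_var(returns, confidence=0.99, value=1.0):
    """Parametric VaR assuming normally distributed returns (loss reported as positive)."""
    mu = mean(returns)
    sigma = stdev(returns)
    z = NormalDist().inv_cdf(1.0 - confidence)  # about -2.326 at 99% confidence
    return -(mu + z * sigma) * value

def historical_var(returns, confidence=0.99, value=1.0):
    """Historical VaR: the loss at the empirical (1 - confidence) quantile.

    Uses a simple floor-index convention; interpolating conventions also exist.
    """
    ordered = sorted(returns)
    idx = int((1.0 - confidence) * len(ordered))
    return -ordered[idx] * value

# Hypothetical: 1,000 simulated daily returns for a portfolio worth 1,000,000.
rng = random.Random(7)
returns = [rng.gauss(0.0005, 0.01) for _ in range(1000)]
print(f"normal VaR:     {normal_var(returns, 0.99, 1_000_000):,.0f}")
print(f"historical VaR: {historical_var(returns, 0.99, 1_000_000):,.0f}")
```

Because these returns are drawn from a normal distribution, the two estimates roughly agree; with real market data the historical figure typically comes out higher, which is the weakness Khan describes.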

‘Unfortunately, extreme events happen up to 10 times more often than the normal distribution would have you believe.’ Quantitative analysts have been using Maple to investigate the effects of different probability distributions on risk prediction models. ‘I think that in itself has secured our revenue stream from the finance market,’ said Khan. This propensity for extreme events can seem counter-intuitive, but a similar idea applies to climate change: in a volatile climate, extreme events become more frequent. The same concept is relevant to financial markets, especially in 2008, when a number of compounding factors, alongside the improper use of risk scenarios, caused the instability.
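To see why the normal assumption understates risk, it helps to compute how rarely it says large losses should occur. The sketch below gives the probability of a daily loss worse than k standard deviations under a normal model, and the implied average waiting time, assuming 252 trading days a year and independent days (both assumptions). Under this model a 4-sigma daily loss should occur only about once every 125 years, and a 5-sigma loss roughly once every 14,000 years; Khan's point is that real markets produce such moves far more often.

```python
import math

def normal_tail_prob(k: float) -> float:
    """P(Z < -k) for a standard normal Z, i.e. a loss worse than k standard deviations."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def years_between_events(k: float, trading_days: int = 252) -> float:
    """Average waiting time in years, assuming independent, normally distributed days."""
    return 1.0 / (normal_tail_prob(k) * trading_days)

for k in (2, 3, 4, 5):
    print(f"{k}-sigma loss: p = {normal_tail_prob(k):.3e}, "
          f"about once every {years_between_events(k):,.1f} years")
```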

‘That’s analogous to Murphy’s Law – everything that can go wrong will do, in the long term. It may not happen today, or next week, but it may happen in the next year.’ Khan added: ‘I am finding that people are trying to actually investigate the effects of extensions to standard probability distributions – extensions that capture the effect of skew and kurtosis.’

Kurtosis, from the Greek kurtos meaning ‘curved or arching’, is traditionally described as a measure of how peaked the probability distribution of a real-valued random variable is, but it is better understood as a measure of how heavy the distribution’s tails are. Skew, by contrast, measures the asymmetry of the distribution: a negatively skewed return distribution has a longer left tail of losses. ‘If you have a higher skew, it means that you capture more of these events,’ said Khan. He concluded: ‘So what I am finding is that financial mathematicians are taking standard risk projection models, like VaR, and investigating the effects of skew. That would give you a more realistic sense of how much money is on the line.’
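One widely used way to build skew and kurtosis into a VaR estimate is the Cornish-Fisher expansion, which adjusts the normal quantile z using the sample skewness S and excess kurtosis K: z_cf = z + (z² - 1)S/6 + (z³ - 3z)K/24 - (2z³ - 5z)S²/36. The sketch below illustrates that technique; it is not necessarily the method Khan's users apply, the function names and returns are hypothetical, and the expansion is known to become unreliable when skew and kurtosis are very large.

```python
from statistics import NormalDist, fmean

def sample_moments(returns):
    """Mean, standard deviation, skewness and excess kurtosis (population estimators)."""
    n = len(returns)
    mu = fmean(returns)
    m2 = sum((r - mu) ** 2 for r in returns) / n
    m3 = sum((r - mu) ** 3 for r in returns) / n
    m4 = sum((r - mu) ** 4 for r in returns) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return mu, m2 ** 0.5, skew, excess_kurtosis

def cornish_fisher_z(z, skew, excess_kurtosis):
    """Adjust a standard normal quantile z for skew and excess kurtosis."""
    return (z
            + (z ** 2 - 1) * skew / 6
            + (z ** 3 - 3 * z) * excess_kurtosis / 24
            - (2 * z ** 3 - 5 * z) * skew ** 2 / 36)

def cornish_fisher_var(returns, confidence=0.99, value=1.0):
    """One-period VaR (loss reported as positive) using the Cornish-Fisher quantile."""
    mu, sigma, skew, kurt = sample_moments(returns)
    z = NormalDist().inv_cdf(1.0 - confidence)  # left-tail quantile, about -2.326 at 99%
    return -(mu + cornish_fisher_z(z, skew, kurt) * sigma) * value

# Hypothetical daily returns: mostly small moves plus one large loss.
returns = [0.005, -0.005] * 99 + [-0.03]
mu, sigma, skew, kurt = sample_moments(returns)
z = NormalDist().inv_cdf(0.01)
normal_var = -(mu + z * sigma) * 1_000_000
cf_var = cornish_fisher_var(returns, 0.99, 1_000_000)
print(f"skew={skew:.2f}, excess kurtosis={kurt:.2f}")
print(f"normal VaR: {normal_var:,.0f}  Cornish-Fisher VaR: {cf_var:,.0f}")
```

With negative skew and positive excess kurtosis, the adjusted quantile sits further into the left tail than the normal one, so the Cornish-Fisher VaR comes out larger: a more realistic sense, in Khan's words, of how much money is on the line.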