From banking to science
From commercial IT, how did you get into bioinformatics?
After studying biology in Glasgow, Scotland, and taking a master's in IT, I ran start-ups that integrated the computer systems of big companies all over the world. In a sense it's the same problem but writ large. The data integration needs in drug discovery are vastly bigger. The professional change of course from data integration in banks to the life sciences is the easier part, but the intellectual challenge is vastly larger because of the complexity of the data.
What is GeneticXchange's strategy?
We're the only middleware company in drug discovery. We build technology, DiscoveryHub, that supports applications above - that's the domain that requires expertise and familiarity with the science - and sits on top of the hardware and databases. In mainstream IT we're very familiar with applications on top and at the bottom we have databases that are a commodity, like Oracle. Middleware is agnostic about hardware and about applications - it truly is the meat in the sandwich.
The biologists can have all the applications at the top - visualisation and so on - and this requires huge expertise, but these people shouldn't have to write low-level programs. But they're all writing their own code for data integration and it's all different. It's a classic IT lesson from the 1980s - avoid this Tower of Babel.
If all the thousands of databases were static and never changed, you could eventually integrate them all. This is the case in mainstream IT, e.g. SAP.
In drug discovery, new series of data appear all the time and are uncontrollable - you just have to deal with it. This is the rate of change problem. You need a flexible device that can ask any question of any data at any time. You don't know what you're going to integrate in advance. We're doing this with middleware. Applications give a high-level call to DiscoveryHub, so the biologists don't have to write their own code.
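As an editorial illustration of the idea described above, here is a minimal sketch of a middleware layer that applications call at a high level while data sources are attached underneath at any time. It is not DiscoveryHub's API: the `Hub` class, its `attach` and `query` methods, and the sample records are all hypothetical and made up for this example.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Source = Callable[[], Iterable[Record]]


class Hub:
    """Hypothetical middleware: applications call query(); sources plug in below."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def attach(self, name: str, source: Source) -> None:
        # A new source can be attached at any time; applications do not change.
        self._sources[name] = source

    def query(self, **criteria: object) -> List[Record]:
        # Each attached source is asked at call time; nothing is copied
        # into a local store beforehand.
        hits: List[Record] = []
        for name, source in self._sources.items():
            for record in source():
                if all(record.get(k) == v for k, v in criteria.items()):
                    hits.append({**record, "source": name})
        return hits


# Made-up sample data, for illustration only.
hub = Hub()
hub.attach("sequences", lambda: [{"gene": "geneA", "length": 900}])
before = hub.query(gene="geneA")

# A second source appears later; the application-side call stays the same.
hub.attach("assays", lambda: [{"gene": "geneA", "result": "active"},
                              {"gene": "geneB", "result": "inactive"}])
after = hub.query(gene="geneA")
print(len(before), len(after))
```

The point of the sketch is only the shape: the application-side call does not change when a new source appears, which is the rate-of-change problem in miniature.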
Who are your customers?
There are around 2,000 biotech companies out there, most of them building their own discovery applications, and we sell to them. We're the plumbers that help them get all the data. The biologists are as unaware that we're there as they are of the Intel chip in PCs. We sell to the professional bioinformaticians who build systems for their biologists. People building systems want to attach all sorts of things - which we can attach like hoses into a plumbing system.
Other companies offer data integration. How do you fit in?
There's only us, Lion bioscience, and IBM DiscoveryLink left in the middleware business. Lion allows you to pick up all the data and put it in one tub. We allow anyone who creates any application to sit on top of our middleware and get any data anywhere without having to put it in a local 'tub'. You do not need to know what data sources we access. Lion sells 'canned solutions' to the biologist end user. With us, at the moment, the bulk of the people who use our software are programmers. There isn't a one-size-fits-all solution, especially in biology. In the biotechs, everyone is building their own software because they want to do different things.
What is your second challenge?
The other challenge is 'ontologies'. There are about 400,000 biologists and they've got 400,000 different names for the same genes and other things. There's a need for an 'upper middleware' to sort out the semantics. That's our next step up into the value chain. We're deeply involved with that already.
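As an editorial illustration of the naming problem described above, here is a minimal sketch of resolving different names for the same gene to one canonical symbol before combining records. The synonym table, the `canonical` and `group_by_gene` helpers, and the sample records are hypothetical and hand-written for this example. A real ontology layer would be far larger and curated.

```python
from typing import Dict, List

# Tiny hand-written synonym table (illustrative only): lower-cased alias -> symbol.
SYNONYMS: Dict[str, str] = {
    "tp53": "TP53",
    "p53": "TP53",
    "brca1": "BRCA1",
}


def canonical(name: str) -> str:
    """Map an alias to its canonical symbol; unknown names pass through unchanged."""
    cleaned = name.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)


def group_by_gene(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group record labs under the canonical gene symbol."""
    grouped: Dict[str, List[str]] = {}
    for record in records:
        grouped.setdefault(canonical(record["gene"]), []).append(record["lab"])
    return grouped


# Made-up records from three hypothetical labs using different names.
records = [
    {"gene": "p53", "lab": "lab_a"},
    {"gene": "TP53", "lab": "lab_b"},
    {"gene": " brca1 ", "lab": "lab_c"},
]
grouped = group_by_gene(records)
print(grouped)
```

Without the lookup step, the first two records would be treated as different genes, which is the semantic mismatch an 'upper middleware' would have to sort out.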
How do you see the future?
It's naff that every biotech is writing their own data integration program - like the days before DOS. It's standard evolution in IT: a new platform has to emerge and we think our middleware is that platform. There's a mismatch between the needs of drug discovery data and the effectiveness of traditional IT systems, so there needs to be a seismic shift. We mean to be the platform supporting that advance in productivity.