14 September 2009
The Edge Software Consultancy
Reviewed by Felix Grant
At the risk of repetitiousness, it has to be said: an office spreadsheet product is not, for a wide range of reasons, the best place to do analytic work. On the other hand, office spreadsheets in general and Microsoft's Excel in particular are nevertheless the place where a large and ever increasing amount of analysis is, in the real world, done. For that reason, anything which offers spreadsheet users an independently-implemented set of analytic tools without frightening them away has to be welcomed.
Morphit, from software consultancy The Edge, is a package designed to provide scientists with such an option. It starts from an existing base in pharmaceutical biology (The Edge is a partner in the mixed commercial and open source economy data management system for drug discovery, BioRails, and the source of associated Office tools), but is generic in concept and implementation. It can in many ways be seen as a hybrid offspring of spreadsheet and data analysis software, with a blend of characteristics inherited from both. Curious to see how it would be received by its target market, I introduced my review copy to a dozen science students with analytic tasks to perform, but strong reluctance to abandon unsuitable generic spreadsheets to tackle them in dedicated software.
The base package comes with a sophisticated data management and computation environment, plus a general toolkit of statistical methods, using a purpose-designed engine written from scratch, to which modular plugins can be added. An advanced statistics plugin is currently available and was included with my review copy; a chemistry plugin is, at the time of writing, complete and due to be available soon. Fitting is currently handled by a respected third-party engine, but the long-term plan is for an in-house replacement. The Microsoft Visual J# 2.0 redistributable package is needed for installation.
Unlike many applications, this one does not colonise Excel, nor even rely on its presence; the comfort zone is extended, instead, by good environment design. Viewed purely from a user's perspective, this is not obvious: in interface 'look and feel' terms it certainly does most things the Excel way, but differences under the surface accumulate as you get to know it more closely. Keyboard navigation, for example, has much in common with OpenOffice Calc's approach. Importing data is done from a separate 'Data' menu rather than 'File' (or, in Office 2007, the top left button) – conceptually better, but very definitely different. Despite numerous such differences of detail, however, this is a place where all spreadsheet users should quickly feel comfortable.
Spreadsheet inheritance doesn't end with appearance. Formula design and entry is, at least on initial acquaintance, instantly familiar to the new arrival from any modern spreadsheet, despite the very different engine called by the result. With use comes, gradually, an awareness of differences, one of which is a data range selection approach that is both more flexible and more rigorous. Introduction of 'ancestor group' ranges (discussed below) simplifies data representation and provides alternative ways (accessed cyclically by a middle mouse button click) to specify the same range.
While I have not embarked on a comprehensive testing programme, using those formulae from a couple of complex known problems produced better results (statistically speaking) than in a spreadsheet.
Data is handled and structured differently, in some ways, from either generic spreadsheets or most statistical analysis packages, and here the uniqueness of the product shows through. What a spreadsheet would call merged cells, for instance, are normally absent from dedicated analytic packages, but appear in Morphit as the 'ancestor group' identifiers mentioned above.
An ancestor group is a type of range. Suppose that (as in one of the test data sets I used for review) 57 subjects are classified by gender: 18 female, 32 male and 7 identifying themselves as 'other'. Instead of entering the gender as a qualitative value in every row, I can enter the first 18 and identify all of them as female using a single 'merged cell' containing the value 'female'. Those 18 rows now behave as a named range with the ancestor group 'female'. Males and others are dealt with similarly. Thereafter, although subsamples can be selected from within any one range, it is not possible to highlight partially across the boundary between ranges – I cannot, for example, select all the females and the first 18 males in a single operation. I can, on the other hand, select all males and females very simply using their two ancestor group cells. Subsequent additions to a range inherit that range's ancestor group.
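The behaviour described above can be sketched in a few lines of Python. This is purely an illustrative model of my own, not Morphit's implementation or API: the names AncestorGroup, build_groups, select_groups and select_rows are hypothetical, and the only figures taken from the review data are the three group sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AncestorGroup:
    """A labelled, contiguous block of rows (hypothetical model, not Morphit's API)."""
    label: str
    start: int  # first row index, inclusive
    stop: int   # end row index, exclusive

    def rows(self) -> range:
        return range(self.start, self.stop)


def build_groups(counts: dict) -> list:
    """Lay the groups out one after another, like stacked merged cells."""
    groups, start = [], 0
    for label, n in counts.items():
        groups.append(AncestorGroup(label, start, start + n))
        start += n
    return groups


def select_groups(groups, labels) -> list:
    """Select whole groups by label, e.g. all males and all females."""
    wanted = set(labels)
    return [row for g in groups if g.label in wanted for row in g.rows()]


def select_rows(groups, first: int, last: int) -> list:
    """Select rows first..last (inclusive), but only inside a single group."""
    for g in groups:
        if g.start <= first and last < g.stop:
            return list(range(first, last + 1))
    raise ValueError("selection crosses an ancestor group boundary")


# Group sizes from the review data set; everything else is assumed.
groups = build_groups({"female": 18, "male": 32, "other": 7})
both = select_groups(groups, ["female", "male"])  # whole groups via their labels
subsample = select_rows(groups, 20, 29)           # ten rows inside 'male'
```

In this sketch, asking select_rows for rows 0 to 35 (all the females plus the first 18 males) raises an error, mirroring the refusal to highlight across a group boundary, while select_groups picks up both groups in one step.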
If I seem to be spending a lot of time on one detail, it is because this serves very effectively to illustrate two important things. First, this was one of the aspects that my guinea pig group of reluctant spreadsheet refugees found very appealing. Most of them took to the idea intuitively because it visually imitates a common spreadsheet table structure. Second, on the other hand, it imposes a structured view of data that most habitual spreadsheet users (and a surprising number of scientists!) do not naturally bring with them.
Nothing is ever perfect, especially when it is new. Morphit has a few rough edges to be worked upon; but only a few. I'm told that development so far has taken four years, and I can easily believe it: this is a good, well-constructed product that deserves to succeed and, like any other, will grow and improve with time and user base feedback. Rather than replacing traditional statistical analysis product approaches, it opens up for the first time a large area of practice not currently well served. In the available space here I can only pick out representative details from a rich package, but a 30-day evaluation licence is available for further exploration.