|
|
|
|
spreadsheetslooking beyond excel
Felix Grant explores the plethora of alternatives to the tried and tested Excel spreadsheet
From its origins in accountancy, the spreadsheet has come a long way. It is difficult now to imagine day-to-day practice or intercourse in science without it. Standards are essential in such a central role and they are usually set de facto by a market-dominating product. The company name "Lotus" came, in most common usage, to describe both the 1-2-3 product and the generic idea of the spreadsheet. Now, similarly, there is often no distinction between "spreadsheet" and "Excel". This conflation of concept with brand must please shareholders, but it spells loss of diversity and efficiency. Excel is an excellent product for its purpose and may well be the best answer to a user’s needs, but shouldn’t that decision be made consciously rather than by default? Various issues affect a move away from Excel. Will the new application talk to other programs and their data? Will it match Excel's functionality, or at least the parts you need? Will it accept or translate extensions? Much depends on how you see and use your spreadsheet: container, organiser, visualiser, analytic engine? Using a high-end spreadsheet like Excel purely as a data container, as many of us do, is probably overkill. As one of my colleagues picturesquely put it: "Rather like taking a Kalashnikov to kill a cockroach, or housing a pet hamster in Alcatraz." A simpler spreadsheet, a flat database table, or a simple text file, will probably do as well – and provide greater portability with resource savings into the bargain. And if you are making heavy use of your spreadsheet's analytical abilities, perhaps you should look for a more specialist product. Excel's statistical functions, for example, work well in their natural office habitat but are not the most robust or reliable when pushed to the limit. Between these extremes lie the strengths: a spreadsheet is very good at organisation and pre-processing of data. A large, diffuse agrodata survey, predominantly manual, handed from person to person over a three-month period, offered a good opportunity for experiment. There were large quantities of data, interim analysis, lots of dead travelling time, and possible problems were nonfatal. After checking that necessary compatibility could be preserved or retrieved, we replaced Excel with a clutch of alternative software products. The first step away from a standard, regulation-issue spreadsheet is to another one. The best-established competitors come from Lotus and Corel. We selected Corel's Quattro Pro 10 (abbreviated hereafter to either "Quattro Pro" or "Q-Pro"), just released at that time. The most immediately obvious advantage of Q-Pro over Excel is crude capacity: one million rows; 18,278 columns per sheet; 18,278 sheets per notebook; plus links to other notebooks, which can be on disk rather than in RAM. The result is effectively unlimited data or model volume with (if properly and carefully designed) low redundancy, reduced disk space, and simple, hassle-free, protected sharing. Our survey certainly provided well over the Excel limit on rows in some of the variables, and exceeded the variables limit in one or two specific areas. As an example, we gathered detailed but patchy records of variables at every node on a one-metre grid across some sites. This generates 104 cases (i.e. spreadsheet rows) per hectare, which exhausted Excel’s row limit after six hectares whereas 100 hectares of data could be stored in Q-Pro. But data became more manageable if split down to linear arrays in 10 sheets per hectare, and Q-Pro’s easily manipulated linked notebooks were the more valuable asset: one array per sheet; one site per notebook; stored on disk; and cross-study sets assembled or sampled from them into RAM. Nevertheless, the large row limit was invaluable for temporary, ad hoc "fishing trips" into the data. Assembling all the data from one or more sites into one temporary superset allows faster and simpler ways to try out a vague idea. If the idea doesn’t work out, less time and energy have been invested. This is partly a debate about where spreadsheet gives way to database. We found it a very useful mix to have XLS storage files (thus available to other software) on disk, feeding multiple linked notebook QPW dynamic explorations. As a related benefit, Q-Pro tends on the whole to produce smaller disk files than Excel. QPW files were, on average, 40-50 per cent smaller than Excel's XLS equivalents. Using Q-Pro as an alternative front end to open, modify, and save only XLS files (this can be set as a preference, allowing service as a transparent Excel replacement), there was still a substantial file size saving over Excel's own saves, especially with smaller datasets. Q-Pro allows multiple user workspaces, which can be created, imported and exported, offering better interface management than Excel. If you spend a significant amount of time within your spreadsheet, especially as part of a team, this is a definite advantage. A ready-made workspace is also provided to closely mimic Excel, and formula entry is highly tolerant. Q-Pro uses a notation resembling that of 123 rather than Excel, but if the Excel form is entered it is usually accepted and translated on the fly for commonly used functions. There are other usability perks, too; a smoother and more extensive undo and redo, for example. Functions were as complete in their replacement of Excel equivalents as we would wish them to be, broadly equivalent, and just as robust. Surprisingly, considering that Q-Pro comes from Corel, graphics was the one disappointing area – Excel does this much better, particularly in the speed and stability of their handling. This didn't overly concern us (graphics, like analysis, being an area we preferred to place in other software) but you may feel differently. Using Quattro Pro as an Excel replacement was, in our terms, an undiluted success. Most routine Excel constructs pass into Q-Pro satisfactorily, and vice versa, although there are (well documented) exceptions – information from Microsoft's Solver and Corel's equivalent Optimizer, for instance. Macros need some tinkering, but not usually to an onerous degree. An alternative to the competitor spreadsheet is something that is willing to look like one. This may be the specialised data sheets used by most statistics programs; or it may be a database table. As containers, database managers offer advantages over spreadsheets: stricter control of data typing; less demanding of dynamic resources: better damage resistance; and usually larger data sets. A specialist application such as S-Plus will usually make a better job of analysing the data than would a spreadsheet. Despite removal of Excel, we still had Access; Quattro Pro brought with it Paradox; and Star Office (see below) was also present which made available its database manager Adabas. Storage in databases is a natural companion to replacement of spreadsheet by dedicated analytical software. With Insightful's new release 6 of S-Plus just out, the choice seemed obvious. Most scientists have at least a passing acquaintance with previous versions of this package: Lucent's S language wrapped in a GUI shell with a superficially Excel-like worksheet. S-Plus used to be a Mathsoft product, before a recent binary fission of the company. Curiosity over what the new entity had done with this latest release was a factor in its selection. ![]()
S-Plus 6 offers tighter integration with Excel than ever before, even opening Excel within itself and creating an XLS workbook for you in situ. Nevertheless, from a user perspective its datasheet looks, and acts, like a specialised spreadsheet. One major strength of these data sheets is that they see their content not as cells, but as arrays of variables or cases; in other words, like a database manager. Another advantage is a more powerfully specialised set of tools – usually also more rigorous, since the product must survive and evolve in response to a more homogeneously critical market. S-Plus, unlike a generic spreadsheet, which has simply wandered in from commerce and been somewhat adapted, is a scientist's product through and through. S is one of the languages of choice for those involved in science, and it shows. Although the GUI now dominates first start-up, this is still a scientific language product at core and its operational constructs are more direct, powerful, and efficient than spreadsheet macros or cell formulae. The resemblance of the datasheet to Excel is only skin deep. This can be either a strength or a weakness, depending on your point of view. It does offer one of the most accessible graphics generation systems around, sharing the approach and toolbars used in the technical graphics program Axum. A click of the button produces an intelligent first approximation, which can then be adjusted, rotated, data-brushed, and so on. Moving further from the Excel pattern means unpicking conceptual structures. The conventional spreadsheet assumes that every situation can be represented in Cartesian form by discrete, stepped reference to either a rectangular surface or a cuboid. For most data, and many problems, the assumptions are valid, but not for all. Data point series bypass the discreteness problem, but lack intuitive clarity for n-dimensional event spaces of order much above three. Many of the relationship structures encountered in the biodata survey are Cartesian in form but n-dimensional – hypercubes and beyond, requiring not just the depth provided by a workbook but interlinked sets of such workbooks. Quattro Pro's "grouped notebooks" provision manages this more elegantly than Excel, and provides a more organic manipulation internally of grouped entities across layers. There are other relational structures, though, which conform to solids (in whatever dimensionality) other than the cuboid, or which occupy non-Cartesian, non-Euclidean spaces. While these can be (and routinely are) approximated within the Cartesian frame, there is a cost in compromise and distortion. A different structural approach with less attachment to rectangular outlook would be more efficient, flexible and intuitive. One route is through a programming environment like Matlab, or a pure mathematics package, but the supporting structure of a spreadsheet-style approach is then lost. More intriguing is a product that places itself somewhere between Excel and Matlab territories: Dadisp (strictly DaDiSP). Produced by Massachusetts-based DSP Development Corporation in the USA, it is supplied to end-users in the UK by Adept Scientific as a "graphical spreadsheet".
From a spreadsheet background, it is perhaps easiest to think of Dadisp as a spreadsheet containing one hundred cells – but the cells have been removed from their usual row, column, and sheet matrix. Put like that, it’s not a lot of cells; but each "cell", called a Window, can contain an array of data series, of which there can be up to 10,000 altogether (spilling into disk-based virtual memory if RAM is exceeded). The sheet itself is dubbed a "worksheet", which is reassuringly familiar; worksheets are aggregated into a "lab-book", which is comfortably analogous to Excel's "workbooks". The contents of a Window can look like a spreadsheet; but are not. Approached from the other direction, not from Excel but from some kind of mathematics software, the analogies are different. For Ray Girvan’s Matlab clone review, I describe how a student “packaged” data and formulae in separate Matlab documents, then built links between them; this is a good starting point for considering Dadisp. Each Dadisp dataset or window represents a package of data; the relations between windows replace cell formulae in a spreadsheet. Your project, then, resides in a lab-book that is the top of a hierarchic tree containing worksheets, which in turn contain windows, relationships, macros and (usually, but not necessarily) data. Your data are organised into datasets that in turn comprise arrays of data series in (usually) related groups. Only one lab-book can be open at a time; there is no provision for links between one lab-book and another. You can, however, run more than one simultaneous instance of Dadisp. The lab-book can be sophisticated and complex, with a small visible tip rooted in a hidden (probably disk-based) mass. Perhaps it sounds like a Byzantine behemoth, but nothing could be further from the truth. It is simply unfamiliar. Once the new environment has been grasped, it is elegant and lucid. While there was much initial wailing among survey members during the first sessions on Dadisp, by the end of the first week it had a devoted team-wide fan club. The windows within your lab-book's worksheets can contain one or more data series, and these can be viewed in different ways – either as a table, or as a graphic. The simplest way to switch between the views is from a button on the toolbar; the result is illustrated below. Each window can be related to another (or others) using formulae, and this is the similarity to spreadsheet cells. Unlike the standard spreadsheet, however, the formula is applied to a whole data-set – to the entire content of the subject window, in other words – as a set of data vectors rather than to individual scalars. (Actually, this oversimplifies for convenience. Integers, reals, and complex numbers can populate the series, a series can contain a single element, and Dadisp also manipulates matrices.)
There are, of course, all the tools to be expected in such an environment. Results can be directly generated from typed commands, using macros to automate aggregated command constructs, or menus will produce default forms (2D and 3D plots, linear or polynomial fits, trigonometric functions, matrix or complex calls, and so on) for subsequent dialogue-driven refinement. As you move away from the market-dominant integrated suite, data exchange, for example, becomes an issue. The hardest part was making applications from one publisher draw data stored in database managers from another. In these days of ODBC and other connectivity, this should be a snip, but reality in the field can fall short of the dream. Much of this difficulty results from competitive product development. Software will often cope happily with Access 97 data files, but not those from Access 2000. S-Plus reads and writes Quattro Pro files, but only in a very old version. Another issue was the inability of XLS files to cope with the size of dataset which Quattro Pro makes possible – not, of course, Q-Pro's fault, but nevertheless to be borne in mind. The solution, in most cases, is intelligent subset querying at the point of data use – or, of course, abandoning XLS-dependence altogether. Dadisp doesn't concern itself with proprietary file import. Initially I felt this to be a weakness, but once the centrality of a spreadsheet has been questioned the importance of spreadsheet files dwindles. We fairly rapidly moved to CSV text files as the common data store format – at which point import, export and exchange issues evaporated. Even Palm-OS data passes through a CSV converter on its way from PDB to MDB formats, so it was trapped and dealt with at that stage. And what does all this mean for Excel? Nobody felt ready to throw away their copy of Excel; but most had radically shifted their view of it, and the period of"cold turkey" was liberating. Without Excel as a habitual crutch, we had more thoroughly explored the advantages of other approaches. Several (myself included) were WordPerfect users with Quattro Pro languishing unexplored; I, at least, will make up for this in the future. Dadisp inspired everyone, and produced radical shifts in thinking; I found myself turning to it for non-survey tasks where I would normally use Excel or Scilab. Release of Microsoft's Office XP during the trial added another dimension. Many users of 2000, who have upgraded without much demur at each stage thus far, find themselves made uneasy at the aggressive anti-piracy regimes towards which Microsoft appears to be moving. However reasonable copy protection may be in principle, people like me are more exercised by nightmares of a different kind: a dead machine in the middle of a mission-critical task; reinstallation; then persuading a sceptical telephone voice at Microsoft that there is a legitimate reason for reinstallation. (My last call to a Microsoft help line lasted four hours, much of it listening to recorded music, and ended with the words "sorry; can't help" – then a click. It doesn’t inspire confidence.) Most of the team agreed that they were no longer sure whether or not they would upgrade to XP; they would put off the decision for as long as possible, and get to know the opposition in the meantime. Personally, I could live without Excel in terms of the work that I do – though it would not, perhaps, be politically possible while I work for clients who insist upon it. Within the sample tried here, Dadisp was the best modelling environment; Q-Pro was the second best, but is more transparent and adds the ability to replace Excel and mimic it as an XLS file handler, not to mention talking directly to common file formats. S-Plus in partnership with a good database manager can replace any spreadsheet for users whose interest is purely in statistical analysis. Excel’s main advantage lies in its universality; and Star Calc's (see below) in the fact that it does not come from Microsoft but looks as if it does. Taking out the add-insOne hidden implication of removing a well-established core application such as Excel is the collateral loss of all those useful third-party symbionts that inhabit it.My own laptop copy of Excel has gathered add-ins (many of them sporting menu and/or toolbar extensions) for Access, Aliceband, Bioscope, Maple, NAG's statistical routines, Navcalc, Satcall, Unistat, Winfax, and so on. My word processor is the same: its menus and toolbars are colonised by a chorus of reference managers, specialist dictionaries, research tools and the rest. Many add-ins serve purposes which can be addressed in other ways; all that is required is a little thought. Others, though, seem indispensable. What can be done if you are willing to abandon the host program but reluctant to lose the add-in? The most delightfully and elegantly useful Excel add-in I have encountered is the statistical analysis package Unistat 5. This turned out to be the most-missed add-in. In the present trial, statistical muscle was supplied by S-Plus; and Unistat in any case remains as a standalone program; but what of the core principle? I contacted Dr M. A. Toker, of Unistat, to find out. He confirmed that Unistat can be “hooked” by a call to operate in background mode from almost any program. In practice, this is "mainly used by data acquisition hosts, which require repetitive analysis of data files whose format remain unchanged", but it could be applied by another spreadsheet if the demand were there. An organisation wanting to move away from Excel to Quattro Pro could, Dr Toker confirmed, write itself a Unistat add-in for the new host, analogous to the present unistat.xla sheet provided for Excel although "so far there has been no demand for this". Demand for add-ins, obviously, follows the spreadsheet market. As long as Excel remains the dominant spreadsheet program, the market itself will not generate a critical mass of demand for add-ins to alternatives. I can't help thinking, however, that it might pay the publishers of those alternatives best placed to challenge Excel if they invested in supplying ready-made links to accommodate crucial third party products. Six of the bestThe move of a software product from one owner to another is often marked, at first, by new releases with minor cosmetic adjustments and little move on content. S-Plus 6 (it is, confusingly, the next release after S-Plus 2000) is much more than that; Insightful have put in some real work in behind the scenes, incrementally extending some areas and overhauling others. The performance has been enhanced in many areas. Responsiveness to the GUI is crisper. The slow-down, which sometimes used to show up during long evaluation chains, seems to have gone. Handling has also improved in other, more ergonomic ways, with the appearance of a SearchPath object in the explorer frames; session-linked project folders; and the treatment of user databases as "chapters" easily attached, changed or detached. Graphics have benefited from incremental improvement, and Postscript output is more consistent across OS platforms. New statistical content includes version 3 of the NMLE library, allowing better handling of banded structures, plus extended testing of normality and residuals. User-contributed libraries are updated, including the Venables & Ripley spatial libraries (of particular interest to members of our survey). The application's fundamental structure has been changed too, either to accommodate or to enable the improvements. This is the sort of news that usually strikes fear into the hearts of loyal long-term users, but a "migration wizard" is provided and seems to work well. The free alternativeStar Office is a program suite in the open software tradition, free to download from Sun Microsystems Inc. This is not Microsoft Office, but the attempt to make MS Office users feel at home and productive is obvious. There are direct replacements for all of the MS Office components, the Excel equivalent being Star Calc. Like Quattro Pro, Star Calc will transparently open, manipulate and save XLS files if required. But unlike QuattroPro, it does not try to exceed Excel's capacity or capabilities, nor even slavishly match them; it aims, instead, to provide bedrock equivalence across a perceived normal usage range. There is a reassuring air of no-nonsense solidity. The statistical functions list runs to over 70 entries; the mathematical and matrix to just under; roughly comparable to Excel’s equivalents. The charting options are broadly equivalent as well, albeit with less tuning control overall. An add-in architecture is provided, although I haven't explored this thoroughly beyond establishing that it doesn't recognise my most-used Excel extensions. If a 256-sheet notebook of size of 256 columns by 32000 rows will accommodate your needs, and you believe that a spreadsheet should be a spreadsheet while other software takes over for specialised handling, then Star Calc may well be all you need. If not, then perhaps you should be looking to Quattro Pro – or perhaps reviewing your spreadsheet’s remit within your overall software strategy. You can use the online Reader Enquiry service at Scientific Computing World to make contact with organisations referred in this article, or to visit relevant websites. |