Desperately seeking solutions

If I had to isolate one thing in my life which made me a scientist, it would be this. Other children who asked the capital of Burma, or where frogs came from, or why they mustn't stick a finger into the electric light socket, or what Dave and Evie were doing behind the bike shed, would get an answer, a slap, or a command to shut up. My own parents, however, always responded in the same atypical way: 'I don't know - let's find out!' This 'finding out' often involved an experiment, or dabbling in a pond, or visiting a museum, or (most frequently of all) consulting a book or local expert. Much of a lifetime later, the 'let's find out!' reflex remains as strong as ever; but electronic searches of diverse sources increasingly dominate, with bibliographic and mathematical metadata an important part of the mix.

There have always been local search utilities (at the simplest level, tools such as Windows search); and out on the web lies the panoply of search engines, the searchable databases known as 'the invisible web', and so on. Convergence between the two is a more recent trend, driven by headlong increases in both local storage and connective bandwidth. Laptop stores of 50 gigabytes or more are now common, and the explosive growth of wireless networks makes web access an unconscious assumption for much of the time. It increasingly seems artificial to regard the two resources as conceptually separate. Across most computing platforms, the movement is towards 'desktop searching'.

To service this changing psychology, hybrid interfaces have arisen, which present both tasks within one (usually browser-based) front end. Some of these are for purchase, some come free and are funded in other ways; but standards are set largely by ubiquity, and that places the emphasis on a small clutch of front runners. Google Desktop Search (GDS) and Copernic Desktop Search, both of them free to download, are popular choices which have come down, so to speak, from the internet - one piggybacking on a commercially successful internet search engine, the other on a suite of metasearch products. The network-capable X1, and its free-to-download single-machine incarnation as Yahoo Desktop Search, provide one good example of migration the other way, upward from the local base. Survival is as hard to predict as in any other area of IT development, but each offers its own particular strengths.

At the desktop end, all these products build an initial index of stored text and then maintain it on the fly as the user works. Copernic offers a more sophisticated set of search strategies and refinements than Google, including partial or root words. Google maintains an indexed image of each file, updated if the file changes but remaining available even if the file itself is not. Thus far at least, I am feeding both of them and, by careful tuning, keeping the resulting hit on system performance below perceptible levels - if one doesn't find what I want, I can switch to the other. Both provide for 'plug-ins' which expand the range of file-types whose content can be indexed. Copernic offers direct access to a wider range of file formats without recourse to such plug-ins; Google, however, has a higher public visibility, which attracts the feedback loop of support from other vendors.
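For readers who like to see the mechanics, the index-then-maintain cycle can be sketched in a few lines of Python. This is a toy illustration only, not the method used by Google, Copernic or any other product: the class name TinyIndex, the file names and the sample text are all invented for the example, and 'partial word' matching is reduced to a simple starts-with test.

```python
from collections import defaultdict
import re


class TinyIndex:
    """Toy inverted index: word -> set of document names (hypothetical example)."""

    def __init__(self):
        self.postings = defaultdict(set)   # word -> documents containing it
        self.doc_words = {}                # document -> words currently indexed

    def update(self, name, text):
        """Index a new document, or re-index one whose text has changed."""
        self.remove(name)
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        self.doc_words[name] = words
        for word in words:
            self.postings[word].add(name)

    def remove(self, name):
        """Drop a document (and any words only it used) from the index."""
        for word in self.doc_words.pop(name, set()):
            self.postings[word].discard(name)
            if not self.postings[word]:
                del self.postings[word]

    def search(self, term, partial=False):
        """Exact match by default; partial=True matches words starting with term."""
        term = term.lower()
        if not partial:
            return set(self.postings.get(term, set()))
        hits = set()
        for word, docs in self.postings.items():
            if word.startswith(term):
                hits |= docs
        return hits


index = TinyIndex()
index.update("thesis.txt", "Lateral DNA transfer in bacteria")
index.update("notes.txt", "Transferring files between laptops")
print(sorted(index.search("transfer")))                # ['thesis.txt']
print(sorted(index.search("transfer", partial=True)))  # ['notes.txt', 'thesis.txt']

# Maintaining the index 'on the fly': the file changes, so it is re-indexed.
index.update("notes.txt", "Shopping list")
print(sorted(index.search("transfer", partial=True)))  # ['thesis.txt']
```

A real desktop searcher adds file-system monitoring, format-specific text extraction (the job of the plug-ins) and on-disk storage, none of which is attempted here.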

This whole train of thought was set off by the simultaneous arrival of two new tools: RefViz 2 and Wolfram's new Notebook Indexer (WNI). One of the two, WNI, is a plug-in for GDS (on Windows XP and 2000 machines; under Macintosh OS X, it is also available for Spotlight). It illustrates the development feedback loop: the visibility of GDS has attracted specific support from Wolfram, and thus bolstered its own position. RefViz, on the other hand, is a standalone product which has become more independent.

RefViz, an OmniViz-powered visual analyser and organiser of bibliographic information, is extending along a different path. It doesn't try to be a desktop searcher in itself; it does, however, apply similar blurring of distinction between desktop data and beyond. By basing its searches on generic standards, it also operates independently of proprietary file formats. In release one, the onboard database could be populated from standard bibliography manager export files; release 2 adds, amongst other refinements, direct Z39.50 acquisition. This removes dependence on other installed information stores but, more important, makes fast but deep 'fishing trip' literature searches across the internet productive and feasible. Several thousand references (though with a total handling limit still set at thirty-two thousand) can be quickly and easily sucked in simultaneously from (for example) PubMed and Library of Congress, according to a Boolean search strategy. The results can be visually compared with similarly filtered results from a local bibliographic database or other online searches; the material can be added to or merged with the existing stock or abandoned, selectively or en masse. Like the desktop searchers, this increases the fetch of exploratory information handling by multiple orders of magnitude. I reckon that my productivity on such activities is up by a factor of about fifty over release 1, and by at least a thousand over traditional online methods.
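The filter-and-merge step described above can be illustrated with a short Python sketch. It makes no attempt to reproduce RefViz or the Z39.50 protocol: the records are invented, the function names (matches, key, merge) are hypothetical, and the Boolean strategy is reduced to plain substring tests on titles and abstracts.

```python
def matches(record, all_of=(), any_of=(), none_of=()):
    """Boolean filter over title and abstract: AND, OR and NOT term lists."""
    text = (record["title"] + " " + record.get("abstract", "")).lower()
    return (all(t in text for t in all_of)
            and (not any_of or any(t in text for t in any_of))
            and not any(t in text for t in none_of))


def key(record):
    """Crude duplicate key: lower-cased title without punctuation, plus year."""
    title = "".join(c for c in record["title"].lower() if c.isalnum() or c == " ")
    return (" ".join(title.split()), record["year"])


def merge(local, fetched):
    """Add fetched records to the local stock, skipping ones already held."""
    seen = {key(r) for r in local}
    merged = list(local)
    for r in fetched:
        if key(r) not in seen:
            seen.add(key(r))
            merged.append(r)
    return merged


# Invented records standing in for a local database and an online catch.
local = [{"title": "Lateral DNA transfer in soil bacteria", "year": 2001}]
fetched = [
    {"title": "Lateral DNA Transfer in Soil Bacteria.", "year": 2001},
    {"title": "Horizontal gene transfer between plant species", "year": 2003},
    {"title": "Protein folding kinetics", "year": 2002},
]

# 'transfer' AND ('dna' OR 'gene')
wanted = [r for r in fetched if matches(r, all_of=["transfer"], any_of=["dna", "gene"])]
merged = merge(local, wanted)
print(len(wanted), len(merged))  # 2 2
```

The duplicate key is deliberately crude; real bibliographic tools compare authors, journals and identifiers as well, and would not confuse two different papers that happen to share a title and year.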

A literature search on lateral DNA transfer from a thesis on file is compared with one from an online library source in RefViz 2. At bottom right, another online source is being queried to generate a third search for further comparison.

WNI is neither necessary nor useful for searching Wolfram content unless you use GDS. Copernic's DS searches Wolfram notebooks (NBs) just as well as the GDS/WNI combination. Either way, the text content of the file is indexed with formatting and command codes ignored, and the file can be opened directly in a relevant Wolfram application; neither searches the symbolic content per se, but both allow searches on the underlying text tokens. No content of the NB is included in Copernic's quick views (only file metadata being displayed), while GDS with WNI provides the text form. The significance of WNI, however, is elsewhere: as a straw in the wind, representing a conscious move by a main mathematical software provider to place its content within one of the most widely used indexers. The Wolfram notebook format has several advantages: use across the company's whole product range (including the new Mathematica CalcCenter), close to a decade of stable history, and an open architecture amenable to manipulation in generic text environments.

A mislaid Wolfram notebook is located by both Google Desktop Search (background) and Copernic Desktop Search (lower left), then opened in Mathematica 5.1 (lower right).

No product is perfect, and attempting to cater for a broad range of file formats is bound to involve choices. One particular Achilles heel was revealed with the installation of WNI: GDS uses filename extensions for its housekeeping, and will not accept different applications using the same one. Wolfram notebooks and NotaBene word processing documents both use the .NB extension, and I had to choose between them. WNI won out in the end, for a number of reasons: NotaBene has its own specific and powerful content indexing and searching tool (enhanced in the latest release, of which more below); its documents can be assigned a different extension; and Copernic does a much better job of including them in desktop searches than GDS anyway. Although NotaBene is not designed for mathematics or the physical sciences, its integrated-functions loop provides one of the very best available environments in which to carry out research-based writing. The relevant component here is Orbis, a textbase management application (the other being Ibidem, a bibliographic manager exerting automatic style management over the word processor). In the past, Orbis has been a consciously applied tool but, as I write, NotaBene release 8 is on the way (it should have come to market by the time you read this) with a new trick. On closing any document, a dialogue box offers the chance to add it to a dedicated Orbis database; this makes everything you write transparently available in indexed form. Although this is still a single-application solution rather than desktop search, it nevertheless provides an exceptionally efficient way to access huge amounts of information - and its databases (including file metadata) can be made accessible to the desktop search tools as well. Given NotaBene's history and philosophy, increasingly explicit integration of inboard and desktop searches seems likely in the future.
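The .NB clash described at the start of this section is easy to reproduce in miniature. The Python sketch below is an assumption about the general shape of the problem, not a description of how GDS is actually built: ExtensionRegistry, the handler names and the alternative '.nbd' extension are all invented for illustration.

```python
class ExtensionRegistry:
    """Toy registry allowing only one content handler per filename extension."""

    def __init__(self):
        self.handlers = {}  # extension (lower case, no dot) -> handler name

    def register(self, extension, handler):
        ext = extension.lower().lstrip(".")
        current = self.handlers.get(ext)
        if current is not None and current != handler:
            raise ValueError(f".{ext} is already claimed by {current}")
        self.handlers[ext] = handler

    def handler_for(self, filename):
        ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
        return self.handlers.get(ext)


registry = ExtensionRegistry()
registry.register("nb", "wolfram-notebook")
try:
    registry.register(".NB", "notabene-document")  # second claim on the same extension
    clash = None
except ValueError as err:
    clash = str(err)

# The workaround: give one family of documents a different (here made-up) extension.
registry.register("nbd", "notabene-document")

print(clash)                             # .nb is already claimed by wolfram-notebook
print(registry.handler_for("paper.NB"))  # wolfram-notebook
```

Keying on the extension alone is simple and fast, which is presumably its attraction; the price is that two unrelated formats sharing an extension cannot both be served.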

Of course, I dream (like most researchers) of one day having a single point of entry for all this; a single desktop-search front-end that will transparently access all data, regardless of the storage format, perhaps choosing from multiple search or index utilities for the task but without bothering me over such details; that will go out across the ether and add in Z39.50 material without me making a separate decision unless I choose to do so; that will apply visual mapping to all the results when asked, regardless of origin. We're not there yet, but we're getting there. 'Let's find out!' has never been easier or more beguiling than it is now - and it's getting better all the time.

Copernic Desktop Search
www.copernic.com/en/products/desktop-search

Google Desktop Search
desktop.google.com

NotaBene 8 and Orbis
www.notabene.com

RefViz 2
www.refviz.com

Wolfram Notebook Indexer
library.wolfram.com/infocenter/Utilities/5596

X1
www.x1.com

Yahoo Desktop Search
desktop.yahoo.com