Moving ahead with advancing technology
Much of today's amazing progress in search technology stems from the economics of IT, and those economics favour further advances. According to Gartner Inc., between now and 2008, hardware performance will increase 25 per cent annually, while its cost will decrease 10 per cent annually. The implications of this cost/benefit windfall are all around us. Mobile phones can now do much of what a PDA can do, as well as search the web and take and transmit pictures. And, oh, yes - you can use them to make phone calls, too. The economic advantage will drive further innovation, far beyond what most of us can imagine. Wireless technology will move us closer and closer to universal connectivity. Will twenty-first century humans soon become what amounts to information cyborgs, using the internet almost like a collective consciousness?
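To put the Gartner figures in perspective, the short Python sketch below compounds the two quoted rates into a single performance-per-unit-cost factor. The function name and the choice of horizon are assumptions made for illustration; the forecast gives only the annual rates, and this is a back-of-the-envelope reading rather than Gartner's own calculation.

```python
def price_performance(years, perf_growth=0.25, cost_decline=0.10):
    """Relative performance per unit cost after `years` of compounding.

    The default rates are the annual figures quoted in the text; the
    horizon `years` is an assumption chosen by the caller.
    """
    performance = (1 + perf_growth) ** years   # grows 25 per cent a year
    cost = (1 - cost_decline) ** years         # shrinks 10 per cent a year
    return performance / cost


if __name__ == "__main__":
    for years in (1, 2, 3):
        print(years, round(price_performance(years), 2))
```

On these assumptions, each year multiplies performance per unit of cost by 1.25/0.90, or roughly 1.39, so the ratio more than doubles within three years.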
Google is the obvious example of search-and-retrieval for the masses, and its utility has already gone beyond gathering the facts on rainforest ecology for a school science project. Today, you can enter a phone number, a universal product code (UPC), a CAS Registry Number, or even your automobile's vehicle identification number into the Google search box and uncover an enormous and even disconcerting amount of information in an instant - though you are not always sure of its quality. But what about scientific and technical information in specialised databases? Here, too, the economics of IT have sparked a revolution.
The availability of inexpensive storage and processing has led to a vast increase in the size of databases. To mention just two examples, the CAS bibliographic files and the American Chemical Society journal files are growing, not only forward, but also backward in time, with archives available for more than a century's worth of information. On the primary information side, full-text documents on the Web are a resource made even more valuable by the addition of supplemental material that the printed version could not possibly include. Conceivably, in the near future, video and actual data from laboratory sensors could be added.
The challenge for secondary databases is no longer to provide a list of relevant publications, but to make sense of these digital riches through analysis and visualisation. What have been the research trends relative to a given topic? Which companies have dominated the patenting activity for a certain technology, and who is now entering the field? Having helped to make these challenges more acute, advances in IT also offer new ways to address them.
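The questions above amount to simple aggregations over bibliographic metadata. The following Python sketch, using only the standard library, shows one way they might be answered. The records, company names, and function names are all invented for illustration and do not reflect any actual CAS data or interface.

```python
from collections import Counter

# Hypothetical patent records as (filing year, assignee) pairs.
# The companies and counts are invented purely for illustration.
PATENTS = [
    (2001, "Acme"), (2001, "Acme"), (2002, "Acme"), (2002, "Borel"),
    (2003, "Borel"), (2004, "Acme"), (2004, "Cyro"), (2004, "Cyro"),
]


def filings_per_year(records):
    """Research trend: number of records per year, in chronological order."""
    return dict(sorted(Counter(year for year, _ in records).items()))


def top_assignees(records, n=1):
    """Companies that have dominated the activity, most active first."""
    return Counter(assignee for _, assignee in records).most_common(n)


def new_entrants(records, since):
    """Assignees that appear from `since` onwards but never before it."""
    before = {assignee for year, assignee in records if year < since}
    recent = {assignee for year, assignee in records if year >= since}
    return sorted(recent - before)


if __name__ == "__main__":
    print(filings_per_year(PATENTS))
    print(top_assignees(PATENTS))
    print(new_entrants(PATENTS, since=2004))
```

A production service would run such counts over millions of indexed records and feed the results into charts, but the underlying operations are of this kind.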
Among the enabling technologies of search innovation are thesauri (or lexicons) and other tools that put metadata to use. Powerful hardware, harnessed through grid and cluster computing, will make more CPU-intensive operations feasible. As a result, information services will increasingly emphasise data mining, analysis, and visualisation, and improvements in clustering, relevancy ranking, and relationship detection will be crucial. For example, search services must recognise that osteoporosis is the equivalent of bone loss. But while search technology has always done well with nouns (e.g., recognising synonyms), it must also become adept at understanding the verbs that connect them. It may be clear from mere proximity that drug A relates somehow to condition B - but does it alleviate the condition or aggravate it?
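The distinction between matching nouns and interpreting verbs can be illustrated with a toy Python sketch. The thesaurus entry, the verb list, and the function names below are hypothetical, and the keyword matching is deliberately naive: it ignores negation, word order, and sentence structure, all of which a real system would need to handle.

```python
import re

# Hypothetical one-entry thesaurus: a concept and its synonyms.
THESAURUS = {"osteoporosis": {"osteoporosis", "bone loss"}}

# Hypothetical mapping from surface verbs to a normalised relationship.
RELATION_VERBS = {
    "alleviates": "alleviates", "reduces": "alleviates", "treats": "alleviates",
    "aggravates": "aggravates", "worsens": "aggravates",
}


def expand(term):
    """Return every synonym of `term`, or just the term if it is unknown."""
    term = term.lower()
    for synonyms in THESAURUS.values():
        if term in synonyms:
            return synonyms
    return {term}


def matches(query, text):
    """Noun matching: does the text mention the query or any synonym?"""
    text = text.lower()
    return any(synonym in text for synonym in expand(query))


def relation(drug, condition, sentence):
    """Verb matching: how does the sentence connect drug and condition?

    Returns None if the two are not both mentioned, a normalised verb if a
    known one is found, and "unspecified" if they merely co-occur.
    """
    lowered = sentence.lower()
    if drug.lower() not in lowered or not matches(condition, lowered):
        return None
    for word in re.findall(r"[a-z]+", lowered):
        if word in RELATION_VERBS:
            return RELATION_VERBS[word]
    return "unspecified"
```

Here `matches` captures what proximity and synonym search can establish, while `relation` attempts the harder step of saying whether the drug helps or harms. The "unspecified" result corresponds to the case in which co-occurrence alone reveals a connection but not its direction.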
Relationships and deep linking will be key features of improved information services. The information seeker wants not only to become aware of a relevant paper, but to be taken to the exact portion of the paper that is relevant. At the same time, he or she wants a search interface as dynamic as possible in its interaction. In chemistry, for example, it would be valuable to see immediately how altering a single component of a molecular structure would change the number of retrieved hits.
However, computing and communications technology alone cannot provide the selectivity and dynamism we desire. Content is still the object, and metadata fuels the results, no matter how they are ultimately represented to the user.
Most information consumers are not highly sensitive to the source of the information they seek - a fact is simply a fact. Thus, information providers are threatened both by users' tendency to view information as a commodity and by their reliance on giant middlemen, who can marginalise the information producer, a process that might be called 'Wal-martisation'. This can be countered by adding value to datasets and especially by helping users derive meaning from them.
At CAS, we recognise that the thorough indexing applied by CAS scientists to published papers and patents is a distinguishing feature of our service. This metadata contains not only bibliographic information and concepts but also molecular structures, properties, and sequences for DNA and proteins. Applying analysis and visualisation tools to this material will result in new features and enhanced services in the near future. As the limitations of processing, storing, and delivering information diminish, the possibilities of the digital research environment expand.
Robert J. Massie is President, and Ramond D'Angelo is Senior Scientist, at CAS