Data beyond words - the future of scientific communication

Paul R. Topping scents a sea change in the way scientific research will be created, published, and read in the coming decade

For hundreds of years, formal scientific communication could be characterised as the publication and exchange of papers consisting of words, mathematical equations, and pictures. The knowledge gained by the readers of such papers consists solely of what can be absorbed through their eyes and into their minds. Although the advent of computers, the internet, and the world wide web has radically changed how scientific papers are catalogued and distributed, the role of scientific papers in the pursuit of knowledge has hardly changed at all - they are still collections of words, mathematical equations, and pictures.

While a modern scientific paper would still be recognisable as such to a 17th-century scientist, we have reaped considerable benefit from electronic representation and presentation. We can now search efficiently for words and phrases among collections of papers. We can also embed hyperlinks in papers, making it easy for readers to obtain related material. Although the internet makes it possible to exchange other kinds of content - movies, sounds, spreadsheets, databases - scientists have generally not taken much advantage of this, largely because of the conservative nature of scientific research and the peer-reviewed publishing process. In the next decade, this is all going to change. There are strong forces coming to bear on authors, publishers, and readers that will work together to change the face of scientific publishing:

  • The move from paper to electronic form will be completed. The rising prices of scientific journals are causing libraries to cut down on subscriptions. The shift is probably already close to its tipping point.
  • The need to make hyperlinks work quickly and painlessly will hasten the move to micropayment systems that allow the reader to pay for access one paper at a time, rather than by journal subscription.
  • The need for publishers to 'add value' to their online offerings in order to compete will cause them to make additional functionality available to their readers. Publishers are also competing with authors' own websites in providing this additional functionality.
  • Readers are increasingly making use of electronic tools, such as statistical analysis and computer algebra programs, in their own work. They will want access to scientists' raw data so they can verify the paper's conclusions for themselves, combine the data with that from other papers, or use the author's data to test their own hypotheses.
  • Data sets will continue to grow in size and complexity. Much research attempts to model some aspect of the world. As the accuracy of these models increases, so does the amount of data with which the model is expected to agree.
  • Publishers will see increasing demand to make their products more accessible to readers with disabilities. This will be driven by requirements such as Section 508 of the US Rehabilitation Act and those of educational institutions for accessible versions of textbooks.

These forces will work together. Putting everything online will give more opportunities for hyperlinking. Micropayment systems will reduce the barriers to hyperlinking. The increase in data set size and complexity will make it more difficult for authors to self-publish, so they will need the professional support that a publisher can provide.

For the reasons cited above, scientific papers will become richer by including data beyond the words, equations, and pictures of traditional papers. Besides attaching raw experimental data to papers, there are several other ways to enrich papers with additional structure:

  • Mathematical equations can be represented in the paper using MathML, rather than images, allowing the equation to be searched, read to the visually impaired, or copied into a computer algebra system, like Mathematica, to be graphed.
  • Bibliographic data can be incorporated into the paper itself. This allows readers to quickly and accurately add the paper's references to their own knowledge databases.
  • Taxonomic metadata embedded in the paper can make it easier for the reader to locate other papers on the same subject. Scientific organisations have maintained their own taxonomic systems for many years. Electronic representation allows us to make more use of them.
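
To make the first of these points concrete, here is a minimal sketch in Python of what 'searching an equation' could look like once it is stored as MathML rather than as an image. The sample equation and the helper names identifiers and mentions are invented for illustration and are not part of any particular publisher's system; only the MathML namespace and the mi (identifier) element come from the MathML standard itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the MathML standard.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Presentation MathML for E = m c^2 (an illustrative sample, chosen for this sketch).
SAMPLE = f"""
<math xmlns="{MATHML_NS}">
  <mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup>
</math>
""".strip()


def identifiers(mathml: str) -> list[str]:
    """Return the text of every <mi> (identifier) element, in document order."""
    root = ET.fromstring(mathml)
    return [el.text.strip() for el in root.iter(f"{{{MATHML_NS}}}mi") if el.text]


def mentions(mathml: str, symbol: str) -> bool:
    """Hypothetical search helper: does the equation use the given symbol?"""
    return symbol in identifiers(mathml)


if __name__ == "__main__":
    print(identifiers(SAMPLE))
    print(mentions(SAMPLE, "c"))
```

An image of the same equation offers nothing comparable to query. The same structured markup is what would let a screen reader speak the equation or a computer algebra system import it.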

For the reasons stated here, expect a sea change in the way scientific research is created, published, and read in the coming decade. The move to online publishing and distribution contributed greatly to the productivity and economic boom of the 1990s. We have every reason to expect the increasing richness of scientific communication to have an equal or greater effect in the next ten years.

Paul R. Topping is president of Design Science, Inc.