Will we see a future where all scientific analysis becomes browser-based? Mark Hahnel offers his thoughts
As research funders and governments alike mandate that researchers and institutions store, curate and disseminate their research outputs, the cloud is becoming an increasingly attractive home for all the products of academia. New directives on both sides of the Atlantic mean that we are on the brink of an avalanche of academic data becoming openly available online. The potential benefit to academia in general is huge, but realising it will require new solutions and technology built on the backbone of the cloud and the browser.
The cloud, with its fast load times, scalability, automated application deployment, multiple back-ups and constantly updated hardware, means that institutions need not build their own server centres, with the associated running costs and rapidly dating technology. We are moving beyond the phase of merely making academic data openly available to one where we can derive new insight from larger data sources. At this stage, the ability of any academic developer to harness the processing power of thousands of servers at the click of a button also demonstrates the power of scale that commercial cloud services can provide.
As large numbers of research outputs are made openly available with appropriate metadata, is linked open data closer to becoming a reality across academia? It is generally accepted that automating data collection in a machine-processable way across academia globally is the most efficient way to move research forward. Keeping data in multiple siloed instances within research institutions makes it harder to pull together data from different projects semantically. If institutions use service providers such as Amazon Web Services (AWS) and Microsoft Azure, each file need only be stored in one place, at one persistent URL. If this is the case, then a future where all scientific analysis becomes browser-based seems like the next logical step.
For much of the research that has already been made available, the limiting factor isn't the storage space itself but the bandwidth constraints of those consuming it. Microsoft Research is actively courting the academic sector, focusing on research groups that need the computational power and storage options that Azure can offer. AWS already hosts publicly available datasets, including genomics data, at no cost to the institution. By engaging with institutions at this level, these companies are presumably hoping to build strong relationships so that, when research groups start building web-based analytical applications (as groups at Berkeley and Harvard already are), they will sign on at an enterprise level.
Ideally, specific web-based apps would be developed and applied over these large, open datasets, or even over multiple datasets pulled in via APIs from different persistent locations. This would lead to huge savings by eliminating the redundant copies of large outputs stored in walled-off academic institutions around the globe. The potential of linked open data to revolutionise the efficiency of drug discovery, and of academic progress in general, cannot be overstated. The real remaining question is: whose responsibility is it to build these browser tools and apps?
What many institutions do have, in their research groups and IT departments, is skilled developers who understand the often bureaucratic and nonsensical world of academia. If the need for in-house infrastructure support declines, these software engineers could add value within their institutions by building applications on top of open research, applications that would ideally be open to all in turn. The launch of the Mozilla Science Lab and GitHub's recent move into this area suggest that, once again, institutions will lag behind. As in all developing markets, new players are emerging. These include figshare, whose recently announced 'figshare for institutions' offering combines research data management and dissemination in a single solution based on AWS, and Globus Online, which focuses on data transfer: one of many problems that originate in academia but have wide-reaching relevance across the web in general.
More than two decades after the web was created to disseminate academic content, organised groups of researchers are making huge inroads into exploiting web-based technology and integrating it into existing workflows. Projects such as SciPy and R take the best elements of the open source community and apply them to academic research. As researchers begin to see the benefits and efficiencies of such technologies, the resulting momentum should usher in a new age of academic efficiency. It has been a long time coming.
Mark Hahnel is founder of Figshare