An artificial future
Tell us a little about your background and career…
My background is primarily in artificial intelligence (AI), which I have been working with for almost 30 years. One of my first AI programming experiences was in 1990 when I was a student, studying for a bachelor of arts in AI at the School of Cognitive and Computing Sciences at Sussex University. I coded simple artificial neural networks in the Poplog environment, and expert systems and natural language processing parsers in the Prolog language. In the mid-1990s I moved into the application of cognitive science insights in the design of software user interfaces, something which is now called user experience (UX).
I began my career working on early websites as well as interactive touch screen and CD-ROM systems for clients such as Europcar, Disneyland Paris and Sheraton Hotels. I have been lucky enough to work in start-ups launched during the earliest days of the web and to teach interaction design at the Royal College of Art. I left the start-up scene and joined Elsevier in 1999, where I have participated in the transformation of an established scientific print publisher into a predominantly digital information analytics company.
I was excited to join Elsevier as the challenges addressed by its work continually inspire me and show how technology can make a difference in the wider world. It has combined my interests in AI and software interaction design, along with meeting challenging customer demands requiring knowledge management approaches, something now known as semantic technologies. This work focuses on analytics solutions that support research to address many of the world’s most challenging problems in health, drug development, engineering and science more generally.
Since joining Elsevier I’ve been involved in a wide range of innovative projects. Key milestones include: Elsevier’s first electronic reference books; early collaborative social networks in biomedical science with BioMedNet.com; indexing for adverse drug reactions with the development of Pharmapendium; semantic knowledge bases for therapeutic drug design in cardiac arrhythmia; the introduction of adaptive learning courses into the National Health Service in the UK with Elsevier Clinical Skills; and the development of the semantic search product Target Insights, which has since become Elsevier Text Mining.
You’ve been with Elsevier for nearly 20 years. What is the biggest change you have witnessed in the scholarly communications industry during that time?
Perhaps the biggest change in scholarly publishing is the one we now take for granted – the move to digital. The growth in the value of indexing, alongside the various forms of analytics, has dramatically changed the industry for the better. This development is especially important in the modern world, due to the sheer volume of data available to today’s scientists. The move to digital, when supported with the right tools, has made it far easier for researchers to find answers and insights into increasingly complex problems, filtering out unnecessary data.
The second key change is the growth of social collaboration in science and the ability to share data sets and navigate communities of practice – at Elsevier, this drive is supported by Mendeley. These days, there are elements of science that rely solely on collaboration – no one group has all the data and resources to solve multifaceted problems. Recent research from Elsevier supported this fact, finding 59 per cent of chemists argued the ability to collaborate with researchers in other fields and geographies will be fundamental to scientific research moving forward.
What are the biggest recent changes in the use of data analytics in the industry?
Following the transformation of scholarly communication to digital platforms, the move to create semantic data that captures knowledge has increased significantly. Another way to describe this is as a shift in focus from articles as a whole, towards individual ‘facts’ reported in publications. This has been driven by the maturing and increased productivity of automated approaches to identifying and extracting these facts, as well as by the steps to bring AI to fruition in the form of machine learning. Over recent years I have seen the shift from human curation, to rules-based automated indexing approaches, through to the application of statistical approaches such as deep learning and machine reasoning.
Semantic data means we are now able to link facts that are related across papers, and over different domains of knowledge, to deliver insights that might not be obvious from one paper alone. To do so requires normalising the terminology with taxonomies, so that the same concept carries the same label wherever it appears and a network can be created. The increasing reliance on the linked facts that these developments have enabled means that the demands of the modern researcher are changing. At Elsevier, this has meant we have been increasingly focused on taking the expertise we have in developing semantic databases and analytical products, and bringing this to bear on delivering bespoke solutions for customers with specific needs. The next phase that we are working on is combining semantic technology methods with various machine learning and machine reasoning approaches to create new insights.
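As a rough illustration of the idea described above, and not a description of any Elsevier system, the following minimal Python sketch normalises terms against a small taxonomy and then links facts drawn from different papers. The taxonomy entries, the three example facts, the paper identifiers and the function names (`normalise`, `build_graph`, `two_hop_links`) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy taxonomy: maps lower-cased synonyms to one preferred label.
TAXONOMY = {
    "acetylsalicylic acid": "aspirin",
    "asa": "aspirin",
    "aspirin": "aspirin",
    "cox-1": "PTGS1",
    "cyclooxygenase-1": "PTGS1",
    "ptgs1": "PTGS1",
}

# Illustrative facts as (subject, relation, object, source) tuples.
# The paper identifiers are invented placeholders.
RAW_FACTS = [
    ("Acetylsalicylic acid", "inhibits", "COX-1", "paper_A"),
    ("Cyclooxygenase-1", "produces", "thromboxane A2", "paper_B"),
    ("ASA", "inhibits", "PTGS1", "paper_C"),
]


def normalise(term):
    """Return the preferred taxonomy label, or the trimmed term if unknown."""
    return TAXONOMY.get(term.strip().lower(), term.strip())


def build_graph(facts):
    """Build an adjacency map keyed by normalised subject."""
    graph = defaultdict(set)
    for subject, relation, obj, source in facts:
        graph[normalise(subject)].add((relation, normalise(obj), source))
    return graph


def two_hop_links(graph, start):
    """Find chains start -> mid -> end whose two facts come from different sources."""
    start = normalise(start)
    chains = []
    for rel1, mid, src1 in sorted(graph.get(start, ())):
        for rel2, end, src2 in sorted(graph.get(mid, ())):
            if src1 != src2:
                chains.append((start, rel1, mid, rel2, end, (src1, src2)))
    return chains


graph = build_graph(RAW_FACTS)
chains = two_hop_links(graph, "ASA")
for chain in chains:
    print(chain)
```

Without the normalisation step, "Acetylsalicylic acid", "ASA", "COX-1" and "Cyclooxygenase-1" would be treated as unrelated strings and no chain across papers would be found. A production system would rely on curated taxonomies and far richer matching than a dictionary lookup.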
Could you tell us about any recent data projects you have been involved in?
I work in the professional services team developing specific solutions for commercial clients. Much of what we do is naturally confidential; however, our focus is on identifying and linking data to create networks or graph databases of facts that can then be used to answer questions. A big part of this is tidying up the raw data to be able to link facts together.
Once you have clean data it is possible to build analytical models with which to make predictions. We do this in many diverse areas. Examples include identifying drugs to repurpose, in order to cure diseases without current treatments; identifying biomarkers that might be used as an early indicator of disease progression; and determining whether a chemical compound might be toxic or therapeutic when used as a drug.
One of the most exciting data projects we are working on at the moment is with a UK-based charity, Findacure. We are helping them find alternative treatment options for rare diseases such as congenital hyperinsulinism by offering our informatics expertise, and giving them access to published literature and curated data through our online tools, at no charge.
We are also supporting The Pistoia Alliance, a not-for-profit group that aims to lower barriers to collaboration within the pharmaceutical and life science industry. We have been working with its members to collaborate and develop approaches that can bring benefits to the industry. We recently donated our Unified Data Model to the Alliance, with the aim of publishing an open and freely available format for the storage and exchange of drug discovery data. I am still proud of the work I did with them back in 2009 on the SESL project (Semantic Enrichment of Scientific Literature), and my involvement continues as part of the special interest group in AI.
What are your predictions for the industry for the next 10 years?
We are going to see a radical change as AI techniques become more productive and the tools they deliver become more transparent and user friendly. This will become increasingly important if we are going to overcome the productivity crises that many disciplines of science and research are experiencing. Recent research at Stanford University has indicated that since the 1930s, the effective number of researchers at work has increased by a factor of 23, but annual growth in productivity has declined. As a result, new ideas are becoming more expensive to find.
We are supporting scientists by extracting facts from the scientific literature, and adding data from other sources, to highlight existing knowledge. Building on this, we are focussing on how AI can be made efficient with approaches such as transfer learning to identify new knowledge. At the same time we’re also working to make AI software tools easier to understand.
In the future, I see scientists working together collaboratively across human social networks, alongside AI. They will benefit from broad integrated cross-domain networks of linked facts, which will allow them to draw inferences and identify patterns through the use of machine learning. Working collaboratively with AI has been described variously as ‘centaur science’ and the use of ‘symbiotic technology’. These techniques offer the ability to aim AIs at problems we are interested in solving, and the means to understand and interpret the answers the AIs are giving us. A key development Elsevier is looking at is called AI neuroscience, where we are trying to build tools that look inside the black box of deep learning models and work out how an individual decision is made.
Overall, these advances should lead to a reversal of the reported productivity crisis in science and R&D, and improve outcomes for humanity by solving the problems we face globally in diverse areas, from antibiotic resistance to environmental degradation and climate change.
Interview by Tim Gillett