
Beware the black box of scientific software

Lucas Joppa examines how misplaced trust amongst scientists could affect scientific models and the results of research

With the click of a button, the numbers began to roll across the screen. At the time, I was a PhD student working towards my degree in ecology, interested in the network properties of species interactions. The program I was running was written by someone else, years prior, in a language I was unfamiliar with. Had I read the original paper describing the algorithm? Sure. Had I coded it up for myself? Definitely not. This was only one step along a path to show that ecological networks were non-randomly structured – and I wasn’t about to take the time to re-invent what others had already done.

This was not a viewpoint shared by my supervisor, and our subsequent conversation regarding the use and abuse of ‘black-box’ software in scientific computing has stayed with me ever since. At issue was the concept of ‘trust’.

Most people, in some form, trust software in their day-to-day lives without knowing everything about how it works. Yet the pursuit of scientific insight is different – and one could argue that the adoption of scientific software should be held to a higher standard than when people choose which browser or smartphone application to use.

Why did I trust another scientist’s software? My answers were familiar: I knew the scientist; results using the software had been published in a reputable journal; and many before me had used it to pursue their own research. Yet, the software itself had never been subjected to that gold standard of science – peer review – nor had I attempted to replicate the algorithm in a programming language more familiar to me. Was I an exceptionally naïve student, or was I representative of the broader community?

Either way, the experience fundamentally changed the way I interacted with scientific software – approaching each bit of code with healthy scepticism and a desire to replicate things for myself, the same attitude I carried for all the other aspects of my science.

Fast forward several years: I had settled in as a scientist at Microsoft Research in their Computational Ecology and Environmental Sciences (CEES) group. We are tasked not only with ‘pushing back the frontiers of science’, but also with creating the software that allows others to do so. The time was right to revisit the issue of trust in scientific computing.

As a team of ecologists and social scientists, we asked why scientists choose the software they use. We wanted to look at a domain defined by scientific – rather than computational – problems, so we surveyed more than 400 modellers of complex interactions between species and their environment.

Nearly 30 per cent of scientists in our survey reported they used particular software because it had been ‘validated against other methods in peer-review publications’ – rising to 57 per cent for those who used ‘click and run’ modelling packages with easy-to-manipulate user interfaces. Further, 7 per cent, 9 per cent and 18 per cent of scientists cited ‘The developer is well-respected’, ‘Personal recommendation’ and ‘Recommendation from a close colleague’, respectively, as reasons for using software. Only 8 per cent gave having validated the software against other methods as a primary reason for their choice. Nearly 80 per cent of respondents noted that they wanted to learn additional software and programming skills. These results, and more, were published last year in the 17 May 2013 issue of the journal Science, in a piece titled ‘Troubling Trends in the Use of Scientific Software’.

I shouldn’t have been surprised to find that many scientists adopt and use software that is critical to their research for non-scientific reasons. I had done so myself! But our results are troubling. Relying on personal recommendations and trust carries risks for both science and scientist. Citing peer review as a reason for adopting software is misplaced, because the code used to conduct the science is not itself formally peer-reviewed (with rare exceptions). And software developed by scientists often ignores software engineering standards, leaving trust in the implementation of algorithms potentially misplaced.

Fixing this situation will not be easy. Peer review of code could become part of the standard journal reviewing process – but this is easier said than done. With the number of paper submissions constantly increasing, peer review is already carrying a heavy load. Finding enough reviewers with the time and skills to review code properly will be difficult. Yet journals will find a way – as several have already done. Journals could also become an educational resource, by publishing tutorials on computational standards and methods.

A longer-term solution would be educating scientists in computational methods at a fundamental level. If only academic institutions committed to producing scientists who can instantiate science in code in a way that other scientists can peer-review, as they would any other aspect of science! Providing students with formal training in computational methods at an undergraduate level would be a great first step.

None of this will happen overnight. My own transition, from indifference to acute awareness of the importance of software in science, certainly did not. Yet these changes will come – indeed they must. Models, and the software that implements them, are increasingly defining both how science is done and what science is done. Research councils, academic institutions, and journals are now beginning to take this issue seriously. The steps they are taking today will define the science of tomorrow.

Lucas Joppa is a scientist in the Computational Ecology and Environmental Sciences Group at Microsoft Research where he heads the Conservation Science Research Unit.

