DATA ANALYSIS: FOOD
Food for a future
Pollination (image supplied by FERA).
Science and scientific computing can buy us time to cope with the demographic timebomb of population growth, says Felix Grant
It’s fashionable to scoff at Thomas Robert Malthus’ predictions, two hundred years ago, that human populations would grow until stopped by famine, disease or ‘moral restraint’. He wrote before the arrival of modern scientific crop research or contraception, and it’s unfair to blame Malthus for not foreseeing those breakthroughs. However, he was essentially right: the food supply expanded but remains finite, and contraception has not fundamentally disrupted the shape of the population growth curve, which is asymptotically approaching the vertical.
What to do about it is a matter of vigorous debate. To simplify: in the red corner are those who focus on means of increasing supply; in the blue, those who emphasise a dietary shift away from inefficient use of that supply. An Isaac Asimov short story did suggest exploiting the ‘many worlds’ view of quantum physics to disperse a trillion-strong population by placing every family on its own otherwise uninhabited Earth, but that one is a little beyond the reach of even today’s scientific computing power. In the long run, if the upward population curve continues, neither red approach nor blue will do more than defer the problem; in the meantime, pragmatically, both are needed.
Recent months have seen calls, across political, governmental and corporate spectra, to double agricultural output within variously short time spans. The means by which supply is increased are primarily technological, ranging from scientific refinement of traditional methods, through development of new soil enhancement regimes, to genetic modification. Underpinning all of them is a heavy dependence on computerised data analysis.
A corollary of rapid change through application of new technologies is the management of safety, ensuring that increased supply does not trigger more unintended consequences than can be avoided. The sheer volume of data involved in any given study is multiplied by the number of studies being carried out – and, once they are over, by their incorporation into actual day-to-day food supply management. Britain’s Food and Environment Research Agency (FERA), like its equivalents across the world, is a government agency working closely with both public and private sectors within and beyond its own national borders. It quotes 40,000 customers and 1,000 collaboration partners, spread over 100 countries on six continents. At this level, to further compound complexity, food issues interpenetrate with other health, environment and security factors. Managing this sort of information flow involves, of course, further analysis and (see box: ‘LIMiting the risk’) dependence on good information management systems. In FERA’s case, that means progressive rolling out of an integrated system based on the Nautilus LIMS.
How all of this effort, and its results, should be distributed is another divisive issue. Those who lead the research (usually large corporations based in rich countries) have a perfectly understandable desire to retain control of it and derive a return from its fruits. Those who most depend upon it (often small and marginal economies with populations close to famine) equally understandably want to draw control into their own hands. The overall trend is for patents to centralise over time while knowledge, expertise and computing infrastructure distribute. In between, the scientific computing community (commercial or otherwise) is not monolithic; I am continually surprised by the amount of altruism I encounter, with necessary self-interest moderated or modified by combination with moral and ethical concerns.
Nor is the ‘mixed economy’ of enlightened self-interest limited to individuals or small companies. The Gates Foundation, for example, includes among its portfolio of grants support for RUFORUM (Regional Universities Forum for Capacity Building in Agriculture) efforts to develop research expertise through acquisition of data analytic software. The software chosen is GenStat from VSNi, particularly well suited to agricultural work and itself at the heart of a mixed commercial and pro bono approach to markets.
Thermo Fisher Scientific’s Nautilus LIMS in use (image supplied by FERA).
Like most general analytic packages, GenStat retains particular strengths in the research field that spawned it – in this case, agricultural research. It has also developed in line with the life sciences, expanding to explicitly embrace new methods in genetics. Any company or organisation has to fund its activities and existence, and VSNi is no exception – but, in conversation with any of the people who comprise it, an enthusiastic idealism soon bubbles through. During such discussions I have learned as much about generic agricultural issues as about VSNi products, not to mention generous comment upon and pointers to their competitors. ‘The last thing we should be doing in the west,’ a VSNi spokesperson told me passionately, ‘is to stop indigenous development of expertise.’ Though not directly related to data analysis or scientific computing, the same person pointed out that modern ICT channels are also vital in raising awareness that supports research progress; the Forum for Agricultural Research in Africa (FARA, not to be confused with FERA), for example, disseminates information through a blog, internet hosted television portal, and cellular telephony mediation. Cellular telephony is also used on an ad hoc basis, and has potential as a large-scale internet replacement, for agricultural data collection in thinly infrastructured areas.
To customers who can afford it (or can attract the necessary funding), VSNi sells the full commercial GenStat product. To users who can benefit from it, but would be significantly strained by its purchase, a free Discovery Edition (GDE) is made available through partner institutions that undertake to supply and support the software within their sphere of contact. GDE, based on an enhanced previous release, contains updated methods while preserving differentiation in other aspects. Since programs can easily be written in one version of the software to work in the other, coherent two-way synergistic analytic conversations can be transparently maintained within research communities. For the partner institution, the pay-off is expansion of its research and results ‘fetch’; for VSNi, seeding of potential future (albeit next or subsequent generation) markets for the full software sales and an immediate socio-economic contribution to agricultural improvement without ruinous cost.
The policy is one that sees GenStat appearing in a wide range of contexts. GDE started in Africa, is spreading in India, and is currently developing partner links with universities in Vietnam and China. VSNi will train centres of excellence across south-east Asia, which will then cascade expertise and may also sell GenStat on to other customers as local franchises to raise their own income.
The problems of ramping up food production are by no means limited to developing world economies, but the margins there are tighter, the immediate benefits greater, and the potential often less exploited, so the returns on input are likely to be greater. Developing economies can also be among the largest net global food contributors; China, for example, produces more grain than the USA.
Education for a crowded future. Teenagers studying population/food sustainability build a public information display around data in GenStat (upper right foreground) and OriginPro (background).
There’s more to maximising food production than simply improving output from traditional arable and livestock land uses. Mycoprotein from moulds or fungi, well established as a vegetarian staple in the form of Fusarium venenatum (formerly classified as Fusarium graminearum), was originally conceived as a way of converting starch for an expanding population and could well come to serve that purpose. One of my ex-students, in connection with research directed at sustainable extraterrestrial colonies, is working on squeezing the greatest efficiency and minimum biohazards from conversion of sewage and other waste streams into algae-based food products. This is not a new idea (a cursory search immediately turned up a reference nearly 60 years old), but tends to meet psychological and aesthetic resistance. As food becomes more scarce, need may overrule sensitivity and bring the idea back from the moon and Mars to Earth; algae offer dramatically higher yield densities than any conventional crop, and grow continuously so are not subject to annual harvest cycles. This line of enquiry also holds out the bonus of a potential solution to parts of our growing environmental pollution headaches – including what to do with all this extra food after billions of people and their farm animals have eaten it. Intensive data analytic study of numerous variables holds the key to getting the highest ratio of safe food output to discard loss, while also minimising input of resources such as energy and water. This is the sort of work that proceeds much faster and more effectively if analysis is kept as close as possible to experiment (see box: ‘Returning data to the scientists’).
Another alternative production route that is viewed with suspicion, but whose time may soon come, is the growing of muscle tissue without all the mess and pasture associated with real animals: in vitro meat. Like algae from waste, this suggests ways of producing food on land not otherwise agriculturally productive; both will no doubt first be applied in industrially developed economies, but could in principle be even more valuable in barren or marginal areas of the developing world. Short of true muscle, aggregations of muscle cells produced in a bioreactor are another option. In the long run, such methods would probably be more economic than growing meat on the hoof, but there is a development bridge to cross first and, once again, a lot of data analysis needs to be done.
Sample handling (image supplied by FERA).
More general biological and ecological research is important, too. With all the rush to extract more food from monocultures, and to expand the space in which to do it, the parallel need to conserve biodiversity becomes ever more urgent. It’s easy to forget that natural genetic populations are the capital upon which bioresearch draws, and every species lost is a potential future lifesaver no longer available. Further, the mechanisms by which diversity develops are not well enough understood. This is another computing intensive area. Tina Sarkinen, at the University of Oxford’s department of plant sciences, investigating the role of geographic isolation and dispersal limitation in generating high endemic species diversity, calls on the UK’s National Grid Service (NGS) as she combines DNA sequencing data, molecular dating and fossil data to reconstruct densely sampled phylogenies over a period of 10 to 15 million years.
Two centuries on, Malthus hasn’t gone away. Two curves are racing each other up the graph’s y-axis: population and food production. Progress in the second doesn’t solve the problems of the first, but it certainly provides essential breathing space and depends in its turn on the growing power, capacity and subtlety of computerised data analysis.
We face tense times ahead.
LIMiting the risk
Increasing quantity also magnifies the challenge of monitoring safety. Colin Thurston, Thermo Fisher Scientific’s director of product strategy, Process Industries, comments: ‘The challenge is that some techniques in improving yields may also have a negative impact on the safety of the consumer. If we take a look at the use of pesticides in agriculture, we can see that the use of pesticides improves crop yields by removing competition for growing plants, but they can also significantly damage the people or animals who eat those plants. In addition, pesticides ingested by animals reared for meat provide another route of contamination into the human food chain.
‘Most countries have limits on the quantities of pesticides and pesticide residues (for example the European Union Regulation 396/2005 on maximum residue levels of pesticides in or on food and feed of plant and animal origin); however, it is the capture and analysis of that data that is a significant challenge. If we consider that there are literally hundreds of different chemical compounds that need to be analysed to make sure the food is safe to eat, multiplied by the quantity of food crossing country borders, multiplied by the different types of food that need to be analysed, we soon arrive at the conclusion that effective data analysis cannot easily be carried out by hand.
‘The use of laboratory information management systems to automatically highlight dangerous samples of food that can be traced back to its source is an example of where computer systems are significantly better at sorting the wheat from the chaff than any manual review process, thus allowing those responsible to concentrate on the laboratory analysis of contaminants.’
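The kind of automatic highlighting Thurston describes can be illustrated with a minimal sketch in Python. It is not how Nautilus or any other LIMS is implemented. The limit table, compound names, sample records and the helper names flag_exceedances and sources_to_trace are all hypothetical, and the limit values are placeholders rather than figures from Regulation 396/2005.

```python
from dataclasses import dataclass

# Hypothetical maximum residue levels in mg/kg, keyed by (commodity, compound).
# Placeholder numbers for illustration only, not values from any regulation.
MRL_MG_PER_KG = {
    ("apple", "compound_a"): 0.5,
    ("apple", "compound_b"): 0.01,
    ("wheat", "compound_a"): 0.1,
}


@dataclass(frozen=True)
class Result:
    sample_id: str
    source: str       # grower or consignment the sample traces back to
    commodity: str
    compound: str
    mg_per_kg: float  # measured residue


def flag_exceedances(results, limits=MRL_MG_PER_KG):
    """Split results into those over their limit and those with no limit on file."""
    over, unknown = [], []
    for r in results:
        limit = limits.get((r.commodity, r.compound))
        if limit is None:
            unknown.append(r)   # cannot be judged automatically
        elif r.mg_per_kg > limit:
            over.append(r)      # exceeds the recorded limit
    return over, unknown


def sources_to_trace(over):
    """Distinct sources behind the flagged samples, sorted for reporting."""
    return sorted({r.source for r in over})


# Invented laboratory results for the example.
results = [
    Result("S001", "farm-17", "apple", "compound_a", 0.20),
    Result("S002", "farm-17", "apple", "compound_b", 0.03),
    Result("S003", "farm-42", "wheat", "compound_a", 0.25),
    Result("S004", "farm-08", "wheat", "compound_c", 0.02),
]

over, unknown = flag_exceedances(results)
for r in over:
    print(f"{r.sample_id}: {r.compound} in {r.commodity} over limit, trace {r.source}")
print("Sources to trace:", sources_to_trace(over))
```

Results with no limit on file are reported separately rather than passed silently, so that a gap in the reference table is never mistaken for a clean sample.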
Returning data to the scientists
Carl-Johan Ivarsson, president of Lund University spinoff Qlucore, observes: ‘Without sophisticated interpretation solutions, it can be very difficult to derive meaning from the enormous amount of data produced by future food research studies. Focus on ability to handle ever expanding data sets, passing responsibility to bioinformaticians and biostatisticians, can sideline the scientist/researchers who best understand its implications.
‘Study of functional genomics is currently the most effective way to understand metabolic and adaptive processes in whole cells at a molecular level. Advanced data analysis software is important, analysis of DNA microarrays being a useful tool for the study of food microbes and pathogens within industrial, food and consumer environments. Food microbe genomics based on the latest sequence information generates valuable knowledge leading to metabolic engineering and development of new preservation methods. International projects to sequence the bovine genome pave the way for more sustainable food production.
‘Bioinformatics software now enables scientists to analyse this kind of genomic data, as well as a wide range of proteomic and microarray data, with a combination of statistical methods and visualisation techniques. Instant feedback and 3D presentation enable easy real-time analysis. Data comparison, hypothesis testing, exploration of alternative scenarios, can be done in seconds. Modern data analysis software such as Qlucore’s Omics Explorer can go further, combining 3D graphic representation of high dimensional data with powerful statistical methods and filters in a single mouse click under a user friendly interface. The future challenge will be to explore this data in greater depth, to fully understand the genetic basis of evolutionary success in pursuit of possibilities for efficient and sustainable food production.
‘Data analysis tools also help to examine the future in areas likely to be affected by climate change. Studies are underway to detect DNA markers associated with sensitivity of milk production to rising temperatures, water shortages, or scarcity of consumables like high energy feeds. Biotechnology breakthroughs and advanced data analysis allow researchers to study different combinations of genetic and chemical approaches with specific traits, to protect a seed from the moment that it is planted. Future developments in this area are likely to include drought-tolerance and disease control. International R&D co-operation and possibilities created by sophisticated data analysis software are helping to accelerate the pace of innovation.’
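As a rough illustration of combining a statistical filter with a 3D view of high dimensional data, the sketch below applies a variance filter followed by principal component analysis, using NumPy. This is a generic textbook approach and makes no claim about how Omics Explorer works internally. The data matrix is synthetic, and the function names variance_filter and pca_project are assumptions introduced for the example.

```python
import numpy as np


def variance_filter(data, keep_fraction):
    """Keep the highest-variance columns (variables such as genes or probes)."""
    variances = data.var(axis=0)
    n_keep = max(1, int(round(keep_fraction * data.shape[1])))
    # Indices of the n_keep largest variances, restored to column order.
    keep = np.sort(np.argsort(variances)[::-1][:n_keep])
    return data[:, keep], keep


def pca_project(data, n_components=3):
    """Project samples (rows) onto the leading principal components via SVD."""
    centred = data - data.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    return scores, explained[:n_components]


# Synthetic stand-in for a measurement matrix: 20 samples x 500 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(20, 500))
data[:10, :25] += 3.0  # the first ten samples differ in 25 of the variables

filtered, kept = variance_filter(data, keep_fraction=0.1)
scores, explained = pca_project(filtered, n_components=3)

print("Variables kept:", filtered.shape[1])
print("3D coordinates per sample:", scores.shape)
print("Variance explained by each axis:", np.round(explained, 3))
```

Each row of scores gives one sample's coordinates for a 3D scatter plot. Tightening or loosening keep_fraction and re-projecting is the kind of quick, interactive exploration the box refers to.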
- FARA, Forum for Agricultural Research in Africa, www.fara-africa.org/about-us/contact-us/
- Food and Environment Research Agency, Research services
- National Grid Service, UK research access to computational and data based resources
- OriginLab Corporation, OriginPro, www.originlab.com/www/company/qform.aspx?s=1&
- Qlucore, Omics Explorer
- RUFORUM, Capacity Building in Agriculture, http://ruforum.org/drupal/node/10
- The Bill and Melinda Gates Foundation, Grants
- Thermo Fisher Scientific, Nautilus & SampleManager, www.thermofisher.com
- University of Oxford, Department of Plant Sciences
- VSN International, GenStat