If you asked people on the street to describe their image of chemists working on new compounds, there would likely be some common themes: lab coats, safety glasses, glassware with colourful solutions bubbling away, fume cupboards…
What they might be less likely to describe is a scientist sitting at a computer, big datasets at his or her fingertips, and images of virtual reactions on the screen. Yet, increasingly, chemical discovery involves both types of approach to research.
Earlier this year, scientists and engineers at Harvard University in the USA made headlines after publishing a paper in the journal Nature, describing a new battery technology based on a group of organic compounds known as quinones. The flow-battery system reported in the paper promises cheap, clean and reversible energy storage when it is needed, often seen as a missing piece in the quest for widespread renewable energy.
But a key part of this research was selecting the compounds to use in the first place. Although quinones have been known for a long time to be promising for electrochemical applications – many occur naturally and carry out similar functions in plants – not all will work well in a flow battery. As Michael Aziz, a professor of materials and energy technologies at Harvard School of Engineering and Applied Sciences and one of the research team, noted, to be suitable for the group’s flow battery technology, quinones needed good reduction potential, solubility, and stability.
This posed a challenge: there are many thousands of potential compounds and nowhere near enough hours to make each one and then fabricate a battery using them. Fortunately, the interdisciplinary team included a theoretical chemist, Alán Aspuru-Guzik, professor of chemistry and chemical biology at the university, who was able to whittle down the list using virtual screening.
Aspuru-Guzik ran computational studies on 10,000 quinones to identify a much smaller subset of compounds likely to meet the criteria. The synthetic chemists involved in the project then made these compounds for the engineers to test. The result was that, a year into a grant from the US Department of Energy, the team was able to publish its Nature paper reporting promising early findings. What’s more, the team anticipates having a prototype storage system to test with wind turbines within three years.
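Conceptually, this kind of virtual screening is a filter over predicted properties. The sketch below is illustrative only: the `Candidate` fields, the threshold values and the four example entries are hypothetical placeholders, not data or criteria from the Harvard study, and a real workflow would take its property values from quantum-chemical calculations rather than hard-coded numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    reduction_potential_v: float   # predicted reduction potential in volts (hypothetical)
    solubility_mol_per_l: float    # predicted solubility in mol/L (hypothetical)
    is_stable: bool                # predicted stability flag (hypothetical)


def screen(candidates, min_potential_v, min_solubility):
    """Keep only the candidates that pass all three criteria."""
    return [
        c for c in candidates
        if c.reduction_potential_v >= min_potential_v
        and c.solubility_mol_per_l >= min_solubility
        and c.is_stable
    ]


# Made-up example library; names and numbers are placeholders.
library = [
    Candidate("quinone-A", 0.85, 1.2, True),
    Candidate("quinone-B", 0.40, 2.0, True),   # fails on reduction potential
    Candidate("quinone-C", 0.90, 0.1, True),   # fails on solubility
    Candidate("quinone-D", 0.95, 1.5, False),  # fails on stability
]

shortlist = screen(library, min_potential_v=0.7, min_solubility=1.0)
print([c.name for c in shortlist])
```

Only the shortlist would then be passed to the synthetic chemists, which is where the saving in laboratory time comes from.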
This type of interdisciplinary project spanning the boundaries between laboratory and computer has become increasingly important in both academia and industry. It has become particularly significant in the pharmaceutical industry.
‘Fifteen years ago, the ability of modelling to influence drug discovery was very limited. Today chemists routinely use predictive tools. They’ve become routine and ubiquitous,’ noted Adrian Stevens, senior manager, predictive sciences marketing, life sciences for Accelrys, which develops modelling software based on its Pipeline Pilot graphical programming language.
‘Modellers are much more integrated. Typically a modeller [in a pharmaceutical team] will support three to five research teams and projects, with perhaps a day to turn around a model.’ (See the next issue of Scientific Computing World for more on modelling in drug discovery.)
In addition to saving time, modelling can reveal things that are not accessible experimentally. Chris Greenwell, of the Department of Earth Sciences at the University of Durham, UK, for example, uses molecular modelling to study the structure and properties of mineral surfaces and the chemical reactions that occur on them. ‘We use computational methods as they allow us insight at a level that would otherwise be impossible to achieve, and to probe different possible structures with exquisite control,’ he explained.
This might mean, for example, understanding how organic molecules interact with mineral surfaces, where there might be a hydrated surface or solvents present. Such insight is very useful to industries such as oil and gas. The group also looks at the interactions that occur in chemical catalysis, as well as new materials research.
‘With computers getting more powerful and models getting more sophisticated, we can now get very good size and timescale agreement. Simulation allows us to see something that experimentation can’t allow you to see, for example, behaviour at different layers in a material.’
Another example of this is being able to study materials at temperatures, pressures and other real-world conditions that are hard to replicate in the lab, for example, to model matter as you go down through the Earth’s crust.
Modelling, Greenwell said, also enables you to screen many properties quite quickly and to isolate two or three variables, which is hard to do in a lab experiment. The type of modelling approach depends on the system being modelled, he said. For example, he uses quantum mechanics-based code to study catalysis but this can only be applied to study a very small area. Most of the modelling work in his group, he said, uses molecular mechanics. ‘Most interactions don’t involve bond breaking and this is a simpler model, so much quicker,’ he explained.
He also noted the good availability of modelling code, from both academia and industry. His group, for example, uses the CASTEP package, developed by a group of UK academics, and LAMMPS from Sandia National Laboratories. ‘We write some things but there’s no real need to develop our own code,’ he added.
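To give a flavour of why the molecular mechanics approach Greenwell describes is so much quicker, the sketch below evaluates a Lennard-Jones pair energy, a standard non-bonded term in many classical force fields. It is a generic illustration, not the functional form or parameters used by his group or by any particular package; the `epsilon` and `sigma` defaults are arbitrary reduced units chosen for the example.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy: 4 * epsilon * ((sigma/r)**12 - (sigma/r)**6)."""
    if r <= 0:
        raise ValueError("separation r must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# The energy crosses zero at r = sigma and reaches its minimum,
# -epsilon, at r = 2**(1/6) * sigma.
r_min = 2 ** (1 / 6)
print(lennard_jones(1.0))
print(lennard_jones(r_min))
```

Evaluating a closed-form expression like this for each pair of atoms involves no electronic-structure calculation, which is why such models can be applied to far larger systems than quantum mechanics-based codes.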
Gilberto Teobaldi of the Stephenson Institute for Renewable Energy at the UK’s University of Liverpool is another user of computational techniques to study new materials, using the ONETEP package and other academic software (DL_POLY, TINKER). His research focuses on the theory and atomistic-modelling of photo-electro-chemical interfaces and on their potential for renewable energy generation (photovoltaics), storage (batteries and supercapacitors), and more efficient use (photo-catalysis).
‘At the moment there are many challenges that cannot be addressed experimentally, for example, direct and atomically resolved insight into the functioning of buried interfaces in batteries,’ he said. ‘Atomistic modelling is crucial in providing such insight, and expanding our understanding of what causes batteries to degrade.’
Teobaldi uses both ab initio (mostly density functional theory) and force-field based atomistic modelling. ‘By solving, to a different level of accuracy, the nuclear and electronic equations of motion of model systems, both methods are capable of providing direct access to the atomistic parameters that control the (mal-)functioning of energy-relevant materials and interfaces,’ he explained.
The use of computational modelling is not restricted to the early stages of a project either, according to Ravi Aglave. He is CPI industry sector manager for CD-adapco, which provides computational fluid dynamics tools aimed at the chemical industry.
He explained: ‘Chemists are all the time developing things in the lab on a small scale but we have to take them to the real world, where tens of thousands of tonnes might need to be made.’
This scaling up is not straightforward and requires significant engineering. For example, if a reaction generates heat in a test tube, the surrounding air will generally dissipate the heat easily. Once the reaction is no longer on the scale of a few grams but a few tonnes, that heat becomes much more of a challenge. As Aglave noted, without careful engineering, it could cause a runaway reaction and explosions. The answer is to try to design systems that avoid such problems and here computers can help.
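A rough way to see why heat becomes a problem on scale-up is to compare how heat generation and heat removal grow with vessel size. The sketch below assumes, purely for illustration, a spherical vessel in which heat output is proportional to volume and heat loss is proportional to surface area; the two radii are hypothetical, and real reactors have cooling jackets, coils and mixing effects that this ignores.

```python
import math


def surface_to_volume(radius_m):
    """Surface-area-to-volume ratio of a sphere in 1/m (equal to 3 / radius)."""
    area = 4.0 * math.pi * radius_m ** 2
    volume = (4.0 / 3.0) * math.pi * radius_m ** 3
    return area / volume


# Hypothetical sizes: a 1 cm radius lab flask versus a 1 m radius plant vessel.
lab = surface_to_volume(0.01)
plant = surface_to_volume(1.0)
print(lab / plant)
```

Under these assumptions, the plant vessel has about one hundred times less surface per unit of reacting volume through which to shed heat, which is the kind of imbalance the engineering has to design around.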
CD-adapco’s software enables the chemical industry, and others, to model fluid flow, chemical reactions, and mass and energy transfer. For this, modellers feed in the geometries and boundaries of the vessels in a chemical plant. They also add into the model some input and output conditions from the laboratory studies. These include the chemical species involved, heat released, reactions, intermediates and any unwanted side products.
‘People are realising the power and value of this type of simulation,’ observed Aglave. ‘Someone who was doing five to 10 simulations per year 10 years ago is probably doing hundreds per year now.’
Part of the reason for this huge increase is progress made in computing power. ‘People usually utilise several cores to achieve solutions,’ noted Aglave. ‘As parallelisation and computing power have increased, the cost of carrying out these types of calculations has gone down.’
Both Greenwell and Teobaldi use a range of high-performance computing (HPC) resources, based in Europe and North America. ‘The advent of codes able to partition simulations over many processors with linear or near linear scaling, coupled to the advent of fast interconnects and high-performance computers, as well as grid-connected, high-performance computers, has led to simulations of a size and complexity of real relevance to industry partners,’ said Greenwell, who added that he can run simulations on tens of thousands of cores.
‘Access to large-scale HPC facilities (ARCHER, HECToR, STFC Hartree and N8 HPC) is a requirement for the fundamental research we are interested in,’ agreed Teobaldi. ‘The substantial increase in computational power, coupled with continuous progress in novel and more efficient algorithms and numerical libraries, has markedly benefited the whole scientific community with an interest in atomistic modelling. In the group, we routinely work on systems that are between one and two orders of magnitude larger than I could do when I started my PhD roughly 10 years ago.’
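The ‘linear or near linear scaling’ Greenwell mentions is commonly quantified as parallel efficiency: the measured speed-up divided by the number of cores used. The sketch below shows the arithmetic; the timings and core count are invented for illustration and do not come from any of the codes or facilities named here.

```python
def parallel_efficiency(t_serial, t_parallel, n_cores):
    """Return (speed-up, efficiency) for a run on n_cores."""
    if t_serial <= 0 or t_parallel <= 0 or n_cores <= 0:
        raise ValueError("times and core count must be positive")
    speedup = t_serial / t_parallel
    return speedup, speedup / n_cores


# Hypothetical wall-clock times in seconds for the same simulation.
speedup, efficiency = parallel_efficiency(
    t_serial=1000.0, t_parallel=12.5, n_cores=100
)
print(speedup, efficiency)
```

An efficiency close to 1 corresponds to the near-linear scaling described above; values well below 1 indicate that adding cores is yielding diminishing returns.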
Pushing the limits
But there are still significant limitations, argued Teobaldi. ‘In spite of the remarkable advances in efficiency of atomistic-modelling scientific software and increase in academically-available computational power, the accuracy-viability tradeoff is far from ideal. More method-development work (which in turn requires more dedicated research funds) is needed,’ he explained.
In terms of hardware, Teobaldi would like to see ‘cost- and energy-effective multi-core processors with sufficiently large memory (ideally more than 1GB/core)’.
With software, he would like to see ‘more performing numerical libraries tuned to emerging hardware solutions and the possibility of facile porting and tuning of existing code to novel hardware solutions’.
‘Companies are very active in this but there are huge challenges. It needs a concerted effort between business and academia,’ he explained. (See John Barr’s article on page 18 for more discussion of the role of independent software vendors and porting code to different hardware architectures.)
Mark Mackey, CSO of Cresset, agreed about these challenges: ‘In the computational chemistry industry, the main trend is the acceptance by virtually all modellers that the existing force fields (the sets of parameters that we use to describe molecules) are inadequate. In particular, their modelling of electrostatics is poor, and this leads to very misleading or incorrect results in some cases.’
Mackey noted that there are a number of major academic efforts to produce improved force fields, with varying degrees of success. ‘It’s fair to say that the problem has turned out to be more difficult than was first thought. However, I expect that by the end of the decade there will be a major shift away from the first-generation force fields and towards the second-generation polarisable ones for day-to-day calculations.’
He added that Cresset has its own force field, which, he said, takes a slightly unusual approach to the problem with a good deal of success. ‘We are proactively looking into ways to improve our force field in this area, and we are also keeping an eye on the academic research.’
In the area of modelling materials, Greenwell sees another challenge for researchers. ‘By and large, most development has gone into codes for biological applications, which are often not optimal for materials chemistry or geoscience applications.’ He noted that simulations designed for biological molecules tend to be very different. ‘Proteins tend to be discrete molecules, whereas with minerals science you want to have periodic, continuous models. This becomes a challenge, especially in inputting conditions.’
‘It is certainly true, especially for large systems, that inorganic materials are set back compared with biological ones,’ agreed Teobaldi. ‘When you do models of biological systems, the pharmaceutical industry is very interested. I’m not sure that there is such a high realisation of the importance of modelling with inorganic materials.’
And this difference in interest, he said, corresponds to a different level of investment in modelling solutions – and a difference in the accuracy of the resulting models.
Another development that would benefit modellers is improved usability. Greenwell noted that ‘a lot of tools are developed by academics but take-up by a wider community cannot happen if, for any project, highly-skilled, specialist researchers are needed every time the code/tool is to be deployed.’
Stevens of Accelrys agreed that usability is an important feature for commercial software too. ‘One of the strongest messages we get from users is that they don’t have two weeks to read the manual and learn how to use the software,’ he said. He added that there is a trend towards building guided workflow tools that allow users to do the tasks they are most likely to do by following steps, so they don’t need to read the manual again.
A related usability request is for cross-integration of tools, for example, to be able to open a chemical modelling tool when looking at data in Excel and then export the result. ‘Our products have become much more integrated and cross-product integration is now built in,’ he said.
Meanwhile, as software and computing power develop, there is a corresponding increase in the tasks that modelling is applied to. ‘We definitely expect use of modelling to increase as the range and size of problems you can solve increases,’ noted Aglave from CD-adapco. ‘New problems are being created as new materials – such as nanoparticles, photovoltaics, semiconductor materials and thin films – are being created, and, as you manufacture new materials, there are new processes. You can deploy modelling to shorten the development process.’
Another trend that he observes is the desire for optimisation, doing multiple simulations to come to the best design as quickly as possible.
But computational studies will not answer everything. Stevens of Accelrys pointed to the well-known aphorism, usually attributed to the 20th-century statistician George Box, that all models are wrong but some are useful. ‘Some modellers have thought in the past that everything is useful. Somewhere between the two is where the reality is,’ he said. In other words, successful projects need to continue to span the boundaries between laboratory and computer.