Mucking about with molecules, and other stories.
Early in the summer, the long academic break lay before me - uncluttered, for once, by other commitments. Elsewhere, in a financially-strapped third-world country, a group of first year university students were preparing plans for an entirely unfunded environmental cleanup project. In an unguarded moment, I rashly agreed to provide some voluntary support for their preparations. It seemed a good idea at the time. My work and interests are in environmental science, but my experience is framed by the British organisations with which I have been involved. This would be an interesting extension from the contexts I knew. A few weeks later, cold reality arrived in the form of a long list of chemical names and a huge box of software.
The list was a 'partial inventory' for deliveries made to a now defunct military-industrial facility built some decades ago in a remote swamp (see Scientific Computing World, Sept/Oct 2002, 'The Return of the Swamp Thing'.) I was taken aback first of all by the sheer variety and range of it: from simple potassium metal to complex compounds, such as the enzyme carboxylesterase. More worrying was the combination of hazards which it represented: most of the items being explosive (picric acid, for instance); highly toxic (such as phosgene, to take just one example); or both. The facility, long abandoned, has begun to fall apart, so anything on this list might or might not be making its unpleasant way out into the surrounding environment.
The box of software was the latest release of ChemOffice (badged '2004') in its top of the range 'Ultra' incarnation, obtained from publishers CambridgeSoft somewhat ahead of market release by Adept Science. This suite is built around two main 'big name' applications (ChemDraw Ultra and Chem3D Ultra), two subsidiary ones providing administrative functions (ChemFinder Ultra and E-Notebook), and a supporting cast of reference databases.
ChemDraw is a tool for producing and handling two dimensional framework drawings of chemical structures; Chem3D models those structures in various ways. ChemFinder handles Access file format (.mdb) databases; and E-Notebook is a tool for collaborative working. Most such boxes turn out, on opening, to contain one disk, one or more hefty paper manuals, and a quantity of empty space for shelf filling purposes. This one has the manuals but none of the space: the actual product comes on not just seven CDs but two DVDs as well.
How to approach this in the most useful way for the inexperienced undergraduate clients? The obvious first step was to flesh out that list of names with supplementary information. First year undergraduates will have a roughly similar level of knowledge anywhere in the world; it would not include most of the items on the list, but information about them should tie into at least some of their general understanding acquired so far. Much of the list wasn't that familiar to me, either. Despite my current work, I am an inorganic chemist by background. Organics dominated here, so such an exploration would enhance my own ability to advise.
Shifting perspective a little, two of these undergraduates would later come to Europe as guests of an industrial explosives manufacturer. They would be given access to the same software; it would probably be their first exposure to such a product. For myself, other products have provided some of the functions that are all provided in one place here. (A colleague uses an earlier version of ChemDraw, providing me with another perspective.) As a newcomer to the product, and as one used to watching students learn chemistry, I felt some kinship with newcomers to the software class as a whole. Many readers of Scientific Computing World will already know the product line well in its previous releases; they will no doubt go straight to the 'new in 2004' feature list. For those, like me, who are considering it for the first time, ChemOffice is an impressive array of useful and productive power, easy to pick up and use within its presets, although it requires some determined perseverance once the user penetrates past the first layers.
The first port of call was ChemDraw. With a weather eye on my beginner status, I started by working my way through the tutorials. Later on, to get some longitudinal perspective and to test my own impressions, I discussed the software with my colleague who uses an earlier version. The instructions are clear and the exercises give a good introduction to what the package can do, such as illustrating reaction mechanisms with organic intermediates and drawing Fischer and Haworth projections for glucose molecules. (I was interested in these applications from a generic teaching standpoint, quite apart from the job in hand.)
At first acquaintance, there were some queries about presentational conventions; these were resolved, but the help in doing so was sometimes less than obvious. Aromatic (benzene) rings, for instance, are given the cyclohexatriene or Kekulé structure, rather than representing the delocalised electrons by a circle inside the ring as I am accustomed to do. Making the choice is simply a matter of pressing the CTRL key, but it took a while to discover this.
For the swamp project, the facility to convert names into structures was obviously important; this is available as a dialogue box in the Structure menu, or as a Paste Special command from the Edit menu. I started a ChemDraw document by simply typing names into the dialogue box, but after some experimentation I was able to import a text file version of the whole list using the Paste Special command to place the list in a text box. I then cut single names from the list and pasted them into separate text boxes, using the 'Name as Structure' command to give me the structures. This worked well, and the work was done quickly once I got the hang of it. There were a number of names for which ChemDraw couldn't come up with structures. Some were multiple compounds, some large biological molecules or polymers. While some common names are recognised, the software wasn't able to give a structure for ATP (a common abbreviation for a biologically significant molecule), but when the full name adenosine triphosphate was used it worked. It's arguable that polymer names carry enough information that the program ought to be able to make a stab at structuring them. On the other hand, my colleague was impressed that it could cope with the names of hormones such as thyroxine and lipids such as tristearin. ChemDraw was also able to cope with a typographical error in one name on the list ('napthalene' for 'naphthalene'), suggesting a corrected version, which I thought impressive. On the whole, this scope seems fair enough. There are bound to be limits, and the usefulness of this facility easily outweighs its limitations.
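The kind of typo tolerance described above can be illustrated with a short sketch. This is not how ChemDraw does it (its matching method is not described in the material I had); it is simply an approximate string match using Python's standard difflib module against a small, hypothetical list of known names, with an arbitrary similarity cutoff.

```python
import difflib

# Hypothetical mini-dictionary of recognised names (illustration only).
KNOWN_NAMES = [
    "naphthalene",
    "phosgene",
    "picric acid",
    "thyroxine",
    "tristearin",
    "adenosine triphosphate",
]


def suggest_name(raw, known=KNOWN_NAMES, cutoff=0.8):
    """Return the known name closest to `raw`, or None if nothing is close.

    The cutoff of 0.8 is an assumed value chosen for this example.
    """
    cleaned = raw.strip().lower()
    if cleaned in known:
        return cleaned
    # get_close_matches ranks candidates by similarity ratio, best first.
    matches = difflib.get_close_matches(cleaned, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for entry in ["napthalene", "Phosgene", "ATP"]:
        print(entry, "->", suggest_name(entry))
```

A misspelling one letter away from a known name is corrected, while an abbreviation such as ATP finds no match, since it is not textually close to the full name; handling abbreviations would need a separate synonym table.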
ChemDraw, like ChemOffice as a whole, appears to be designed for organic chemists. In showing stereochemistry, for example, it uses the indicators 'S and R' for optical isomers rather than 'D and L' used in biochemistry. This emphasis is understandable, given the dominance of organic synthesis in current industrial product discovery, but it is worth bearing in mind. With inorganic complexes (such as tungsten hexafluoride, an item in the swamp list) it gives no representation of the co-ordinate bonds - this was echoed in Chem3D, discussed below.
The software showed aluminium chloride as an ionic compound, which provoked a debate between my colleague and me. The substance exists as covalently-bonded molecules in the vapour phase, the theoretical explanation of which involves starting with an ionic lattice where the polarising aluminium (Al3+) ion distorts the electron clouds of the surrounding chloride ions to result in polar covalent bonding; so justification can be argued for showing it as ionic.
Before I started converting the list to structures, I had tried to arrange the layout by setting up pages using Document Settings from the File menu. Having finished the conversion, I tried to print out the result and found myself engulfed by a grid of 625 pages. This was down to a misunderstanding on my own part, but rectifying it turned out to be less than simple. I had started out by using pages alongside the first (across the top row of the grid) before realising that there were pages vertically below it; there is no obvious way of reformatting so that this material rearranges itself downwards. Adding a header and page numbering is also somewhat clunky compared to most word processors.
Next stop was a reference source in which to bone up on the list items about which I knew little. The 13th Edition of the Merck Index is provided with ChemOffice Ultra - 10,250 monographs describing significant chemicals, drugs and biological substances. There was an initial hiccup over the separate access code needed (since the material is not the property of CambridgeSoft); this code was not included in the box, and installation was delayed by a couple of days while one was obtained. The CD-ROM version comes with the foreword and explanatory notes available as PDF files and a 'First Search Tutorial' with information on browsing and searching the database. Searches can be performed on a wide variety of criteria: text (names or parts of names, e.g. all names beginning 'benz'); molecular formulae; numerical values for parameters such as molecular weight; or structures drawn with ChemDraw (including similar structures and those containing a particular substructure). It's also possible to perform combined searches, using some or all of the above.
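To make the idea of a combined search concrete, here is a minimal sketch in Python. It does not use the Merck Index software or its file format; the Record type, the search function and the handful of records are all assumptions made for illustration (the molecular weights are standard rounded values), and structure searching is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str
    formula: str
    mol_weight: float  # g/mol, rounded


# A few illustrative records; a real index holds thousands of monographs.
RECORDS = [
    Record("acetaldehyde", "C2H4O", 44.05),
    Record("benzaldehyde", "C7H6O", 106.12),
    Record("benzene", "C6H6", 78.11),
    Record("benzoic acid", "C7H6O2", 122.12),
    Record("naphthalene", "C10H8", 128.17),
    Record("phosgene", "CCl2O", 98.92),
]


def search(records, name_prefix=None, formula=None, mw_range=None):
    """Return records satisfying every criterion that was supplied."""
    hits = []
    for rec in records:
        if name_prefix is not None and not rec.name.lower().startswith(name_prefix.lower()):
            continue
        if formula is not None and rec.formula != formula:
            continue
        if mw_range is not None and not (mw_range[0] <= rec.mol_weight <= mw_range[1]):
            continue
        hits.append(rec)
    return hits


if __name__ == "__main__":
    # Names beginning 'benz' with a molecular weight between 100 and 130.
    for rec in search(RECORDS, name_prefix="benz", mw_range=(100.0, 130.0)):
        print(rec.name, rec.formula, rec.mol_weight)
```

Each criterion is optional, so a text-only, formula-only or combined query goes through the same function; omitted criteria simply do not filter.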
Continuing my policy of viewing this as a novice, I did find that there were some minor problems following some of the exercises in the First Search Tutorial. In order to start a further search I had to close the form I had displayed or open a new one, contradicting the tutorial. The structure search involved drawing propanone, which was referred to as propane, with corresponding differences between actual search results and those in the tutorial!
Nomenclature in chemistry is complicated by the fact that, although there is an internationally agreed system of nomenclature generation, the previous names are still in widespread use - particularly in industry and in the US. (This is, of course, reflected across science; in the US and many industrial contexts, a confusing plethora of older unit systems has resisted replacement by SI standards.) For example, the compound CH3CHO has the systematic name of ethanal and the old fashioned name of acetaldehyde. ChemDraw will convert the systematic name to a structure drawing but, when asked to go the other way, labels the structure as acetaldehyde. Similarly, the index of Merck monographs can be searched using the systematic name ethanal in the Additional Names section, but the compound itself is listed under the old name of acetaldehyde.
Nevertheless, a great deal of invaluable information is available here. There is also a series of Additional Tables which include useful things like a table of acid-base indicator data and a section with a list of organic reactions known by names (e.g. Ziegler-Natta polymerisation), with a description, reaction scheme and literature references. This will become useful as the tasks at hand develop, and I can see interesting possibilities for teaching use as well. In the immediate context of the swamp project, I used the partial inventory list of chemicals to search the Index, the entries including relevant information on such things as toxicity and other hazards, and uses of chemicals. Other tables (such as The Russian Alphabet) seem less obviously relevant, but no doubt are there for those whose contexts require them.
Eventually, all of this material being generated would be merged and supplemented to provide the swamp undergraduates with a task-specific database relevant to their undertaking. With this idea in mind, I worked my way through the tutorials for ChemFinder, a 'chemically-intelligent database manager and search engine'. Integrated into Excel as an add-in, it creates searchable spreadsheets while an addition to the MS Word toolbar makes available searching of a wide variety of document files. Included is CombiChem, for combinatorial library generation in chemical spreadsheets. ChemFinder connects to data in Oracle and MS Access, or will import and export MDL RD and SD files. Sample databases are provided of approximately 300 organic and inorganic compounds and about 250 reactions extracted from ISI's ChemPrep database of Current Chemical Reactions (the ISICCR database). Searches can be by reactant, product or reaction type, allowing a synthetic chemist to get information on how to get from a particular starting material to a desired product. Though the relevance of the specific sample content to the particular project might be limited, the ChemFinder application itself is an exceptionally useful information tool and I am only just getting to grips with the possibilities.
While it did not follow sequentially in my investigation of ChemOffice, Chem3D is most obviously a companion to ChemDraw. Once again, I worked through the tutorials in the manual. Where ChemDraw produces 2D structure drawings, Chem3D visualises them as solid models in wire-frame, stick, ball-and-stick, cylindrical-bond, space-filling, ribbon and cartoon forms (the last two being appropriate for large molecules such as proteins). Information can be transferred from one application to the other, and back again, which is a great advantage - ChemDraw structure diagrams, for instance, can be the most convenient way of creating a Chem3D model. At the same time, being able to create models by typing atom labels (e.g. 'CH3CH(CH3)CH2CH(OH)CH3') into a Chem3D text box is very useful.
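As an aside on what such a typed label encodes, the sketch below tallies the atoms in a condensed label string, including bracketed branches, and prints the resulting molecular formula. It is an illustration only, written in Python with hypothetical function names; it says nothing about how Chem3D itself parses labels, and it ignores bonding and geometry altogether.

```python
import re
from collections import Counter

# An element symbol with optional count, an opening bracket,
# or a closing bracket with optional multiplier.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|(\()|(\))(\d*)")


def count_atoms(label):
    """Count atoms in a condensed label such as 'CH3CH(CH3)CH2CH(OH)CH3'."""
    stack = [Counter()]
    pos = 0
    while pos < len(label):
        match = TOKEN.match(label, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        element, count, opening, closing, multiplier = match.groups()
        if element:
            stack[-1][element] += int(count or 1)
        elif opening:
            stack.append(Counter())  # start collecting a bracketed group
        else:
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            group = stack.pop()
            factor = int(multiplier or 1)
            for symbol, n in group.items():
                stack[-1][symbol] += n * factor
        pos = match.end()
    if len(stack) != 1:
        raise ValueError("unbalanced opening bracket")
    return stack[0]


def hill_formula(counts):
    """Format counts with C and H first (when carbon is present), then A-Z."""
    lead = [s for s in ("C", "H") if s in counts] if "C" in counts else []
    rest = sorted(s for s in counts if s not in lead)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in lead + rest)


if __name__ == "__main__":
    print(hill_formula(count_atoms("CH3CH(CH3)CH2CH(OH)CH3")))
```

For the label quoted above this gives six carbons, fourteen hydrogens and one oxygen.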
The package is able to calculate a lot of valuable data: steric energy values for different conformations of a molecule (via MM2 menu, 'compute properties'), for instance. In one of the tutorials, the eclipsed and staggered conformations of ethane are created, with the lower value for the staggered form indicating it is the more likely to exist. Other options provide computational tools for model optimisation, molecular dynamics and single point energy calculations. Some of what Chem3D can do in this direction stretches my memory of theoretical concepts that I have not encountered since university. One of the tutorials (involving mapping properties onto molecular surfaces) requires some separate software, which is not included in the package.
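The ethane comparison can be sketched with the simplest textbook model: a single threefold cosine term for rotation about the C-C bond. This is not the MM2 calculation, which sums several further contributions to the steric energy; the barrier height of about 12 kJ/mol is an approximate textbook figure, and the function name is hypothetical.

```python
import math

# Approximate rotational barrier of ethane (textbook value, kJ/mol).
BARRIER_KJ_PER_MOL = 12.0


def torsional_energy(dihedral_deg, barrier=BARRIER_KJ_PER_MOL):
    """Threefold torsional term: V = (barrier / 2) * (1 + cos(3 * phi)).

    phi = 0 degrees is the eclipsed form, phi = 60 degrees the staggered form.
    """
    phi = math.radians(dihedral_deg)
    return 0.5 * barrier * (1.0 + math.cos(3.0 * phi))


if __name__ == "__main__":
    for label, angle in [("eclipsed", 0.0), ("staggered", 60.0)]:
        print(f"{label:9s} {angle:5.1f} deg  {torsional_energy(angle):6.2f} kJ/mol")
```

In this model the staggered form sits at the energy minimum and the eclipsed form at the maximum, which is the same ordering the tutorial demonstrates.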
A number of 3D visualisation aids are provided: perspective rendering; a 'distance haze' effect, which dims further atoms; and stereoscopic binocular views. If you are not one of those people who can cross your eyes and focus them at the same time to view full colour-separated stereo pairs, the glasses provided give a reasonable 3D image of molecules in conjunction with the appropriate red and blue glasses icon from the toolbar. Looking further down the line, after my immediate need for advisory material is met, it offers a lot of promising potential not only for study and prediction but for education and training as well.
There was a problem with a persistent error on quitting Chem3D, requiring a machine reboot, which was a pain. The frequency of this varied with system RAM, ranging from 'every time' at the ChemOffice base spec of 64 Mbytes of RAM to 'occasional' at 256 Mbytes. Consultation with CambridgeSoft support identified this as a video driver issue, curable by upgrading to the latest version. Be prepared!
A similar problem prevented me from using E-Notebook within the review period; it seems to be fussy about the version of Microsoft Data Access (MDAC) installed. The required system spec only asks for MDAC 2.0, but the program itself seems to be more picky than that. This may well be an issue with my own setup only, but it means that I cannot comment directly on the software - an electronic laboratory notebook to allow the recording of experimental notes and sharing information with colleagues. E-Notebook would be at its most useful in a multi-user environment, and a multi-user ChemOffice environment at that; but a single user option is provided.
ChemOffice has not been in my hands for very long and, as with any new toy, most of the early days have been taken up with familiarisation. At the time of writing, however, I have all the kit laid out (and most of the skills learned) for compilation of a task-specific field reference and crash instructional course on the range of pollutants that these young trainee chemists are expecting to encounter in their swamp. By the time they have finished, they will no doubt know a great deal more; but that experience can go into the pot with the rest. ChemOffice is, in many ways, less an application than a working environment; a chemist's discipline-specific analogue of the generic research information manager. Looking ahead, as I get to know it better, I foresee a number of other promising roles beyond this initial priority.
Dr Alan Wicks is a consulting analytical chemist specialising in environmental work.
ChemACX: (Available Chemicals eXchange) is 'a class of databases available from chemical manufacturers and distributors, featuring complete catalogues of major world suppliers of fine research, speciality and industrial chemicals'. It contains more than 420,000 chemical products representing about 187,000 substances and can be searched by a number of parameters including compound name, molecular formula, or chemical structure. As with 'The Merck Index', it's available as a ChemOffice Webserver application and, as my colleague pointed out, is presumably better accessed through the internet to avoid it going out of date. The product information for a particular supplier can be accessed by clicking on a product name in the Index form.
ChemMSDX: contains health and safety information. It can be accessed from the Index form but is on a DVD, not a CD, so requires an appropriate drive.
ChemSCX: (ChemACX Screening Compound databases) is designed to complement ChemACX, 'allowing you to locate and order compounds used for high-throughput screening programs'. It contains more than 500,000 compounds from 15 suppliers.
ChemRXN: This CD contains two database folders, ChemPrep and ChemSelect, primarily of interest to those involved in synthetic organic research. It is a reaction database, assembled from Reaction Citation Index data by the ISI. Reactions are listed with literature references, containing 'what we believe to be the 13,000 most important organic reactions'. Various inclusion criteria are given: for example, that a reaction is the most popular form of its 'type', appears in one of the 60 leading scientific journals, represents at least five reactions appearing in five different journals, and has a yield greater than 50 per cent.
ChemINDEX: a CD of four database folders; ChemIndex Net, NCI, Buckybase and Samples, representing an eclectic collection of compounds such as drugs, common organics, and research chemicals. It is, essentially, a worldwide web home page of relevant hyperlinks.
The other databases are for particular specialised audiences:
NCI: the National Cancer Institute database, containing structures for more than 240,000 compounds along with synonyms and screening results.
BuckyBase: a dataset of complete and partial Chem3D models of buckyballs, for those working with or interested in fullerene chemistry. These can be copied from ChemFinder, and used as building blocks for rapid generation of desired constructs.
Samples: fragments from ChemInfo-compatible databases (e.g. Antibase, a database of natural compounds, and BARK information services Flavours and Fragrances), some of them large enough to be useful but all intended as tasters for full commercially-available reference products.