AI-Designed Syntheses of Natural Products as Good as a Human’s
Computer-generated synthetic routes pass the Turing test
A previous Eureka blog highlighted progress in the development of computer programs that can assist chemists in their planning of synthetic routes towards the compounds they aim to make in the lab. Just recently, another significant step forward in the capabilities of this kind of technology has been reported in the journal Nature.
Many molecules that occur in the natural world (so-called “natural products”) are complex, often possessing multiple ring systems and chiral centres. One example, ciguatoxin CTX3C (containing 13 rings and 30 chiral centres!), is shown in Figure 1.
Figure 1: the chemical structure of ciguatoxin CTX3C, a highly potent marine toxin produced by the dinoflagellate Gambierdiscus toxicus. (Remarkably, ciguatoxin CTX3C was synthesised in 2001 by a group from Japan.)
Developing a laboratory (or “total”) synthesis of such compounds often requires a heroic effort over many years. For instance, the first total synthesis of vitamin B12 (Figure 2), completed in 1972, is reported to have taken 12 years and involved more than 90 separate reactions performed by over 100 co-workers!
Figure 2: the chemical structure of vitamin B12.
Up until now, computer programs for synthesis planning have generally been restricted to comparatively simple “drug-like” molecules. This is largely due to their tendency to plan synthetic routes one step at a time. Planning the syntheses of larger, more complex natural products requires a different strategy.
The developers of the Chematica/Synthia synthesis planning program realised that a key feature required to enable it to find routes to complex natural products was the ability to construct its strategy over several steps simultaneously, taking into account the implications of its choices at one step for those subsequent to it.
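The difference between planning one step at a time and planning over several steps can be illustrated with a toy sketch. This is not the Chematica/Synthia algorithm: the molecule names, the step costs and both functions below are invented purely for illustration, assuming each possible disconnection can be given a single numerical cost.

```python
# Toy retrosynthesis map: each molecule lists candidate disconnections
# as (step_cost, precursor). All names and costs are hypothetical.
ROUTES = {
    "target": [(1, "A"), (3, "B")],
    "A": [(9, "start")],
    "B": [(1, "start")],
    "start": [],  # commercially available, nothing left to disconnect
}


def greedy_cost(mol):
    """Pick the cheapest disconnection at each step, ignoring what follows."""
    total = 0
    while ROUTES[mol]:
        cost, mol = min(ROUTES[mol])
        total += cost
    return total


def lookahead_cost(mol):
    """Score each disconnection by the total cost of the route it leads to."""
    if not ROUTES[mol]:
        return 0
    return min(cost + lookahead_cost(nxt) for cost, nxt in ROUTES[mol])


print("one step at a time:", greedy_cost("target"))
print("several steps ahead:", lookahead_cost("target"))
```

In this toy map, the disconnection that looks cheapest at the first step leads to an expensive second step, so the step-by-step planner ends up with the costlier route overall. A planner that weighs the consequences of each choice for later steps avoids this.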
Taking inspiration from syntheses devised historically by human experts, the developers encoded four heuristics (“rules of thumb”) to help the program better mimic the strategic reasoning required for such complex syntheses. With these modifications incorporated, the program was able to come up with plausible and original routes to challenging natural products like callyspongiolide (Figure 3).
Figure 3: the chemical structure of callyspongiolide.
The program’s developers next devised a “Turing test” to see if the computer-derived synthetic routes could be distinguished from those devised by humans. To accomplish this, a set of 40 organic syntheses was compiled: 20 from the published literature and 20 devised by Chematica/Synthia. A team of 18 experts in synthetic chemistry was then asked to assign two scores to each synthetic route. The first was a score from 0 to 10 reflecting how likely the expert thought the route was to have been derived by a human (score = 0) or a machine (score = 10). In addition, the experts were asked to judge the elegance of each synthesis and assign it a score ranging from 0 (“uninspired”) to 10 (“remarkable”).
Overall, for the 20 Chematica/Synthia routes, the experts decided that 10 were of human origin and that the rest were computer-derived. For the 20 literature routes, 12 were judged to be by humans and eight by computer. The conclusion drawn from this, and other analyses of the data, was that the program had passed the Turing test – in other words, the experts were generally not able to discern the origin of the natural-product syntheses. What’s more, the computer-derived routes were judged to be slightly more elegant than those from the literature.
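To see how close to guesswork these verdicts are, the counts above can be turned into a simple overall accuracy. This is a back-of-the-envelope illustration using only the numbers quoted here, not the statistical analysis used in the Nature paper; the variable names are our own.

```python
# Counts quoted in the text above (20 routes from each source).
machine_routes = 20
machine_judged_machine = 10  # the other 10 were judged to be human
human_routes = 20
human_judged_human = 12      # the other 8 were judged to be computer-derived

# A verdict is "correct" when the judged origin matches the true origin.
correct = machine_judged_machine + human_judged_human
total = machine_routes + human_routes
accuracy = correct / total

print(f"{correct} of {total} origins identified correctly ({accuracy:.0%})")
```

Random guessing would be expected to get about half of the 40 origins right, so 22 correct verdicts is barely better than tossing a coin.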
This is a remarkable result and marks a very significant step forward in the capabilities of software for synthesis planning. When such powerful tools become widely available to practising synthetic chemists, they will surely boost those chemists’ creativity and productivity, and thereby help to enhance the efficiency and effectiveness of the drug discovery process.