What's Hot in 2020: Very Large-Scale Virtual Screening
David Clark, PhD

Two new software systems can dock millions and billions of compounds, respectively

Virtual screening has been an established part of the hit-finding toolbox in pharma and biotech for almost two decades, and many successes have been reported in the literature over that time (see, for example, some case studies published recently by scientists at Janssen).

In the past, such virtual screening exercises would typically have been applied to a company’s corporate screening collection (numbering up to a million or so compounds) or to a collection of a few million commercially available compounds.

However, during the last year, reports have begun to emerge of groups taking virtual screening to a whole new level. This advance has been enabled by two factors: access to huge numbers of CPUs via cloud computing, and the collation of enormous collections of chemical structures of compounds that can either be purchased immediately or, more usually, synthesised “on demand”. These so-called “virtual” libraries of on-demand compounds are based upon well-established synthetic chemistry and available reagents, so the success rate when synthesis is requested is usually at least 85%.

Two reports in particular have caught my eye. The first, published in Nature, described the docking of 138 million compounds into an X-ray structure of the dopamine D4 receptor. The exercise took 1.2 calendar days of elapsed time on 1500 CPUs. From the virtual library, 589 molecules were selected, of which 549 (93%) were successfully synthesised, leading to several interesting hits, including a full D4 agonist with a potency of 180 pM.
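
To put those numbers in perspective, here is a rough back-of-envelope calculation in Python. It uses only the figures quoted above and assumes, purely for illustration, that all 1500 CPUs were busy for the full 1.2 days, so the CPU time it reports is an upper bound rather than a measured value.

```python
# Figures quoted above for the D4 docking campaign.
compounds_docked = 138_000_000
elapsed_days = 1.2
cpus = 1500
selected = 589
synthesised = 549

# Assumption: every CPU was fully used for the whole run (an upper bound).
cpu_seconds = elapsed_days * 24 * 3600 * cpus
cpu_years = cpu_seconds / (365 * 24 * 3600)
seconds_per_compound = cpu_seconds / compounds_docked

# Fraction of requested compounds that were actually made.
synthesis_success = synthesised / selected

print(f"CPU time (upper bound): {cpu_years:.1f} CPU-years")
print(f"CPU time per compound:  {seconds_per_compound:.2f} s")
print(f"Synthesis success rate: {synthesis_success:.1%}")
```

On these assumptions the campaign works out at roughly one CPU-second per compound, and the synthesis success rate sits comfortably above the 85% typically quoted for on-demand libraries.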

More recently, OpenEye, a CADD software company, has reported the docking of an even larger compound collection: the 1.43 billion molecules of the Enamine REAL dataset were docked into X-ray structures of two different targets, purine nucleoside phosphorylase (PNP) and heat shock protein 90 (HSP90). The figures are quite staggering. The PNP run was completed in just 24 hours using up to 27,000 CPUs (total CPU time: 48 years; cost: $12,000). The HSP90 run took even less time, 18 hours, using up to 45,000 CPUs (total CPU time: 55 years; cost: $15,000). The top-ranked compound from the HSP90 example proved to be a 4 mM inhibitor of the enzyme.
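
The quoted figures also allow some simple derived numbers, sketched below in Python. The `summarise` helper is my own illustrative function, not anything from OpenEye, and the calculation assumes that a "year" of CPU time means 365 days and that the full 1.43 billion molecules were docked against each target.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: CPU "years" are 365-day years


def summarise(target, molecules, wall_hours, peak_cpus, cpu_years, cost_usd):
    """Derive rough per-run statistics from the quoted headline figures."""
    cpu_hours = cpu_years * HOURS_PER_YEAR
    return {
        "target": target,
        # Average number of CPUs in use over the run.
        "mean_cpus": cpu_hours / wall_hours,
        # Average usage as a fraction of the quoted peak ("up to") CPU count.
        "fraction_of_peak": cpu_hours / (wall_hours * peak_cpus),
        "usd_per_cpu_hour": cost_usd / cpu_hours,
        "usd_per_million_molecules": cost_usd / (molecules / 1e6),
        "cpu_seconds_per_molecule": cpu_hours * 3600 / molecules,
    }


runs = [
    summarise("PNP", 1.43e9, 24, 27_000, 48, 12_000),
    summarise("HSP90", 1.43e9, 18, 45_000, 55, 15_000),
]

for run in runs:
    print(
        f"{run['target']}: mean {run['mean_cpus']:,.0f} CPUs "
        f"({run['fraction_of_peak']:.0%} of peak), "
        f"${run['usd_per_cpu_hour']:.3f} per CPU-hour, "
        f"${run['usd_per_million_molecules']:.2f} per million molecules, "
        f"{run['cpu_seconds_per_molecule']:.2f} CPU-s per molecule"
    )
```

Under these assumptions both runs come out at around three cents per CPU-hour and roughly ten dollars per million molecules docked, with the average CPU count well below the quoted peak, as one would expect from the "up to" wording.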

Lest we be dazzled by the stats, let’s remember that what really matters here is whether these approaches help to speed the discovery of high-quality drug candidates for unmet medical needs. That will of course take quite a long time to establish, but in the meantime it seems that a new era of virtual screening is upon us, and 2020 will surely bring further examples of such large-scale computation applied to drug discovery.