Computational Origami
David Clark, PhD

AI brings about a “watershed moment” for protein folding

Elementary biology teaches us that proteins are made up of a specific sequence of amino acids joined like a “string of pearls”, and that proteins are also 3-D objects whose structures are crucial to their functions in the body. One of the great challenges to science in recent decades has been to understand how that linear sequence of amino acids folds reliably and spontaneously, typically within a few milliseconds, into a precise, functional 3-D structure. This is the so-called “protein folding problem”.

Alongside experimental studies, a great many computational algorithms have been developed over the years to investigate this conundrum, including “gamification” approaches such as Foldit, a game that invites citizen scientists to take part. It should therefore come as no surprise that artificial intelligence (AI) methods have more recently been applied to the protein folding problem and that, according to a recent publication, they seem to have come up with the goods once again.

A commentary in Nature calls this “a watershed moment” for protein structure prediction. The AI program that has generated such excitement is called AlphaFold. If that sounds familiar, it’s because it emanates from the same DeepMind research group that gave us AlphaGo (the first computer program to defeat a professional human Go player) and, more recently, AlphaStar (the first AI to reach the top league of the widely popular real-time strategy game StarCraft II).

So, in simple terms, how has AlphaFold managed to make this breakthrough? Imagine that protein folding is like navigating a landscape in search of its lowest point (what is termed “the global energy minimum” in search and optimization problems such as protein folding). If the landscape is full of hills and valleys, the journey will be difficult, time-consuming and frustrating, because it is easy to end up stuck in a valley that is not the lowest one. If, however, the landscape is smooth, with perhaps just one or two valleys, finding the lowest point will be easy.
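To make the analogy concrete, here is a toy sketch in Python. It is an illustrative assumption, not anything taken from AlphaFold: two made-up one-dimensional “landscapes” share the same lowest point at x = 2, but one is smooth and the other is covered in extra hills and valleys. Plain gradient descent, which simply keeps stepping downhill, is started from the same handful of points on each. On the smooth landscape every start reaches the bottom; on the rugged one, several of the same starts end up trapped in a nearby valley.

```python
import math

# Toy landscapes (hypothetical, for illustration only). Both have their
# lowest point, with height 0, at x = 2.

def smooth_grad(x: float) -> float:
    """Gradient of the smooth landscape (x - 2)^2: a single valley."""
    return 2.0 * (x - 2.0)

def rugged(x: float) -> float:
    """Same global minimum, plus many extra hills and valleys."""
    u = x - 2.0
    return u ** 2 + 2.0 * (1.0 - math.cos(6.0 * u))

def rugged_grad(x: float) -> float:
    """Gradient of rugged(x)."""
    u = x - 2.0
    return 2.0 * u + 12.0 * math.sin(6.0 * u)

def descend(grad, x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent: repeatedly take a small step downhill."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

starts = [-1.0, 0.0, 1.5, 2.3, 3.5, 5.0]
smooth_ends = [descend(smooth_grad, x0) for x0 in starts]
rugged_ends = [descend(rugged_grad, x0) for x0 in starts]

for x0, xs, xr in zip(starts, smooth_ends, rugged_ends):
    print(f"start {x0:5.1f}: smooth -> {xs:6.3f}, "
          f"rugged -> {xr:6.3f} (height {rugged(xr):.3f})")
```

The step size is kept small so that descent on the rugged landscape stays stable; the walker never climbs, so whichever valley it starts in is the one it finishes in.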

In essence, AlphaFold recasts the problem: it predicts an energy landscape that is much smoother than the true landscape but whose lowest valley still overlaps with the true one. Because this landscape is so much smoother, AlphaFold can find the global energy minimum far more quickly and easily than previous methods (even so, a single prediction can still take tens to hundreds of hours). This was demonstrated in a computational protein-folding competition called CASP13, where AlphaFold beat the other contestants hands-down: it generated the best prediction for 25 of the 43 protein targets, while the next-best method did so for just three.
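The recasting idea can be sketched in the same kind of toy setting. This is only an illustration of the principle, not a description of how AlphaFold itself is implemented. A hypothetical smooth “predicted” landscape, (x - 2.1)^2, stands in for the smoother landscape; its lowest point is deliberately a little off (2.1 rather than the true 2) but lies inside the true lowest valley. Descending the smooth surrogate first and then taking a short local refinement on the rugged “true” landscape brings every start to the true minimum, whereas descending the rugged landscape directly does not.

```python
import math

TRUE_MIN = 2.0  # lowest point of the hypothetical "true" landscape

def rugged_grad(x: float) -> float:
    """Gradient of the rugged toy landscape (x-2)^2 + 2*(1 - cos(6*(x-2)))."""
    u = x - TRUE_MIN
    return 2.0 * u + 12.0 * math.sin(6.0 * u)

def surrogate_grad(x: float) -> float:
    """Gradient of a hypothetical smooth 'predicted' landscape (x - 2.1)^2.

    Its minimum is slightly off (2.1), but it sits in the true lowest valley.
    """
    return 2.0 * (x - 2.1)

def descend(grad, x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    """Plain gradient descent: repeatedly take a small step downhill."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def two_stage(x0: float) -> float:
    # Stage 1: find the bottom of the smooth surrogate (easy: one valley).
    x = descend(surrogate_grad, x0)
    # Stage 2: a short local refinement on the rugged "true" landscape.
    return descend(rugged_grad, x, steps=200)

starts = [-1.0, 0.0, 1.5, 2.3, 3.5, 5.0]
direct = [descend(rugged_grad, x0) for x0 in starts]
staged = [two_stage(x0) for x0 in starts]

for x0, xd, xs in zip(starts, direct, staged):
    print(f"start {x0:5.1f}: direct -> {xd:6.3f}, surrogate + refine -> {xs:6.3f}")
```

The surrogate only has to be right about which valley is lowest; the final refinement on the true landscape corrects the small offset in where exactly the bottom lies.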

While the folded protein structures generated by AlphaFold are not yet accurate enough for structure-based drug design, they are already good enough to help generate biological insights and hypotheses for experimental verification.

In this instance at least, the almost unbelievable levels of hype around AI seem to be justified and it will be fascinating to see what scientific problems the DeepMind team turn their attention to next!