Measuring Machine Learning’s Potential
Mary Parker

In drug discovery, what factors into the usefulness of an AI? A discussion with Roche scientist Bryn Roberts

When it comes to artificial intelligence (AI) in drug discovery, we don’t yet know the limits of its abilities, though we are making progress. What can it do, what can’t it do, and what factors will determine the answers to those questions? With the help of Bryn Roberts, Global Head of Operations for Roche Pharmaceutical Research & Early Development in Basel, Switzerland, we discuss some of the factors that will help set the upper and lower limits of AI’s capabilities.

Data quality

In order for a machine to learn, it must be fed categorized data. For example, if you wanted to train a machine to recognize human faces in pictures, you would need to upload pictures and point out the faces in each one in a way the computer can understand. The more pictures, and the more accurately you show the computer which parts of the image are faces, the better it will be when given an uncategorized picture.

You’ve probably heard the expression “garbage in, garbage out.” When training an AI, the quality of the data (whether it is chemical formulas, images, text, or experimental results) will directly affect its ability to do whatever job you are asking. That means if you trained the machine with images of doll faces, or used only one race of people in the training, or threw in some images of stop signs and mislabeled them as faces, the results may no longer be robust.
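To make “garbage in, garbage out” concrete, here is a minimal sketch in Python. It is a toy, not anything Roche describes: each picture is reduced to a single made-up “face-likeness” score, the model is a simple nearest-centroid classifier, and all names and numbers are assumptions chosen for illustration. The same model is trained twice, once on clean labels and once after a few non-face images have been mislabeled as faces.

```python
from statistics import mean

def train_centroids(examples):
    """Learn one average score per label (a nearest-centroid model)."""
    by_label = {}
    for score, label in examples:
        by_label.setdefault(label, []).append(score)
    return {label: mean(scores) for label, scores in by_label.items()}

def predict(centroids, score):
    """Pick the label whose learned average is closest to the score."""
    return min(centroids, key=lambda label: abs(centroids[label] - score))

def accuracy(centroids, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(centroids, score) == label for score, label in examples)
    return correct / len(examples)

# Hypothetical, hand-made training data: (face-likeness score, label).
clean = [
    (0.8, "face"), (0.9, "face"), (1.0, "face"),
    (0.0, "not_face"), (0.1, "not_face"), (0.2, "not_face"),
]

# The "stop signs labeled as faces" scenario: low scores, wrong label.
mislabeled = [(0.0, "face"), (0.1, "face"), (0.0, "face"), (0.1, "face")]
noisy = clean + mislabeled

# Correctly labeled pictures the model has never seen.
held_out = [(0.7, "face"), (0.6, "face"), (0.3, "not_face"), (0.4, "not_face")]

clean_accuracy = accuracy(train_centroids(clean), held_out)
noisy_accuracy = accuracy(train_centroids(noisy), held_out)

print(f"trained on clean labels: {clean_accuracy:.0%}")
print(f"trained on noisy labels: {noisy_accuracy:.0%}")
```

In this toy setup the mislabeled examples drag the learned idea of a “face” toward non-faces, so the second model starts calling borderline non-faces faces. Real image models are far more complex, but the failure mode is the same in kind.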

“It's about curation,” said Roberts. “We invest a lot of effort into curating data to ensure that they are ‘fit for purpose’ in terms of scientific and medical relevance. This requires domain expertise, with curators who really understand what kind of questions we're going to ask of the data, or what insights we expect the AI to reveal from the data, and the implications these expectations have on data context, structure, and quality, as well as the content.”

Data structure

According to Roberts, building robust AI models generally requires large datasets, but a relatively small amount of well-curated data is usually more valuable for training than enormous, poorly annotated data lakes. For a number of years there has been a lot of hype over “Big Data,” but for research and training there is a growing emphasis on “FAIR Data” (Findable, Accessible, Interoperable, and Reusable). Scientific and medical data are particularly challenging to make interoperable and reusable.

That’s where an ontology can help. A taxonomy provides simple rules for the structuring of data, with defined relationships between concepts: for example, “pugs” is a subtype of “dogs,” which is equivalent to “canines.” An ontology is a souped-up taxonomy, which allows for more nuanced relationships between concepts. An ontology could contain “pugs” as a historical concept detailing how the breed came about, or “pugs” as a veterinary concept detailing the medical needs of pugs, or “pugs” in popular culture. The ontology can describe how each of these concepts is related to different sets of data (breeding records, medical charts, cartoons) while understanding that they are all referring to dogs called pugs.
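The difference between the two can be sketched as data structures. The Python below is an illustrative assumption, not a description of Roche Terminology Services: the taxonomy is a plain parent lookup plus a synonym table, while the ontology is a list of (subject, relationship, object) triples. The relationship names `is_about` and `documents`, and every identifier, are hypothetical.

```python
# Taxonomy: each term has one parent, and synonyms map to a preferred term.
SUBTYPE_OF = {"pug": "dog"}
SYNONYMS = {"canine": "dog"}

def canonical(term):
    """Resolve a synonym to its preferred term."""
    return SYNONYMS.get(term, term)

def ancestors(term):
    """Walk up the subtype links from a term to the top of the taxonomy."""
    term = canonical(term)
    chain = []
    while term in SUBTYPE_OF:
        term = SUBTYPE_OF[term]
        chain.append(term)
    return chain

# Ontology: typed relationships between concepts and data sets,
# stored as (subject, relationship, object) triples.
TRIPLES = [
    ("pug", "subtype_of", "dog"),
    ("pug_breed_history", "is_about", "pug"),
    ("pug_veterinary_care", "is_about", "pug"),
    ("pug_in_popular_culture", "is_about", "pug"),
    ("breeding_records", "documents", "pug_breed_history"),
    ("medical_charts", "documents", "pug_veterinary_care"),
    ("cartoons", "documents", "pug_in_popular_culture"),
]

def datasets_about(entity):
    """Find every data set linked to any concept that is about the entity."""
    concepts = {s for s, rel, o in TRIPLES if rel == "is_about" and o == entity}
    return sorted(s for s, rel, o in TRIPLES if rel == "documents" and o in concepts)

print(ancestors("pug"))        # follows the single subtype link
print(canonical("canine"))     # resolves the synonym
print(datasets_about("pug"))   # crosses two kinds of relationship
```

The taxonomy can only answer “what kind of thing is this?” The triples can also answer “which data sets, in which contexts, refer to this thing?”, which is the kind of question that makes data interoperable.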

“We have developed our Roche Terminology Services over the last few years,” Roberts said. “In addition to the technical solutions for managing and interfacing with the ontologies, there’s an invaluable expert team of semantics and ontology specialists who can advise the domain-specific curators on how to use the terminology services, and who can extend and engineer ontologies as science and requirements evolve. It's absolutely fundamental to all aspects of data interoperability and integration. When it comes to AI, this notion of computable ontologies is particularly important.”

Training and tuning

Not every drug discovery researcher needs extensive computer science training to use AI, but a basic understanding of how machine learning works helps researchers wield it effectively and tune a program to get what they need.

“We have data scientists with deep understanding of the fundamentals of AI and the experience required to engineer tools that others can apply in their research. However, I think it's useful for all scientists, clinicians, and others within the organization to have an appreciation of what AI is, and what it can do,” said Roberts. “And also, therefore, what it can't do. And we're still on that journey.”

Roberts uses the analogy of a microscope in thinking about ways an AI can contribute to research. A biologist doesn’t need to know the details of the physics involved in a microscope in order to use it effectively when looking at a sample, and the same holds true for AI.

“If you move the slide around, you see it moving, you can zoom in and zoom out, and so on,” he said. “And this is kind of like how human users can interact with an AI. So as you tune it in different ways, you see how it's behaving.”

So where are we now?

Roberts says that Roche uses AI in “guided decision support.” In this type of interactive workflow, the machine is fed data manually or automatically, depending on the specific application, and provides insights that enable key decisions to be made more efficiently and effectively. For example, an AI can continuously monitor and categorize the activities of a Parkinson’s patient enrolled in a study for a new drug. Using a variety of sensor data, the machine can determine whether the subject is sitting, walking, and so on, and for how long. It can compare data over time, and relay conclusions to the investigators about the patient’s functional performance or overall quality of life.
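As a rough illustration of that workflow, the Python sketch below categorizes sensor readings, totals the time spent in each activity, and compares two periods. Everything in it is an assumption made for the example: the single “movement intensity” reading per minute, the fixed threshold standing in for a trained classifier, and the invented sample values. It does not represent the method used in any Roche study.

```python
from collections import Counter

WINDOW_MINUTES = 1       # assumed: one sensor reading per minute
WALKING_THRESHOLD = 0.5  # assumed cutoff; a real system would learn this

def classify(intensity):
    """Stand-in for a trained model: label one reading as an activity."""
    return "walking" if intensity >= WALKING_THRESHOLD else "sitting"

def minutes_per_activity(readings):
    """Total the minutes spent in each activity over a series of readings."""
    counts = Counter(classify(reading) for reading in readings)
    return {activity: n * WINDOW_MINUTES for activity, n in counts.items()}

# Invented movement-intensity readings for the same subject at two time points.
early_readings = [0.1, 0.2, 0.7, 0.8, 0.9, 0.1]
later_readings = [0.1, 0.7, 0.8, 0.9, 0.6, 0.7]

early = minutes_per_activity(early_readings)
later = minutes_per_activity(later_readings)

# Compare over time: did the subject spend more minutes walking?
change_in_walking = later.get("walking", 0) - early.get("walking", 0)

print("early:", early)
print("later:", later)
print("change in walking minutes:", change_in_walking)
```

The summary handed to investigators is the last number, not the raw sensor stream, which is what makes this decision support rather than automated decision making.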

It’s still a challenge for scientists to understand the algorithms sufficiently to trust their outputs and suggestions.

“Not everybody needs to be an expert at everything,” Roberts said. “The chemist has to trust the biologist, and then the toxicologist likewise. And so too the data scientist, or the AI expert, who sits within the team, needs to be trusted. It is becoming an established discipline alongside chemistry, biology and all the other essential capabilities involved in developing a new medicine. Trust will develop further with more experience working together and experiencing the remarkable capabilities of machine learning and AI.”