
In my last two posts, I’ve discussed how AI and machine learning can help healthcare professionals make well-informed decisions. We are working on some amazing models that can make highly accurate predictions and recommendations. While this is something I feel hugely optimistic about, I would like to use today’s post to explain that using AI and machine learning in medicine isn’t just about developing accurate models and rolling them out.
The reality is that most models can’t be used as-is by medical professionals. This is because, on their own, they are black boxes that are hard for their intended users to apply, understand, and trust.
This was the topic of a lecture I gave for The Alan Turing Institute (where I’m a fellow) at the Royal Society of Medicine in early March, shortly before COVID-19 social distancing measures kicked in. The full presentation is at the bottom of this post, but here’s a summary.
If a model is accurate, what’s the problem?
The problem we face is neatly summarized in a 2018 editorial from The Lancet: “Machine learning is frequently referred to as a black box—data goes in, decisions come out, but the processes between input and output are opaque.”

Photo by Alina Grubnyak on Unsplash
Frustratingly, even if you have created a highly accurate machine learning model, you may not be able to understand why that model is accurate, or whether it will remain as accurate when applied in different situations. It may also be hard to use such a model to gain a deeper understanding of a problem or area you’re trying to learn more about. In a sense, machine learning models can sometimes answer the little questions without providing insight into the big questions.
This is exactly what happened a couple of years ago, when my team developed AutoPrognosis, an AI-enabled decision-support system (initially for cardiovascular disease, but subsequently also for cystic fibrosis and breast cancer, among others). AutoPrognosis significantly outperformed existing state-of-the-art methods, which was great. But it raised additional questions: why was it better, and what could we learn from it?
When you consider the potential users of machine learning in healthcare, these questions pose a problem for everyone. If you’re a clinician, you’ll lack confidence because you still won’t know why a model is recommending a certain course of action. If you’re a researcher, you won’t be able to make new discoveries because the model doesn’t share its insights in a usable manner. If you’re a patient, it’ll be hard to act on a machine learning model’s recommendations when giving informed consent or planning any lifestyle improvements.
In short, arriving at an accurate recommendation is useful, but it isn’t enough. What is desperately needed is interpretability. Users of machine learning models need to know why the recommendation was reached, how it was reached, what can be learned, and how the model will perform in a range of situations.

Photo by Patrick Tomasso on Unsplash
From black boxes to white boxes
A clinician recently told me that he’d rather use an older and less accurate—but more trustworthy—model over a newer, more accurate model that he couldn’t understand or trust. I completely empathize: if a model were telling me that patient A has a high risk of mortality after receiving a heart transplant, and that therefore patient B should be prioritized for transplantation, I’d also want to know why that recommendation was being made before making a life-or-death decision.
For me and my team, this is a challenge we take very seriously, but it also represents a huge opportunity. This is why we have been working to solve the challenge of interpretability for years. We have boiled our requirements for machine learning models in medicine down into several key criteria. In our view, user-friendly interpretable models should:
Ensure transparency: users need to understand how the model makes predictions
Enable risk understanding: users need to understand, quantify and manage risk
Avoid implicit bias: users should be confident that the model won’t learn biases
Support discovery: users need to distil insights and new knowledge from the model
In addition to the above, models should be explainable (information about them should be tailorable to the needs and purposes of different users) and trustworthy (users should have a good idea of how reliable they are).

Photo by Alvaro Pinot on Unsplash
Last year, my team made an initial breakthrough with the development of INVASE. INVASE is a new method that uses reinforcement learning (remember AlphaGo?) to examine black box machine learning models and work out why they make specific predictions for patients. It does this by using an actor-critic method, which simultaneously makes decisions and evaluates the effectiveness of those decisions. Specifically, the “actor” looks at recommendations made by a black box model, and evaluates the importance of selected patient features (e.g. age, weight, blood pressure). The “critic” then assesses the effectiveness of the actor’s selections, and compares the outcome to the original recommendations made by the black box model. This process is repeated until INVASE has determined which features are most important and reached a level of accuracy comparable to the original black box model.
While other methods also examine the importance of individual patient features, what makes INVASE unique is that it can determine the set of important features separately for each patient, rather than only on average across a population.
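For readers who like to see the mechanics, here is a loose, minimal sketch of the selector/predictor/baseline training loop in the spirit of INVASE. It is not our published implementation: the network sizes, the names selector, predictor and baseline, the sparsity penalty lam, the synthetic data and the simple REINFORCE-style update are illustrative assumptions only.

```python
# Illustrative sketch of instance-wise feature selection in the spirit of INVASE.
# All names, sizes and the training scheme below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_features = 10          # e.g. age, weight, blood pressure, ...
lam = 0.1                # penalty for selecting many features

# "Actor": outputs a per-patient probability of keeping each feature.
selector = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                         nn.Linear(32, n_features), nn.Sigmoid())
# Predictor: sees only the selected (masked) features.
predictor = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                          nn.Linear(32, 1))
# "Critic"/baseline: sees all features; its loss is the reference point.
baseline = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                         nn.Linear(32, 1))

opt = torch.optim.Adam([*selector.parameters(),
                        *predictor.parameters(),
                        *baseline.parameters()], lr=1e-3)

def train_step(x, y):
    """x: (batch, n_features) patient features, y: (batch, 1) binary outcome."""
    probs = selector(x)                          # per-feature keep probability
    mask = torch.bernoulli(probs).detach()       # sample a 0/1 selection mask
    pred_loss = F.binary_cross_entropy_with_logits(predictor(x * mask), y,
                                                   reduction='none')
    base_loss = F.binary_cross_entropy_with_logits(baseline(x), y,
                                                   reduction='none')
    # Reward the selector when the masked predictor does as well as the
    # baseline, while penalising the number of selected features.
    reward = (base_loss - pred_loss).detach() - lam * mask.sum(dim=1, keepdim=True)
    log_prob = (mask * torch.log(probs + 1e-8)
                + (1 - mask) * torch.log(1 - probs + 1e-8)).sum(dim=1, keepdim=True)
    selector_loss = -(reward * log_prob).mean()  # REINFORCE-style update
    loss = selector_loss + pred_loss.mean() + base_loss.mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy example with synthetic data: 128 "patients", binary outcome that
# depends only on the first feature.
x = torch.randn(128, n_features)
y = (x[:, :1] > 0).float()
for _ in range(200):
    train_step(x, y)
print(selector(x[:2]).detach().round())   # per-patient 0/1 importance pattern
```

The key point the sketch tries to capture is that the selection mask is produced per patient, so two patients can end up with different sets of "important" features.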
Building on our work with INVASE, we have made astonishing progress over the last year using a technique called symbolic metamodeling. This is an approach that takes black boxes and unpacks them into transparent equations. In essence, symbolic metamodeling replaces an accurate but opaque model with a similarly accurate and transparent model.
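To give a flavour of what “unpacking a black box into an equation” can look like, here is a deliberately simplified sketch: it fits a transparent polynomial surrogate to a black-box model’s predicted risks. This is only a stand-in for the idea; the real symbolic metamodeling approach searches over a much richer class of functions than plain polynomials, and the dataset, three-feature restriction and degree-2 polynomial below are assumptions for illustration only.

```python
# Simplified illustration of the metamodeling idea: approximate an opaque
# model's output with a transparent equation. Not the actual symbolic
# metamodeling method; a plain polynomial stands in for the equation.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X, y = load_breast_cancer(return_X_y=True)
X = X[:, :3]                                   # three features, for a readable equation

black_box = GradientBoostingClassifier().fit(X, y)   # accurate but opaque
risk = black_box.predict_proba(X)[:, 1]              # the predictions we want to explain

# Transparent "metamodel": a degree-2 polynomial fitted to the black box's output.
poly = PolynomialFeatures(degree=2, include_bias=False)
Z = poly.fit_transform(X)
meta = LinearRegression().fit(Z, risk)

terms = poly.get_feature_names_out(['x1', 'x2', 'x3'])
equation = " + ".join(f"{c:.3f}*{t}" for c, t in zip(meta.coef_, terms))
print(f"risk ≈ {meta.intercept_:.3f} + {equation}")
print("fidelity (R^2 vs. black box):", meta.score(Z, risk))
```

The printed equation is something a person can read, question and sanity-check, while the fidelity score indicates how faithfully the transparent surrogate tracks the original black box.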

Photo by Simon King on Unsplash
As a result of the work we have done so far, we are actually very close to solving the problem of interpretability. In a sense, we can now have our cake and eat it: we can keep the highly-accurate black box models that reach conclusions humans can’t, but at the same time we can now gain insight into how those conclusions were reached, and repurpose that insight for different users with specific needs.
Who will benefit?
Ultimately, no-one loses when black box models are made interpretable and transparent. If it’s your model, you can learn more about how it actually works, and maybe make further improvements. If you’re a clinician, you can make decisions based on an understanding of why a recommendation was reached, and how reliable that recommendation actually is. If you’re a researcher, you might be able to learn something new about an area of clinical study by gaining a glimpse into patterns humans don’t ordinarily observe (for example, the nuances of how specific patient features interact with one another). If you’re a patient, you’ll have a more solid basis for giving informed consent or making lifestyle changes.
You can find a much more complete explanation below, in the form of the Turing lecture I gave a month ago.