This post distils and shares some fascinating insights from a series of wide-ranging discussions with clinicians on the topic of interpretable machine learning for healthcare. These insights could help change the way we design machine learning models for clinical applications.
In September 2020, ahead of our lab’s first Revolutionizing Healthcare engagement session, I wrote an open letter to clinicians in which I shared our lab’s vision of jointly creating “an interdisciplinary community based on curiosity, shared goals, and mutual understanding.” I expressed our hope that our clinical colleagues would “lead us … and become agents of change” by helping to explore, frame, and solve important real-world problems in the domain of healthcare.
With 7 Revolutionizing Healthcare sessions now under our belt, we have had many enlightening and inspiring exchanges with clinicians. Nowhere has this been more true than in the area of interpretable machine learning for healthcare.
Interpretability: a hot topic, for good reason
Interpretability is absolutely crucial to the success of machine learning for healthcare. As I explained in a recent post on the topic, the reality is that most models (no matter how brilliant) can’t be used out of the box in healthcare settings. This is because, on their own, they are black boxes that are hard for their intended users (whether clinicians, patients, or medical researchers) to apply, understand, and trust.
It is perfectly understandable that potential users of machine learning technologies feel that, without the ability to understand, test, and debug the outputs of a model, they cannot and should not take action on the basis of that model’s recommendations.
You can find a much more complete explanation in the form of a lecture I gave for The Alan Turing Institute (where I’m a fellow) at the Royal Society of Medicine in March 2020.
Letting clinicians drive the debate
On March 31 and April 27, 2021, our lab held two live roundtables on interpretable machine learning for healthcare. In addition to an audience of clinicians from around the globe, we were very fortunate to be joined by a panel of expert clinicians representing a diverse range of specialties:
- Alexander Gimson, MD FRCP (Consultant transplant hepatologist, Cambridge University Hospitals NHS Foundation Trust)
- Bingnan Zhang, MD MBA (Hematology/oncology fellow, University of Texas MD Anderson Cancer Center) – first session only
- Prof Henk van Weert, MD PhD (Professor, general practice, Amsterdam UMC; Research programs in oncology and cardiovascular diseases)
- Martin Cadeiras, MD (Associate professor, medical director, heart failure, heart transplantation and mechanical circulatory support, University of California, Davis)
- Maxime Cannesson, MD PhD (Chair, Department of Anesthesiology & Perioperative Medicine, University of California, Los Angeles) – second session only
In our first session, we aimed to have a relatively high-level conversation about different definitions and types of interpretability, whereas the second session focused more on how interpretability can help build trust in machine learning models and benefit healthcare stakeholders. Underlying both of these were two recurring questions: to what degree can interpretable machine learning really benefit healthcare stakeholders, and will it provide the key to acceptance of machine learning technologies?
Both roundtables yielded spirited discussions and remarkable insights that could genuinely change the way we design machine learning models for clinical applications. I have distilled these insights on this page, as I believe they could be of use to the machine learning and healthcare communities at large.
Interpretability means different things to different people
An exploration of interpretability should naturally begin with the question: what do we mean by interpretability?
This is far from a simple question, and it was clear from our discussions with clinicians that they did not share a single definition. This is not to say that our panelists held misconceptions about the underlying concepts; rather, the term “interpretability” is nebulous and multifaceted, and is used as a catch-all for a range of concepts, such as adding informational content to a prediction, making the workings of a model transparent, or making a model or prediction easy to understand.
Several of our panelists noted that how interpretability is defined is likely to depend heavily on individual needs. This was particularly well expressed by Alexander Gimson:

“With respect to the clinician, with respect to the researcher, and particular with respect to patients, each of those will have a slightly different understanding of what ‘interpretable’ might mean, so we might need to use [different] sorts of interpretability in different ways for each of them.”
– Alexander Gimson
This is an excellent point: a clinician may want to know why a recommendation was made for a particular patient; a patient may need to use information as a basis upon which to give informed consent or make lifestyle changes to reduce risks to their health; a researcher may seek to use the output of a model to develop and pursue a data-induced hypothesis and drive scientific discovery.

For each of these groups, receiving information tailored to another group is not helpful, so such information arguably could not be considered interpretable. As a result, machine learning models need to go beyond one-size-fits-all approaches to interpretability and offer explainability based on likely users and use cases.
Our own lab has already ventured into this territory in the last few years. In one project, for example, we created a decision support system using reinforcement learning to learn what is interpretable to a wide variety of different users and, consequently, build their trust in ML models. Further details can be found here.
Exploring the types of interpretability
To help navigate the range of user-specific interpretability needs throughout our discussions, I proposed splitting interpretability into four “types” of information and asked our panelists for their thoughts. The four types are listed below, along with some key points from the conversations on each.

1. Explanatory patient features
This type of interpretability involves identifying and showing which patient-specific features the machine learning model has considered when issuing a prediction for a patient (i.e., individualized feature importance). We can do this either by identifying features that are important for an entire population or by identifying features the model has considered specifically for the patient at hand.
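By way of illustration, the short sketch below estimates per-patient feature importance for a generic black-box risk model by replacing each feature with a population baseline and measuring how the predicted risk changes. It is a deliberately simple stand-in rather than INVASE or any of our lab's methods, and all data and variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative data standing in for patient features (age, ER status, etc.)
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Any black-box risk model could stand in here
model = GradientBoostingClassifier(random_state=0).fit(X, y)

def local_importance(model, X_population, x_patient):
    """Crude per-patient attribution: replace each feature with the
    population mean and record how much the predicted risk changes."""
    baseline = X_population.mean(axis=0)
    risk = model.predict_proba(x_patient.reshape(1, -1))[0, 1]
    deltas = []
    for j in range(len(x_patient)):
        x_perturbed = x_patient.copy()
        x_perturbed[j] = baseline[j]
        risk_without_j = model.predict_proba(x_perturbed.reshape(1, -1))[0, 1]
        deltas.append(risk - risk_without_j)  # contribution of feature j
    return dict(zip(feature_names, deltas))

print(local_importance(model, X, X[0]))
```

Methods such as INVASE instead learn which features to select for each individual patient as part of the model itself; the sketch above is only meant to convey the general idea of individualized feature importance.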
Our lab has already developed a number of models offering this type of interpretability. One such approach is INVASE, which was first introduced in a paper published at ICLR 2019.
Building on our work with INVASE, we have made further progress in developing methods that offer interpretations based on explanatory patient features. In a paper recently accepted for publication at ICML 2021, for example, we adapted this approach to a time series setting. I would also recommend taking a look at our work on symbolic metamodeling, which is introduced later on this page.
Henk van Weert’s response about this type of interpretability perfectly illustrates my point about different groups of people having different needs:

“As a doctor, I will need explanatory features, because that gives me the idea that I am in control: I can think of what’s best for a patient. But those features will not be the ones that the patient is waiting to hear from me.”
– Henk van Weert
As Henk rightly points out, feature importance may be helpful for doctors who are trying to work out how to best treat a patient; conversely, a patient may benefit far less from knowing that the most important features determining her cancer mortality risk are her age and ER status.
2. Similarity classification
Through similarity classification, we seek to identify and explain which similar patients a machine learning model has provided the same–or different–predictions for. An approach based on similarity classification would involve cross-referencing the black box model’s prediction with available observational data regarding the features and outcomes of similar patients, and then explaining the model’s prediction in terms of those features and outcomes.
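As a rough, hypothetical sketch of this idea (not the clustering method from our lab described further below), the example that follows retrieves a new patient's nearest neighbours in the observational data and reports their observed outcomes alongside the model's prediction; all names and data are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

# Illustrative cohort: features and observed binary outcomes
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)
index = NearestNeighbors(n_neighbors=10).fit(X)

def explain_by_similar_patients(x_patient):
    """Cross-reference the model's prediction with the observed outcomes
    of the most similar patients in the data."""
    predicted_risk = model.predict_proba(x_patient.reshape(1, -1))[0, 1]
    _, neighbor_ids = index.kneighbors(x_patient.reshape(1, -1))
    observed_rate = y[neighbor_ids[0]].mean()
    return (f"Predicted risk: {predicted_risk:.2f}; among the 10 most "
            f"similar patients, {observed_rate:.0%} had the outcome.")

print(explain_by_similar_patients(X[0]))
```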
Our clinicians generally agreed that this type of interpretability has far more value to patients than explanatory patient features.

“A patient … might want to [know] how the equation works for patients that are similar to them, but they don’t necessarily need to know exactly all the particular drivers of an equation.”
– Alexander Gimson

“For the explanation to patients [regarding] why you should do something, my preference will be [similar patients] because it gives me a very easy way of explaining why you should do something.”
– Henk van Weert
Several of our lab’s projects have sought to provide interpretable explanations based on similarity classification. Most notable among these is an approach using deep learning to cluster time series data, where each cluster comprises patients who share similar future outcomes of interest. This was introduced in a paper published at ICML 2020. We also have research projects underway in this area at the time of writing.
3. Unraveled rules and laws
With this type of interpretability, we seek to discover “rules” and “laws” learned by the machine learning model. These can be in the form of decision rules, or even “counterfactual” explanations in the form of “What if?” question-answer pairs that describe the smallest change to the patient’s features that would change the model’s prediction to a predefined output. For example, a clinician could use this type of interpretability to establish the smallest difference in tumor size that would change the model’s prediction for a patient with cancer. At the time of writing, our lab’s work in this area is in its early stages, with the first papers expected to be published in the coming weeks or months.
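The tumor-size example lends itself to a minimal sketch: the hypothetical code below performs a brute-force search for the smallest reduction in tumor size that flips a toy model's prediction. It is purely illustrative and is not drawn from our lab's forthcoming work in this area.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy cohort: columns are [tumor_size_mm, age]; labels are illustrative
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(5, 50, 500), rng.uniform(30, 80, 500)])
y = (X[:, 0] + 0.3 * X[:, 1] + rng.normal(0, 5, 500) > 40).astype(int)
model = LogisticRegression().fit(X, y)

def smallest_tumor_size_change(x_patient, step=0.5, max_change=30.0):
    """Search for the smallest decrease in tumor size that flips the
    model's prediction for this patient (a 'what if?' counterfactual)."""
    original = model.predict(x_patient.reshape(1, -1))[0]
    change = step
    while change <= max_change:
        x_cf = x_patient.copy()
        x_cf[0] -= change  # only tumor size is varied
        if model.predict(x_cf.reshape(1, -1))[0] != original:
            return change
        change += step
    return None  # no flip found within the search range

patient = np.array([42.0, 65.0])
print(smallest_tumor_size_change(patient))
```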
Bingnan Zhang viewed this type of interpretability as “underutilized” at present, but saw significant potential for both clinicians and researchers to make use of such insights:

“Unraveled rules and laws I think is very interesting, and probably … a bit underutilized in our current machine learning in healthcare situation. But it will be very helpful, I think, to discover and help discover the relationships and hypotheses generating the causality of underlying variables. It kind of mimics how humans think, which is to develop hypotheses based on the evidence we see, and it can be further tested with experiments in the lab, for example, and … in real clinical scenarios.”
– Bingnan Zhang
4. Transparent risk equations
This approach to interpretability allows us to turn black box models into white boxes by generating transparent risk equations that describe the predictions made by machine learning models. Unlike standard regression models, these equations can capture non-linear interactions between different features. We can then discard the black box model and rely on the transparent risk equation to issue predictions.
The bulk of our own work focusing on this type of interpretability has involved our symbolic metamodeling framework for expressing black-box models in terms of transparent mathematical equations that can be easily understood and analyzed by humans. A symbolic metamodel is a model of a model—a surrogate model of a trained (machine learning) model expressed through a succinct symbolic expression that comprises familiar mathematical functions and can be subjected to symbolic manipulation. We first introduced symbolic metamodels in a paper published at NeurIPS 2019.
It should be noted that transparent risk equations created in this manner (for example, using symbolic metamodeling) can be applied to the other three types of interpretability listed above. Using patient features as inputs and risk as outputs, we can identify variable importance, classify similarities, discover variable interactions, and enable hypothesis induction.
We built on our symbolic metamodeling work by developing Symbolic Pursuit, which was first introduced in a paper published at NeurIPS 2020. The Symbolic Pursuit algorithm benefits from the ability to produce parsimonious expressions that involve a small number of terms. Such interpretations permit easy understanding of the relative importance of features and feature interactions.
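As a simplified illustration of the general idea (not the symbolic metamodeling or Symbolic Pursuit algorithms themselves, which search over a much richer space of symbolic expressions), the sketch below distils a black-box classifier into a short, sparse polynomial risk equation that can be read term by term; all names and data are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

# Train an illustrative black-box risk model
X, y = make_classification(n_samples=1000, n_features=4, random_state=0)
black_box = GradientBoostingClassifier(random_state=0).fit(X, y)
risk = black_box.predict_proba(X)[:, 1]

# Fit a sparse degree-2 polynomial surrogate to the black box's outputs;
# the surrogate, not the black box, is what the user reads.
poly = PolynomialFeatures(degree=2, include_bias=False)
Z = poly.fit_transform(X)
surrogate = Lasso(alpha=0.01).fit(Z, risk)

# Print the resulting transparent risk equation (non-zero terms only)
terms = poly.get_feature_names_out([f"x{i}" for i in range(X.shape[1])])
equation = " ".join(
    f"{coef:+.2f}*{name}"
    for coef, name in zip(surrogate.coef_, terms)
    if abs(coef) > 1e-3
)
print(f"risk ~ {surrogate.intercept_:.2f} {equation}")
```

Here the degree-2 terms play the role of the non-linear feature interactions mentioned above; in practice, the surrogate's faithfulness to the black box would need to be verified before the black box is set aside.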
In our discussion on this type of interpretability, Alexander Gimson noted that, while he fully recognized the benefit of producing transparent risk equations, he still had doubts regarding their accuracy for particularly small subgroups of patients:

“There’s no question that having fully transparent equations is the optimum, but I would have some … questions about exactly the degree to which that can be accurate for smaller and smaller subgroups, which may only contain a few patients.”
– Alexander Gimson
Alex makes an entirely valid point here: we must strive to ensure that transparent equations work reliably when scaled up or down across populations, subgroups, and individuals. This is something I talk about a little bit here, and is also addressed in an article that will soon appear in Nature Machine Intelligence.
Henk van Weert also commented that transparent risk equations could be invaluable when attempting to adapt a model from one setting to another:

“The transparent risk equations are very important if you try to provide your machine learning models to doctors all over the world in different situations with different kinds of patients in different cultures and different health care systems, because in those circumstances you have to be able to run your equations every time again and again and again, because they have to learn, they have to adapt themselves to different situations and that’s very important in medicine.”
– Henk van Weert
Does the value of interpretability depend on context?
Martin Cadeiras provided an observation that added a dimension to the discussion about varying needs for interpretability: he explained that, in addition to an individual’s perspective as a patient, a clinician, or a researcher, these needs may depend on that individual’s specific situation or context at a given point in time.

“I had an experience of changing from a large medical center with a large set of patients, colleagues, and experience embedded in the system, to a place where I had just to start to do things by myself. So my interpretation, my level of confidence, how this adds to me as a clinician is different when I’m embedded in that context, when I’m in a context when I have just to make those decisions without having that experience, without those systems behind me that support my decision.”
– Martin Cadeiras
As Martin pointed out, an individual’s need for interpretability could even vary according to their professional circumstances and the level of support they have access to. Alexander Gimson made a somewhat similar point about context-specific need for transparency:

“We do need to have explainability—you do need some degree of transparency—but I think depending on the context, depending on the situation, the degree of transparency may vary. You may need to be more transparent in certain contexts when it has a really critical role, and less transparent when it has a less critical role.”
– Alexander Gimson
Balancing interpretability against accuracy of prediction
So far, I have shared some of our panelists’ insights regarding how the type of information required from interpretability can depend on an individual’s role, or how an individual’s overall need for interpretability can vary on a case-by-case basis or depending on their professional situation.
Our roundtable discussions also made it clear that the value of interpretability depends on what, if anything, must be sacrificed in terms of a model’s accuracy.
This was captured perfectly in a remark by Alexander Gimson:

“What is the trade-off for a clinician between the accuracy of the prediction versus their ability to understand why it is accurate? … If you know all the factors that are involved in an equation that is still relatively inaccurate, is that more important than having an equation that is much more accurate but where you don’t really understand all the full implications of all the different factors that are going into it?”
– Alexander Gimson
Alexander further noted that every individual has their own way of balancing how they value interpretability against the accuracy of a prediction.

“I think that trade-off also probably is different for different clinicians. How the clinician interacts with the prediction is extremely important. People feel that accuracy is more important because they are, as it were, just wanting to be able to give their patient an answer, but some only feel satisfied with that if they really understand in greater depth (or interpretability) why the answer they’re giving their patient is as it is predicted.”
– Alexander Gimson
This added an interesting dimension to our discussion, and our panelists agreed that the interpretability-accuracy balance is an area that has yet to be thoroughly researched. It also brought us to the borders of a new and absolutely pivotal domain of discussion: interpretability and trustworthiness.
Interpretability and the trust equation
If a model is interpretable but a decision-maker chooses not to follow its guidance, we may assume that the decision-maker does not trust the model. Conversely, if a decision-maker follows the predictions or recommendations of a model without requiring further information, we must assume that they trust the model implicitly—despite interpretability not playing a role.
Both of the scenarios above are plausible, and in both cases trust and interpretability are decoupled. This raises numerous questions, including:
– To what degree does interpretability actually build trust, and how does this translate to acceptance?
– What other factors determine an individual’s trust in a model?
– What standards should AI and machine learning be held to in order to enhance perceived trustworthiness?
Our conversations with our panelists touched on all of these topics. Some key insights are shared below; you can also view some of our papers on this topic here.
Interpretability vs. trustworthiness
Defining the areas of overlap and difference between interpretability and trustworthiness presents us with a philosophical challenge: we must examine the broader question of how trust is engendered, and what features (besides openness) make us feel able to trust a person or a technology.
Henk van Weert offered a comparison from the realm of consumer technology:

“We all use iPhones and we don’t know how they work, but I am sure that I can speak to you through an iPhone. So there’s no mistrust in that anymore. But you have to have some, I think, as a clinician, because you’re doing things that you need to explain to other people (namely your patients). And you can’t just tell them, “well my algorithm tells me to do so.” I think that’s not enough, so you have to have some clue why the algorithm does what it does, and if you don’t have the slightest idea how it works, you will not use it.”
– Henk van Weert
Alexander Gimson offered a similar example related to trust between clinicians:

“One knows of many clinicians who may make decisions that I may not trust. Some of the reason for that is because they can’t explain why they make that decision, and if they can’t explain, I don’t necessarily trust them as much.”
– Alexander Gimson
On the surface, this line of thinking seems to support the argument that interpretability can indeed build trust in machine learning models by providing explanations for decisions. It also, however, raises corollary concerns: does the clinician inherently trust the model more because it is interpretable, or is interpretability serving as a kind of back-up? And does the patient actually value the information provided by interpretability, or are they simply comforted by the knowledge that some kind of basis exists for a prediction?
These are extremely interesting issues, as they raise the broader question of whether interpretability should be valued for the actual content of the information provided, or whether its primary source of value is rather to counteract the fear of trusting black-box models. Pursuing this line of questioning and gaining further insights would doubtless enable us to develop machine learning models that are better tailored to the needs of healthcare stakeholders.
What other factors determine an individual’s trust in a model?
In many cases, trust decisions involve factors that go beyond simple demonstration of capabilities and provision of useful information. While such factors may not even be entirely logical, that does not make them unreasonable or in any way unimportant.
One such factor is the value of a positive relationship with a partner from the AI and machine learning communities. As observed by Maxime Cannesson, this is reliant on the perception of their ethical standards, behavior, and adherence to values:

“There is a part of trust that you gain with intelligence and reasonable explanation, but you have a part of trust that you gain by the demonstration of how you’ve developed a relationship … can you pay attention to the ethics of developing [an] algorithm in a big data environment? … Do you take things like diversity of your patient population in terms of gender, race, ethnicity, into account when you develop? … How are you going to develop? … What level of ethics have [you] demonstrated in the past? What kind of team have you built to develop this algorithm to avoid biases?”
– Maxime Cannesson
Another element of the trust equation is the perception that a technology has achieved proven results in the field. This was highlighted by Alexander Gimson and Henk van Weert:

“For many clinicians, if you know that [a technology] has been shown to be beneficial, I’m not sure whether it matters that … you really understand the algorithm behind it. On a much deeper level, I think peer pressure makes a difference: if you see everybody else using it, then you’re more likely to use it yourself, if it comes with a prior reputation you’re maybe more likely to use it.”
– Alexander Gimson

“The rise of evidence-based medicine has taught us that if we know that there is a ‘what’ to do, the only thing you have to do is to prove that it works better than something else that you can do. So I think that the explanation might in part be found in comparisons: what happens to patients with, and what happens to patients without using the machine learning models. And I have a strong guess that machine learning will win.”
– Henk van Weert
These are very valid observations, and it is completely reasonable for a user of a relatively new technology to feel reassured by the positive experiences of other users. This does, however, present a chicken-and-egg problem: acceptance of the technology depends on real-world results, but achieving those results in the first place requires the technology to be (on some level) accepted for use.
The last (and perhaps least logical) of the trust factors we discussed is the very human bias in favor of recommendations that support our own beliefs. This was summarized very effectively by Alexander Gimson:

“The decisions we make about technologies are not entirely rationally driven. Some of them are to do with our current cognitive biases … Studies have shown that you trust people you agree with far more than you trust people who don’t agree with you, and the same applies to equations that give you answers that satisfy your own prejudices.”
– Alexander Gimson
This part of the trust equation is difficult to overcome, since obviously the need to confirm a decision-maker’s existing beliefs should never factor into the design or output of a machine learning model!
Henk van Weert then added a further dimension to this discussion by pointing out that confirmation bias even plays a role in our acceptance of existing rules and approaches:

“We trust [established] rules because they apply to our gut feelings. We agree with the rules, we don’t trust them. And I think it’s something historical, because we are used to acting by those rules, and they became rules after we thought that we should act like this.”
– Henk van Weert
This is a brilliant observation from Henk, and without doubt extends beyond the domains of machine learning and healthcare. It also opened up a rich vein of discussion with our panelists: how do the expected standards for machine learning models compare to those for established approaches within healthcare?
Should machine learning be held to higher standards than apply elsewhere in healthcare?
This question is, in many ways, at the heart of our discussion regarding interpretability, trust, and the acceptance of machine learning technologies in healthcare. It is also a question that can only be addressed by domain experts, rather than the machine learning and AI communities. The domain experts determine and communicate their needs and standards, and we develop our tools and methodologies accordingly.
Maxime Cannesson took the lead in this debate by introducing and explaining an editorial he wrote for Anesthesia and Analgesia in 2016, entitled “All Boxes are Black.” In his editorial, Maxime compared the barriers to entry for machine learning with those faced by other technologies now prevalent in medicine, such as the pulse oximeter, which have gained trust and widespread acceptance despite the fact that their workings are often not understood by their users.
Maxime expanded on this view during our discussions:

“A lot of clinicians would argue that they need to understand how a technology or medication works in order for them to accept it. My argument in this manuscript was to say, most of us do not have the expertise to understand how most of the technologies we are using actually work. When it comes to machine learning algorithms… do we really need to know and understand how the system works in order to accept it, or is clinical testing the only thing that’s going to make this system be accepted/be efficient or not? I believe the clinical testing is going to win, I think that’s going to be much more important to actually understanding how the system works.”
– Maxime Cannesson
A further dimension was added to this debate by a member of the audience, Venkat Reddy (a neurodevelopmental pediatrician with Cambridgeshire and Peterborough NHS Foundation Trust), who asked our panel for their views about standards for interpretability from clinicians:

“Are we expecting AI and machine learning to be even more explainable, even more transparent, than human clinicians?”
– Venkat Reddy
Note: machine learning and human interpretability
Putting aside the topic of machine learning interpretability very briefly, Venkat’s question here is particularly fascinating because it touches on the idea that decisions made by humans could be made more interpretable. This is something I believe can be done by machine learning, and, in fact, is the focus of an entirely new area of research created by our lab. We recently unveiled our plan to conduct long-term strategic research in this area, which we call quantitative epistemology. For further information, please take a look at our recent announcement.
Maxime and Venkat suggested that the expected standards for machine learning are higher than those applied elsewhere in medicine. If this is the case, parts of the machine learning and AI communities may feel frustrated by the notion that they face barriers to entry that other healthcare technologies, approaches, or practitioners have not been subjected to.
Henk van Weert flipped this point on its head, however, explaining that meeting these higher standards could demonstrate the value of machine learning and lead to its acceptance in healthcare:

“I think every clinician knows the example of Semmelweis and maternity fever, of smallpox and [Jenner] … where much progress in medicine has been made without knowing why. And in recent times, we don’t accept that anymore; we want to have proof that something works before we start administering it. And I think that the idea that we want to have proof can be reinforcement for artificial intelligence.”
– Henk van Weert
Henk’s comment provides grounds for optimism: despite being held to particularly high evidentiary standards, machine learning may be uniquely positioned to prove its value by showing that it can meet those standards.
Even if we assume, however, that we can prove the value of machine learning by making models interpretable and showing the basis of their outputs, clinical stakeholders will still need to learn how to interpret those outputs in order to actually trust them. This leads to the final area of discussion covered in this post: integrating interpretability (and understanding of machine learning in general) into medical education and training.
Can education and training complete the trust equation?
It is very clear that trust in machine learning models cannot be engendered simply by making these models interpretable and expecting people to use them. As suggested above, we can fill many of the remaining gaps in the trust equation by building relationships carefully and responsibly, by demonstrating proven results, and by willingly meeting the high bar that has been set for machine learning.
We must also, however, ensure that future potential users of such models have access to early career training and education that will enable them to gain a degree of familiarity with machine learning and other technologies.
This point was emphasized by both Maxime and Alexander:

“If we really want these systems to be understood by clinicians, what we need to do is to increase the level of medical education around technology … People will go to medical school, and will keep on learning about physiology, about pharmacology, about anatomy, and they will practice in a world surrounded by technology and they will have no clue how this technology works.”
– Maxime Cannesson

“I think part of it will be education, will be trying to train people so that they can … understand the benefits of technologies, whatever the technology may be; machine learning is just one of those technologies. How we understand technology better is not part of standard medical education at the moment; it needs to be.”
– Alexander Gimson
Incorporating technologies such as machine learning into education and training for healthcare will undoubtedly require a concerted interdisciplinary effort and long-term investment, but is ultimately the most reliable path to cultivating lasting trust. In addition to building trust, familiarization through exposure will equip future clinicians and researchers to make confident decisions based on the interpretable outputs of machine learning models, and will enable them to determine instinctively the extent and type of interpretability they need on a case-by-case basis.
Interpretability and trust: what next?
Throughout our wide-ranging exploration of interpretability, we have consistently returned to the two questions mentioned at the top of this page: “To what degree can interpretable machine learning really benefit healthcare stakeholders, and will it provide the key to acceptance of machine learning technologies?”
The insightful comments and guidance from our clinical colleagues have helped us answer these questions and given us a new sense of direction. On one hand, we can be confident that interpretability will enable clinicians to understand and act on the outputs of machine learning models, to explain the decisions they make to patients and one another, and to discover hitherto hidden rules and laws. On the other hand, the acceptance of machine learning in healthcare will require a concerted and interdisciplinary effort on multiple fronts—one of which is interpretability.
These discussions with clinicians have honed our focus as a lab, leaving us with a plan of action directly related to the development of methodologies for interpretability. This includes:
– refining how interpretability can be tailored to specific types of users and usage cases (explainability);
– adapting how our models show the importance of specific patient features in producing predictions;
– developing similarity classification methods that can be readily explained to patients;
– promoting the use of interpretability as a tool for uncovering rules and laws;
– improving on current methods for turning black-box models into transparent risk equations; and
– considering how we can accommodate individual preferences for accuracy versus interpretability.
We are enormously grateful to our panelists for helping us define a new agenda for interpretability, and for giving us a sense of the broader issues on the road to ensuring that machine learning technologies are trusted and accepted in healthcare.
Learn more
The discussion above represents just a part of our exploration of interpretability with clinicians over the course of two roundtables. You can view the full sessions below:
If you are a clinician and would like to learn more about how machine learning can be applied to real-world healthcare problems, please sign up for our Revolutionizing Healthcare online engagement sessions (no machine learning knowledge required).
For a full list of the van der Schaar Lab’s publications, click here.