Machine learning is capable of enabling truly personalized healthcare; this is what our lab calls “bespoke medicine.”
More info on bespoke medicine can be found here.
Interpretability is essential to the success of the machine learning and AI models that will make bespoke medicine a reality. Despite its acknowledged importance and value, the actual concept of interpretability has resisted definition and is not well understood.
Our lab has conducted field-leading research into a variety of forms of interpretability for years, and has developed a unique and cohesive framework for categorizing and developing interpretable machine learning models. Our framework is presented on this page, alongside much of the accompanying research, in the hope of advancing the discussion on this crucial topic and inspiring readers to engage in new projects and research.
The content of this page is designed to be accessible and useful to a wide range of readers, from machine learning novices to experts.
You can find our publications on interpretability and explainability here.
- Interpretability: a concept with clear value but an unclear definition
- Type 1 interpretability: feature importance
- Type 2 interpretability: similarity classification
- Type 3 interpretability: unraveled rules and laws
- Type 4 interpretability: transparent risk equations
- Type 5 interpretability: concept-based explainability
- Robust and trustworthy interpretations
- Peering into the ultimate black box
- Find out more and get involved
This page is one of several introductions to areas that we see as “research pillars” for our lab. It is a living document, and the content here will evolve as we continue to reach out to the machine learning and healthcare communities, building a shared vision for the future of healthcare.
Our primary means of building this shared vision is through two groups of online engagement sessions: Inspiration Exchange (for machine learning students) and Revolutionizing Healthcare (for the healthcare community). If you would like to get involved, please visit the page below.
This page proposes a unique and coherent framework for categorizing and developing interpretable machine learning models. We will demonstrate this framework using a range of examples from the van der Schaar Lab’s extensive research into interpretability, and our ongoing interdisciplinary discussions with members of the clinical and other non-ML communities.
First, we will discuss the many potential definitions and uses of interpretability. We will then lay out a framework of five distinct types of interpretability, and explain the potential roles and applications of each type. Finally, we will turn the debate on its head by examining how interpretability can also be applied to understand and support humans, rather than AI and machine learning models.
Interpretability: a concept with clear value but an unclear definition
There are several reasons to make a “black box” machine learning model interpretable. First, an interpretable output can be more readily understood and trusted by its users (for example, clinicians deciding whether to prescribe a treatment), making it more actionable. Second, a model’s outputs often need to be explained by its users to the subjects of its outputs (for example, patients deciding whether to accept a proposed treatment course). Third, by uncovering valuable information that otherwise would have remained hidden within the model’s opaque inner workings, an interpretable output can empower users such as researchers with powerful new insights.
The value of interpretability as a broad concept is, therefore, clear. Yet despite this, the meaning of the term itself is too seldom discussed and too often oversimplified. There is no single “type” of interpretability, after all, since there are many potential ways to extract and present information from the output of a model, and many types of information to choose to extract.
This is something we explored in 2018, when we designed a reinforcement learning system capable of learning from its interactions with users and accurately predicting which outputs would maximize their confidence in the underlying clinical risk prediction model. This work was introduced in a paper entitled “What is Interpretable? Using Machine Learning to Design Interpretable Decision-Support Systems.”
What is Interpretable? Using Machine Learning to Design Interpretable Decision-Support Systems
Owen Lahav, Nicholas Mastronarde, Mihaela van der Schaar
Recent efforts in Machine Learning (ML) interpretability have focused on creating methods for explaining black-box ML models. However, these methods rely on the assumption that simple approximations, such as linear models or decision-trees, are inherently human-interpretable, which has not been empirically tested. Additionally, past efforts have focused exclusively on comprehension, neglecting to explore the trust component necessary to convince non-technical experts, such as clinicians, to utilize ML models in practice.
In this paper, we posit that reinforcement learning (RL) can be used to learn what is interpretable to different users and, consequently, build their trust in ML models. To validate this idea, we first train a neural network to provide risk assessments for heart failure patients. We then design a RL-based clinical decision-support system (DSS) around the neural network model, which can learn from its interactions with users. We conduct an experiment involving a diverse set of clinicians from multiple institutions in three different countries.
Our results demonstrate that ML experts cannot accurately predict which system outputs will maximize clinicians’ confidence in the underlying neural network model, and suggest additional findings that have broad implications to the future of research into ML interpretability and the use of ML in medicine.
Our lab has been researching interpretability methods and approaches (for application in healthcare and beyond) for many years. Our work so far has led us to a unique but powerful framework for considering the multiple types of interpretability.
Our framework divides interpretability into 5 broad “types”:
1) feature importance;
2) similarity classification;
3) unraveled rules and laws;
4) transparent risk equations; and
5) concept-based explanations
Each of these types of interpretability represents a distinct set of challenges from a model development perspective and can benefit different users in a variety of applications. These will be explored below, but an in-depth discussion on each type—driven by insights from colleagues from the clinical community—can be found in a recent piece of content entitled “Making machine learning interpretable: a dialog with clinicians.”
Type 1 interpretability: feature importance
This type of interpretability involves identifying and showing which patient-specific features the machine learning model has considered when issuing a prediction for a patient. We can do this either by identifying features that are important for an entire population or by identifying features the model has considered specifically for the patient at hand.
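The two routes described above (importance for an entire population versus importance for the patient at hand) can be sketched in a few lines. This is a minimal illustration using generic permutation and occlusion scores, not any of the lab's methods; the `black_box` function, the synthetic data, and the zero baseline are all assumptions made for the example.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Population-level importance: mean increase in squared error
    when a single feature column is randomly shuffled."""
    rng = np.random.default_rng(seed)
    base_error = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errors = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            errors.append(np.mean((predict(X_perm) - y) ** 2))
        importances[j] = np.mean(errors) - base_error
    return importances

def occlusion_importance(predict, x, baseline):
    """Patient-level importance: change in the prediction for one patient
    when a single feature is replaced by a baseline value."""
    original = predict(x[None, :])[0]
    scores = np.zeros(x.shape[0])
    for j in range(x.shape[0]):
        x_occluded = x.copy()
        x_occluded[j] = baseline[j]
        scores[j] = original - predict(x_occluded[None, :])[0]
    return scores

# Hypothetical stand-in for a black-box model: feature 0 matters a lot,
# feature 1 a little, feature 2 not at all.
def black_box(X):
    return 3.0 * X[:, 0] + 0.5 * X[:, 1]

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = black_box(X)

global_scores = permutation_importance(black_box, X, y)
patient = np.array([1.0, 2.0, 0.5])
local_scores = occlusion_importance(black_box, patient, baseline=np.zeros(3))
print("population-level:", global_scores)
print("patient-level:", local_scores)
```

The population-level scores rank the features for the whole cohort, while the patient-level scores depend on the values of the individual patient, which is why the two views can disagree.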
Our lab has already developed a number of models offering this type of interpretability. One such approach is INVASE, which was first introduced in a paper published at ICLR 2019.
INVASE: Instance-wise Variable Selection using Neural Networks
Jinsung Yoon, James Jordon, Mihaela van der Schaar
The advent of big data brings with it data with more and more dimensions and thus a growing need to be able to efficiently select which features to use for a variety of problems. While global feature selection has been a well-studied problem for quite some time, only recently has the paradigm of instance-wise feature selection been developed.
In this paper, we propose a new instance-wise feature selection method, which we term INVASE. INVASE consists of 3 neural networks, a selector network, a predictor network and a baseline network which are used to train the selector network using the actor-critic methodology. Using this methodology, INVASE is capable of flexibly discovering feature subsets of a different size for each instance, which is a key limitation of existing state-of-the-art methods.
We demonstrate through a mixture of synthetic and real data experiments that INVASE significantly outperforms state-of-the-art benchmarks.
We have continued to make progress in developing methods that offer interpretations based on explanatory patient features. In a paper recently accepted for publication at ICML 2021, for example, we introduced an approach specifically designed for multivariate time series, using saliency masks to identify and highlight important features at each time step.
Explaining Time Series Predictions with Dynamic Masks
Jonathan Crabbé, Mihaela van der Schaar
How can we explain the predictions of a machine learning model? When the data is structured as a multivariate time series, this question induces additional difficulties such as the necessity for the explanation to embody the time dependency and the large number of inputs.
To address these challenges, we propose dynamic masks (Dynamask). This method produces instance-wise importance scores for each feature at each time step by fitting a perturbation mask to the input sequence. In order to incorporate the time dependency of the data, Dynamask studies the effects of dynamic perturbation operators. In order to tackle the large number of inputs, we propose a scheme to make the feature selection parsimonious (to select no more feature than necessary) and legible (a notion that we detail by making a parallel with information theory).
With synthetic and real-world data, we demonstrate that the dynamic underpinning of Dynamask, together with its parsimony, offer a neat improvement in the identification of feature importance over time. The modularity of Dynamask makes it ideal as a plug-in to increase the transparency of a wide range of machine learning models in areas such as medicine and finance, where time series are abundant.
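To make the idea of an importance score "for each feature at each time step" concrete, the sketch below perturbs one entry of a multivariate time series at a time, replacing it with the mean of the preceding steps, and records how much the prediction moves. This is a deliberately simplified occlusion-style stand-in: it does not fit a mask by optimization as Dynamask does, and the `black_box` model, the window size, and the data are assumptions.

```python
import numpy as np

def temporal_occlusion_saliency(predict, x, window=2):
    """x has shape (T, D). Returns a (T, D) array holding the absolute change
    in the prediction when x[t, d] is replaced by the mean of the previous
    `window` values of feature d."""
    T, D = x.shape
    base = predict(x)
    saliency = np.zeros((T, D))
    for t in range(T):
        start = max(0, t - window)
        for d in range(D):
            past = x[start:t, d]
            x_pert = x.copy()
            # At t = 0 there is no past to average, so the entry is left unchanged.
            x_pert[t, d] = past.mean() if past.size else x[t, d]
            saliency[t, d] = abs(base - predict(x_pert))
    return saliency

# Hypothetical black box: only the last two time steps of feature 0 matter.
def black_box(x):
    return x[-1, 0] + 0.5 * x[-2, 0]

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 3))
saliency = temporal_occlusion_saliency(black_box, x)
print(np.round(saliency, 3))
```

Only the two entries the toy model actually uses receive a non-zero score, which is the kind of sparse, time-resolved picture a saliency mask aims to provide.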
While feature importance methods are typically introduced for supervised models, they can be extended to the unsupervised setting. Our lab has formalized this extension by introducing the notion of Label-Free Explainability. Note that this extension also covers Type 2 interpretability, described below.
Label-Free Explainability for Unsupervised Models
Jonathan Crabbé, Mihaela van der Schaar
Unsupervised black-box models are challenging to interpret. Indeed, most existing explainability methods require labels to select which component(s) of the black-box’s output to interpret. In the absence of labels, black-box outputs often are representation vectors whose components do not correspond to any meaningful quantity. Hence, choosing which component(s) to interpret in a label-free unsupervised/self-supervised setting is an important, yet unsolved problem.
To bridge this gap in the literature, we introduce two crucial extensions of post-hoc explanation techniques: (1) label-free feature importance and (2) label-free example importance that respectively highlight influential features and training examples for a black-box to construct representations at inference time.
We demonstrate that our extensions can be successfully implemented as simple wrappers around many existing feature and example importance methods. We illustrate the utility of our label-free explainability paradigm through a qualitative and quantitative comparison of representation spaces learned by various autoencoders trained on distinct unsupervised tasks.
In a recent paper, we demonstrated that feature importance methods are of practical interest in the context of treatment effect estimation. We use feature importance to benchmark treatment effect models on their ability to discover covariates that are predictive of the individual treatment effect.
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability
Jonathan Crabbé, Alicia Curth, Ioana Bica, Mihaela van der Schaar
NeurIPS 2022 (Datasets and Benchmarks)
Estimating personalized effects of treatments is a complex, yet pervasive problem. To tackle it, recent developments in the machine learning (ML) literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools: due to their flexibility, modularity and ability to learn constrained representations, neural networks in particular have become central to this literature.
Unfortunately, the assets of such black boxes come at a cost: models typically involve countless nontrivial operations, making it difficult to understand what they have learned. Yet, understanding these models can be crucial — in a medical context, for example, discovered knowledge on treatment effect heterogeneity could inform treatment prescription in clinical practice.
In this work, we therefore use post-hoc feature importance methods to identify features that influence the model’s predictions. This allows us to evaluate treatment effect estimators along a new and important dimension that has been overlooked in previous work: We construct a benchmarking environment to empirically investigate the ability of personalized treatment effect models to identify predictive covariates — covariates that determine differential responses to treatment.
Our benchmarking environment then enables us to provide new insight into the strengths and weaknesses of different types of treatment effects models as we modulate different challenges specific to treatment effect estimation — e.g. the ratio of prognostic to predictive information, the possible nonlinearity of potential outcomes and the presence and type of confounding.
Clinicians have explained to us that this type of interpretability would be particularly valuable to them: since they are required to work out the best way to treat a patient, it is helpful to understand the features that influenced a model’s output. By contrast, clinicians see the value of this type of interpretability for patients as lower. Patients may not consider it particularly useful to know the relative importance of their features: for example, a patient may not benefit from knowing that the most important features determining her cancer mortality risk are her age and ER status.
Type 2 interpretability: similarity classification
Through similarity classification, we seek to identify and explain which similar patients a machine learning model has provided the same (or different) predictions for. An approach based on similarity classification would involve cross-referencing the black box model’s prediction with available observational data regarding the features and outcomes of similar patients, and then explaining the model’s prediction in terms of those features and outcomes.
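As a rough illustration of this cross-referencing step, the sketch below retrieves the most similar patients from a reference cohort and reports their observed outcomes next to the model's prediction. It uses plain Euclidean nearest neighbours on features assumed to be standardized already; the cohort, the outcomes, and the `model_risk` value are hypothetical.

```python
import numpy as np

def similar_patients(x, X_ref, outcomes_ref, k=3):
    """Return the indices of the k reference patients closest to x
    and the observed event rate among them."""
    distances = np.linalg.norm(X_ref - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return nearest, float(outcomes_ref[nearest].mean())

# Hypothetical reference cohort: two standardized features per patient,
# with a binary observed outcome (1 = event occurred).
X_ref = np.array([
    [0.9, 1.0],
    [1.1, 1.2],
    [1.0, 0.8],
    [-1.0, -0.9],
    [-1.2, -1.1],
])
outcomes_ref = np.array([1, 1, 0, 0, 0])

x_new = np.array([1.0, 1.0])
model_risk = 0.7  # assumed output of the black-box model for x_new

nearest, observed_rate = similar_patients(x_new, X_ref, outcomes_ref)
print(f"model risk {model_risk:.2f}; "
      f"{observed_rate:.2f} of the {len(nearest)} most similar patients had the event")
```

The explanation handed to the user is then phrased in terms of these neighbours: who they are, how they resemble the patient, and what happened to them.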
Several of our lab’s projects to date have sought to provide interpretable explanations based on similarity classification. Some—such as the two outlined immediately below—are tailor-made for particular medical problems.
For instance, temporal phenotyping targets the problem of disease progression; it uses deep learning to cluster time series data, where each cluster comprises patients who share similar future outcomes of interest. Meanwhile, SyncTwin is designed to provide interpretable treatment effect estimation; it issues counterfactual predictions for a target patient by constructing a synthetic twin that closely matches the target in representation.
Temporal Phenotyping using Deep Predictive Clustering of Disease Progression
Changhee Lee, Mihaela van der Schaar
Due to the wider availability of modern electronic health records, patient care data is often being stored in the form of time-series. Clustering such time-series data is crucial for patient phenotyping, anticipating patients’ prognoses by identifying “similar” patients, and designing treatment guidelines that are tailored to homogeneous patient subgroups.
In this paper, we develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest (e.g., adverse events, the onset of comorbidities). To encourage each cluster to have homogeneous future outcomes, the clustering is carried out by learning discrete representations that best describe the future outcome distribution based on novel loss functions.
Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks and identifies meaningful clusters that can be translated into actionable information for clinical decision-making.
SyncTwin: Treatment Effect Estimation with Longitudinal Outcomes
Zhaozhi Qian, Yao Zhang, Ioana Bica, Angela Wood, Mihaela van der Schaar
Most of the medical observational studies estimate the causal treatment effects using electronic health records (EHR), where a patient’s covariates and outcomes are both observed longitudinally. However, previous methods focus only on adjusting for the covariates while neglecting the temporal structure in the outcomes.
To bridge the gap, this paper develops a new method, SyncTwin, that learns a patient-specific time-constant representation from the pre-treatment observations. SyncTwin issues counterfactual prediction of a target patient by constructing a synthetic twin that closely matches the target in representation. The reliability of the estimated treatment effect can be assessed by comparing the observed and synthetic pre-treatment outcomes. The medical experts can interpret the estimate by examining the most important contributing individuals to the synthetic twin.
In the real-data experiment, SyncTwin successfully reproduced the findings of a randomized controlled clinical trial using observational data, which demonstrates its usability in the complex real-world EHR.
Not all similarity classification methods are created to address a specific need, however. SimplEx, introduced below, is an example of a general approach that enables explanation for models that are not task- or problem-specific: in essence, it can be seen as a post-hoc explainability module that could be used as a plug-in for almost any machine learning model.
Explaining Latent Representations with a Corpus of Examples
Jonathan Crabbé, Zhaozhi Qian, Fergus Imrie, Mihaela van der Schaar
Modern machine learning models are complicated. Most of them rely on convoluted latent representations of their input to issue a prediction. To achieve greater transparency than a black-box that connects inputs to predictions, it is necessary to gain a deeper understanding of these latent representations.
To that aim, we propose SimplEx: a user-centred method that provides example-based explanations with reference to a freely selected set of examples, called the corpus. SimplEx uses the corpus to improve the user’s understanding of the latent space with post-hoc explanations answering two questions: (1) Which corpus examples explain the prediction issued for a given test example? (2) What features of these corpus examples are relevant for the model to relate them to the test example? SimplEx provides an answer by reconstructing the test latent representation as a mixture of corpus latent representations. Further, we propose a novel approach, the integrated Jacobian, that allows SimplEx to make explicit the contribution of each corpus feature in the mixture.
Through experiments on tasks ranging from mortality prediction to image classification, we demonstrate that these decompositions are robust and accurate. With illustrative use cases in medicine, we show that SimplEx empowers the user by highlighting relevant patterns in the corpus that explain model representations. Moreover, we demonstrate how the freedom in choosing the corpus allows the user to have personalized explanations in terms of examples that are meaningful for them.
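The core step described in the abstract, reconstructing a test latent representation as a mixture of corpus latent representations, can be sketched as a small constrained least-squares problem. The code below is an assumption-laden toy: it uses a softmax parameterization with plain gradient descent, a three-example corpus in a two-dimensional latent space, and it omits the integrated Jacobian entirely. It is not the SimplEx implementation.

```python
import numpy as np

def corpus_decomposition(h_test, H_corpus, n_steps=2000, lr=0.5):
    """Find non-negative weights summing to one that minimize
    ||h_test - weights @ H_corpus||^2."""
    logits = np.zeros(H_corpus.shape[0])
    for _ in range(n_steps):
        w = np.exp(logits - logits.max())
        w /= w.sum()
        residual = w @ H_corpus - h_test        # shape (latent_dim,)
        grad_w = 2.0 * H_corpus @ residual      # gradient with respect to the weights
        # Chain rule through the softmax that keeps the weights on the simplex.
        grad_logits = w * (grad_w - w @ grad_w)
        logits -= lr * grad_logits
    w = np.exp(logits - logits.max())
    return w / w.sum()

# Hypothetical latent representations of three corpus examples.
H_corpus = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
])
h_test = np.array([0.25, 0.25])

weights = corpus_decomposition(h_test, H_corpus)
print(np.round(weights, 3))
```

Each weight can be read as the share of the test representation attributed to one corpus example, so the examples with the largest weights are the ones offered to the user as the explanation.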
The clinicians we spoke with generally felt that this type of interpretability has far more value to patients than feature importance (type 1). Patients generally find it easier to make a decision based on a prediction or recommendation when it is explained with reference to similarities or differences with patients like them.
Type 3 interpretability: unraveled rules and laws
With this type of interpretability, we seek to discover “rules” and “laws” learned by the machine learning model. These can be in the form of decision rules, or even “counterfactual” explanations in the form of “What if?” question-answer pairs that describe the smallest adjustment to the patient’s features that would change the model’s prediction to a predefined output. For example, a clinician could use this type of interpretability to establish the smallest difference in tumor size that would change the model’s prediction for a patient with cancer.
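The tumor size example can be written as a simple search: scan outward from the patient's current value until the prediction crosses the decision threshold. The sketch below assumes a single adjustable feature, a fixed step size, and a hypothetical logistic `risk_model` whose coefficients are invented for illustration and carry no clinical meaning.

```python
import math

def risk_model(tumor_size_mm, age):
    """Hypothetical stand-in for a black-box risk model (not a clinical model)."""
    return 1.0 / (1.0 + math.exp(-(0.2 * tumor_size_mm + 0.03 * age - 7.03)))

def smallest_flip(predict, value, threshold=0.5, step=0.1, max_change=50.0):
    """Return the smallest change to `value` (on a grid of size `step`) that moves
    the prediction to the other side of `threshold`, or None if none is found."""
    original_side = predict(value) >= threshold
    n_steps = int(round(max_change / step))
    for i in range(1, n_steps + 1):
        # Try a decrease first, then an increase of the same magnitude.
        for delta in (-i * step, i * step):
            if (predict(value + delta) >= threshold) != original_side:
                return delta
    return None

age = 60
tumor_size = 30.0
delta = smallest_flip(lambda size: risk_model(size, age), tumor_size)
print(f"risk now: {risk_model(tumor_size, age):.2f}")
print(f"smallest change in tumor size that flips the prediction: {delta:.1f} mm")
```

A practical counterfactual method would also restrict the search to plausible and actionable changes, and would usually adjust several features at once; this sketch only shows the "What if?" logic for one feature.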
Our lab’s research into this type of interpretability is in its early stages, but one particularly relevant recent paper can be found below.
Integrating Expert ODEs into Neural ODEs: Pharmacology and Disease Progression
Zhaozhi Qian, William R. Zame, Lucas M. Fleuren, Paul Elbers, Mihaela van der Schaar
Modeling a system’s temporal behaviour in reaction to external stimuli is a fundamental problem in many areas. Pure Machine Learning (ML) approaches often fail in the small sample regime and cannot provide actionable insights beyond predictions. A promising modification has been to incorporate expert domain knowledge into ML models.
The application we consider is predicting the progression of disease under medications, where a plethora of domain knowledge is available from pharmacology. Pharmacological models describe the dynamics of carefully-chosen medically meaningful variables in terms of systems of Ordinary Differential Equations (ODEs). However, these models only describe a limited collection of variables, and these variables are often not observable in clinical environments. To close this gap, we propose the latent hybridisation model (LHM) that integrates a system of expert-designed ODEs with machine-learned Neural ODEs to fully describe the dynamics of the system and to link the expert and latent variables to observable quantities.
We evaluated LHM on synthetic data as well as real-world intensive care data of COVID-19 patients. LHM consistently outperforms previous works, especially when few training samples are available such as at the beginning of the pandemic.
Type 4 interpretability: transparent risk equations
This approach to interpretability allows us to turn black box models into white boxes by generating transparent risk equations that describe the predictions made by machine learning models. Unlike regression models, this involves mapping non-linear interactions between different features. We can then discard the black box model, and rely on the transparent risk equation to issue predictions.
The bulk of our own work focusing on this type of interpretability has involved symbolic metamodeling frameworks for expressing black-box models in terms of transparent mathematical equations that can be easily understood and analyzed by human subjects. A symbolic metamodel is a model of a model—a surrogate model of a trained (machine learning) model expressed through a succinct symbolic expression that comprises familiar mathematical functions and can be subjected to symbolic manipulation. We first introduced symbolic metamodels in a paper published at NeurIPS 2019.
Demystifying Black-box Models with Symbolic Metamodels
Ahmed Alaa, Mihaela van der Schaar
Understanding the predictions of a machine learning model can be as crucial as the model’s accuracy in many application domains. However, the black-box nature of most highly-accurate (complex) models is a major hindrance to their interpretability.
To address this issue, we introduce the symbolic metamodeling framework — a general methodology for interpreting predictions by converting “black-box” models into “white-box” functions that are understandable to human subjects. A symbolic metamodel is a model of a model, i.e., a surrogate model of a trained (machine learning) model expressed through a succinct symbolic expression that comprises familiar mathematical functions and can be subjected to symbolic manipulation.
We parameterize symbolic metamodels using Meijer G-functions — a class of complex-valued contour integrals that depend on scalar parameters, and whose solutions reduce to familiar elementary, algebraic, analytic and closed-form functions for different parameter settings. This parameterization enables efficient optimization of metamodels via gradient descent, and allows discovering the functional forms learned by a machine learning model with minimal a priori assumptions.
We show that symbolic metamodeling provides an all-encompassing framework for model interpretation — all common forms of global and local explanations of a model can be analytically derived from its symbolic metamodel.
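The idea of a "model of a model" can be illustrated with a much simpler surrogate than the one used in the paper. The sketch below queries a hypothetical black box and fits a short symbolic expression by least squares over a small, hand-picked library of terms. Both the library and the `black_box` function are assumptions; the actual framework parameterizes the metamodel with Meijer G-functions and optimizes it by gradient descent, which this example does not attempt.

```python
import numpy as np

def black_box(x1, x2):
    """Hypothetical opaque model that we can only query."""
    return 2.0 * x1 * x2 + np.exp(x1)

# Hand-picked library of candidate terms (an assumption of this sketch).
library = {
    "1": lambda x1, x2: np.ones_like(x1),
    "x1": lambda x1, x2: x1,
    "x2": lambda x1, x2: x2,
    "x1*x2": lambda x1, x2: x1 * x2,
    "exp(x1)": lambda x1, x2: np.exp(x1),
}

# Query the black box on random inputs and fit the surrogate to its outputs.
rng = np.random.default_rng(0)
x1, x2 = rng.uniform(-1.0, 1.0, size=(2, 200))
design = np.column_stack([term(x1, x2) for term in library.values()])
coefs, *_ = np.linalg.lstsq(design, black_box(x1, x2), rcond=None)

# Keep only the terms with a non-negligible coefficient.
equation = " + ".join(
    f"{c:.2f}*{name}" for name, c in zip(library, coefs) if abs(c) > 1e-6
)
print("surrogate:", equation)
```

Once such an expression is available, the interaction between the two features can be read directly from the `x1*x2` term, which is the kind of insight a transparent risk equation is meant to provide.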
We built on our symbolic metamodeling work by developing Symbolic Pursuit, which was first introduced in a paper published at NeurIPS 2020. The Symbolic Pursuit algorithm benefits from the ability to produce parsimonious expressions that involve a small number of terms. Such interpretations permit easy understanding of the relative importance of features and feature interactions.
Learning outside the Black-Box: The pursuit of interpretable models
Jonathan Crabbé, Yao Zhang, William Zame, Mihaela van der Schaar
Machine learning has proved its ability to produce accurate models — but the deployment of these models outside the machine learning community has been hindered by the difficulties of interpreting these models.
This paper proposes an algorithm that produces a continuous global interpretation of any given continuous black-box function. Our algorithm employs a variation of projection pursuit in which the ridge functions are chosen to be Meijer G-functions, rather than the usual polynomial splines. Because Meijer G-functions are differentiable in their parameters, we can “tune” the parameters of the representation by gradient descent; as a consequence, our algorithm is efficient.
Using five familiar data sets from the UCI repository and two familiar machine learning algorithms, we demonstrate that our algorithm produces global interpretations that are both faithful (highly accurate) and parsimonious (involve a small number of terms). Our interpretations permit easy understanding of the relative importance of features and feature interactions. Our interpretation algorithm represents a leap forward from the previous state of the art.
It should be noted that transparent risk equations can be applied to the other three types of interpretability listed above. Using patient features as inputs and risk as outputs, we can identify variable importance, classify similarities, discover variable interactions, and enable hypothesis induction.
Type 5 interpretability: concept-based explainability
Human beings tend to use high-level concepts to explain their decisions. The purpose of concept-based explainability is to extend this approach to neural networks. This type of explanation makes it possible to analyse how the model relates high-level concepts defined by the user to its predictions. A typical example is an image classifier that identifies zebras through their stripes. In this example, “zebra” is the model’s prediction and “stripes” is a concept. Users can define concepts freely by supplying relevant examples that illustrate them.
We have developed an extension of the existing formalism for concept-based explainability, called Concept Activation Regions (CARs). This extension relaxes stringent assumptions made by previous works, such as the linear separability of concept sets in the neural network’s representation space. We also illustrate the value of concept-based explanations in a medical context by showing that neural networks implicitly rediscover medical concepts, such as the prostate cancer grading system.
Concept Activation Regions: A Generalized Framework For Concept-Based Explanations
Jonathan Crabbé, Mihaela van der Schaar
Concept-based explanations permit to understand the predictions of a deep neural network (DNN) through the lens of concepts specified by users. Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the DNN’s latent space. When this holds true, the concept can be represented by a concept activation vector (CAV) pointing in that direction.
In this work, we propose to relax this assumption by allowing concept examples to be scattered across different clusters in the DNN’s latent space. Each concept is then represented by a region of the DNN’s latent space that includes these clusters and that we call concept activation region (CAR). To formalize this idea, we introduce an extension of the CAV formalism that is based on the kernel trick and support vector classifiers. This CAR formalism yields global concept-based explanations and local concept-based feature importance. We prove that CAR explanations built with radial kernels are invariant under latent space isometries. In this way, CAR assigns the same explanations to latent spaces that have the same geometry.
We further demonstrate empirically that CARs offer (1) more accurate descriptions of how concepts are scattered in the DNN’s latent space; (2) global explanations that are closer to human concept annotations and (3) concept-based feature importance that meaningfully relate concepts with each other. Finally, we use CARs to show that DNNs can autonomously rediscover known scientific concepts, such as the prostate cancer grading system.
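The difference between a single concept direction and a concept region can be seen on a toy latent space in which the concept examples form two separate clusters. The sketch below compares a linear rule (difference of class means, a simplified stand-in for a concept activation vector) with a kernel-based score (difference of mean radial kernel similarities, a simplified stand-in for the support vector classifier used for CARs). The latent points, the length scale, and both scoring rules are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D latent representations: the concept is present in two
# separate clusters, with the concept-free examples sitting between them.
concept_pos = np.vstack([
    rng.normal([-3.0, 0.0], 0.5, size=(50, 2)),
    rng.normal([3.0, 0.0], 0.5, size=(50, 2)),
])
concept_neg = rng.normal([0.0, 0.0], 0.5, size=(100, 2))
H = np.vstack([concept_pos, concept_neg])
labels = np.concatenate([np.ones(100), np.zeros(100)]).astype(bool)

# Linear rule: one direction in latent space, thresholded at the midpoint
# between the two class means.
direction = concept_pos.mean(axis=0) - concept_neg.mean(axis=0)
midpoint = (concept_pos.mean(axis=0) + concept_neg.mean(axis=0)) / 2.0
linear_acc = np.mean(((H - midpoint) @ direction > 0) == labels)

def rbf(A, B, length_scale=1.0):
    """Radial kernel between every row of A and every row of B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

# Region rule: a point belongs to the concept region if it is, on average,
# more similar to concept examples than to concept-free examples.
region_score = rbf(H, concept_pos).mean(axis=1) - rbf(H, concept_neg).mean(axis=1)
region_acc = np.mean((region_score > 0) == labels)

print(f"single direction accuracy: {linear_acc:.2f}")
print(f"kernel region accuracy:    {region_acc:.2f}")
```

Because no single direction separates the two concept clusters from the examples between them, the linear rule performs close to chance here, while the kernel-based region captures both clusters.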
Robust and trustworthy interpretations
All the interpretability techniques described above are useful only if they are faithful to the model they explain. Indeed, failing in this basic criterion implies that the explanations could be inconsistent with the true model behaviour, hence leading to false insights about the model. For this reason, we believe that guaranteeing an alignment between interpretability methods and the model is just as important as the interpretability methods themselves.
In a work presented at NeurIPS 2023, we explore this faithfulness through the lens of model symmetries. In particular, we consider neural networks whose predictions are invariant under a specific symmetry group. This includes popular architectures, ranging from convolutional to graph neural networks. Any explanation that faithfully explains this type of model needs to be in agreement with this invariance property. We formalise this intuition through the notion of explanation invariance and equivariance by leveraging the formalism from geometric deep learning.
Through this rigorous formalism, we derive (1) two metrics to measure the robustness of any interpretability method with respect to the model symmetry group; (2) theoretical robustness guarantees for some popular interpretability methods and (3) a systematic approach to increase the invariance of any interpretability method with respect to a symmetry group. By empirically measuring our metrics for explanations of models associated with various modalities and symmetry groups, we derive a set of 5 guidelines that we present in-depth to allow users and developers of interpretability methods to produce robust explanations.
Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance
Jonathan Crabbé, Mihaela van der Schaar
Peering into the ultimate black box
The bulk of this page has been dedicated to exploring what it means to make machine learning models “interpretable,” and showing how this can be done in a variety of ways. We believe this is still premised on a relatively blinkered perspective that ignores some very exciting possibilities for interpretability and machine learning: namely, that humans could use interpretability to understand their own decision-making processes.
This possibility is at the heart of quantitative epistemology, a new and transformationally significant research pillar pioneered by our lab. The purpose of this research is to develop a strand of machine learning aimed at understanding, supporting, and improving human decision-making. We aim to do so by building machine learning models of decision-making, including how humans acquire and learn from new information, establish and update their beliefs, and act on the basis of their cumulative knowledge. Because our approach uses observational data to study knowledge, and uses machine learning methods to support and improve knowledge acquisition and its impact on decision-making, we call it “quantitative epistemology.”
We develop machine learning models that capture how humans acquire new information, how they pay attention to such information, how their beliefs may be represented, how their internal models may be structured, how these different levels of knowledge are leveraged in the form of actions, and how such knowledge is learned and updated over time. Our methods are aimed at studying human decision-making, identifying potential suboptimalities in beliefs and decision processes (such as cognitive biases, selective attention, imperfect retention of past experience, etc.), and understanding risk attitudes and their implications for learning and decision-making. This would allow us to construct decision support systems that provide humans with information pertinent to their intended actions, their possible alternatives and counterfactual outcomes, as well as other evidence to empower better decision-making.
You can learn more about quantitative epistemology and explore some of our first papers in this area in the article below.
Find out more and get involved
Interpretability is one of the van der Schaar Lab’s core research pillars, and we are constantly pushing forward our understanding of the area. We have produced a great deal of content on the topic, some of which has been shared below.
Codebase for Interpretability
We have gathered relevant code from our lab and beyond into an Interpretability Suite. The GitHub repository for this can be viewed here. The front page of the repository provides information about when a user may want to apply each method, and the repository itself contains an interface to help users implement a few of the methods. A talk introducing this suite of interpretability methods can be viewed at the bottom of this page.
Lecture on interpretability at The Alan Turing Institute and related blog post
This Turing Lecture (delivered March 11, 2020) introduces a number of cutting-edge approaches our lab has developed to turn machine learning’s opaque black boxes into transparent and understandable white boxes. A written companion piece from April 2020 can also be found below.
Roundtables on interpretability with clinicians
In March and April 2021, our lab held two roundtables in which we discussed the topic of interpretability with clinicians.
In our first session, we aimed to have a relatively high-level conversation about different definitions and types of interpretability, whereas the second session focused more on how interpretability can help build trust in machine learning models and benefit healthcare stakeholders. Underlying both of these were two recurring questions: to what degree can interpretable machine learning really benefit healthcare stakeholders, and will it provide the key to acceptance of machine learning technologies?
Both roundtables yielded spirited discussions and remarkable insights that could genuinely change the way we design machine learning models for clinical applications. They can be viewed below.
Rob Davis on the ML Interpretability Suite
This is a quick intro to our Interpretability Suite by Rob Davis, research engineer at CCAIM. It discusses why ML interpretability is so important and shows the array of different methods developed by the van der Schaar Lab and CCAIM that are available on the van der Schaar Lab GitHub.
Click here for the Interpretability Suite
Click here for the SimplEx Demonstrator
Our engagement sessions
We encourage you to stay abreast of ongoing developments in this and other areas of machine learning for healthcare by signing up to take part in one of our two streams of online engagement sessions.
If you are a practicing clinician, please sign up for Revolutionizing Healthcare, which is a forum for members of the clinical community to share ideas and discuss topics that will define the future of machine learning in healthcare (no machine learning experience required).
If you are a machine learning student, you can join our Inspiration Exchange engagement sessions, in which we introduce and discuss new ideas and the development of new methods, approaches, and techniques in machine learning for healthcare.
A full list of our papers on interpretability and related topics can be found here.