
Machine learning can enable truly personalized healthcare; this is what our lab calls “bespoke medicine.”
More info on bespoke medicine can be found here.
Bespoke medicine entails far more than providing predictions for individual patients: we also need to understand the effect of specific treatments on specific patients at specific times. This is what we call individualized treatment effect inference. It is a substantially more complex undertaking than prediction, and every bit as important.
Our lab has built a position of leadership in this area. We have defined the research agenda by outlining and addressing key complexities and challenges, and by laying the theoretical groundwork for model development. In our development of algorithms, we have identified and targeted an extensive range of potential clinical applications using both clinical trials and observational data as inputs.
The page below provides an introduction to individualized treatment effect inference, as well as an overview of some of the key projects that have driven this research area forward.
- Individualized treatment effect inference: a brief introduction
- Treatment effects: from the average to the individual
- Why is individualized treatment effect inference so complicated?
- Estimating response surfaces
- Including treatment effects in outcome models and handling bias
- Selecting optimal models for individualized treatment effect inference
- Individualized treatment effect estimation using time-series data
- ML-assisted clinical trials
- Learn more and get involved
- Our work so far
- Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations
- Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects
- SyncTwin: Treatment Effect Estimation with Longitudinal Outcomes
- Estimating Multi-cause Treatment Effects via Single-cause Perturbation
- Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms
- On Inductive Biases for Heterogeneous Treatment Effect Estimation
- SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data
- Really Doing Great at Estimating CATE? A Critical Look at ML Benchmarking Practices in Treatment Effect Estimation
- Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability
- Policy Analysis using Synthetic Controls in Continuous-Time
This page is one of several introductions to areas that we see as “research pillars” for our lab. It is a living document, and the content here will evolve as we continue to reach out to the machine learning and healthcare communities, building a shared vision for the future of healthcare.
Our primary means of building this shared vision is through two groups of online engagement sessions: Inspiration Exchange (for machine learning students) and Revolutionizing Healthcare (for the healthcare community). If you would like to get involved, please visit the page below.
This page is authored and maintained by Mihaela van der Schaar and Nick Maxfield.
Individualized treatment effect inference: a brief introduction
This page introduces individualized treatment effect inference — which we could also refer to as causal inference of individualized treatment effects — as one of our lab’s key research areas, and offers an overview of a range of relevant projects we have undertaken.
The broader area of “causal inference” in machine learning can be broken down into two sub-fields: (i) causal discovery and (ii) individualized treatment effect inference. While (i) is concerned with discovering which variables affect which others, and in what direction, (ii) is concerned with quantifying those relationships by estimating the effect of one (or more) variables on another. Here we focus exclusively on (ii).
In creating this page, we aim to raise and discuss issues related to both the static (cross‐sectional) setting and the longitudinal setting (where patient history and treatment timing are taken into account). We describe the challenges associated with learning from observational data, such as confounding bias, as well as the modeling choices used by machine learning methods to handle them in both settings.
Our lab is also deeply interested in what we call “causal machine learning,” a related but distinct area where the focus is on using causal graphs to improve the robustness of machine learning for prediction, domain adaptation, transfer learning, and more. For an example of our work in this area, please take a look at CASTLE, a NeurIPS 2020 paper.
Causal machine learning will also form the basis of a new piece of content in the future.
Treatment effects: from the average to the individual
A major challenge in the domain of healthcare is ascertaining whether a given treatment influences or determines an outcome—for instance, whether there is a survival benefit to prescribing a certain medication, such as the ability of a statin to lower the risk of cardiovascular disease.
Current treatment guidelines have been developed with the “average” patient in mind (on the basis of randomized controlled trials), but there is ample evidence that treatments can have different effects and outcomes from one individual to another: for any given treatment, it is quite likely that only a small proportion of people will actually respond in a manner that resembles the “average” patient.

Since the advent of precision medicine and the availability of large amounts of observational data from electronic health records, the research community has started to explore more quantitative individual-level problems, such as the magnitude of the effect of a treatment on a condition for an individual (one example might be the survival benefit of weight loss for a 60-year-old cardiovascular patient with diabetes). Rather than making treatment decisions based on blanket assumptions about “average” patients, the goal of clinical decision-makers is now to determine the optimal treatment course for any given patient at any given time. Methods for doing so in a quantitative fashion based on insights from machine learning are in the formative stages of development (our lab’s work in this area will be covered below).
There are two main ways to determine whether a treatment works: analysis of observational datasets, and post-hoc analysis of clinical trials. Each approach has its own strengths and weaknesses.

Observational datasets
At the moment, doctors can learn from experience and time which treatments work for each individual, but there is no mechanism for sharing this knowledge on a population level in a way that allows the extraction of valuable insights on treatment effects. The increasing availability of observational data has encouraged the development of various machine learning algorithms tailored for inferring treatment effects. It is worth noting, however, that observational datasets are prone to treatment assignment bias, as explained in more detail later on.
Clinical trials
Randomized Controlled Trials (RCTs) are the gold standard for comparing the effectiveness of a new treatment to the current one. Clinical trials may, however, not always be the most practical option for evaluating certain treatments, since they are costly and time-consuming to implement, and they do not always recruit representative patients.
This makes external validity an issue for RCTs, as findings sometimes fail to generalize beyond the study population. This may be due to the narrow inclusion criteria of RCTs compared with the real world: historically, trial populations have been restricted with respect to disease severity and comorbidities, and elderly patients and ethnic minorities have been under-represented. By contrast, when drugs are US Food and Drug Administration (FDA)-approved after the clinical trials stage, they start being administered to a much larger and more varied population of patients.
Although there is increasing awareness of this issue and global regulatory authorities are encouraging wider inclusion criteria in clinical trials, it remains an issue that is unlikely to be solved by RCTs and associated integrated and model‐based analyses alone. There is scope to add an adaptive element to clinical trials through the use of machine learning.
To summarize the above: our goal is to support a shift from a focus on average treatment effects to individualized treatment effects by optimizing the use of observational datasets and clinical trial design. Estimating individualized treatment effects from EHR data represents a thriving area of research, in which machine learning methods are primed to take center stage.
Clinical example:
Breast cancer treatment outcomes
When deciding on a treatment for a given form of cancer, clinical decisions are often made on the basis of results from randomized controlled trials of treatments involving that cancer.
As explained above, this approach assumes a response to treatment based on the response of the “average patient,” rather than taking into account the health history and specific features of the individual.

The figure above shows a range of recommendations for a specific cancer patient. On the left-hand side, we see the risk of recurrence within one year for several treatment options based on population-level data. Based on these recommendations, the optimal treatment choice would be a combination of chemotherapy and radiotherapy.
This stands in contrast to the chart in the middle, which is an individualized recommendation based on observational data using a range of machine learning techniques (many of which are outlined later on this page). This individualized recommendation shows that chemotherapy (without radiotherapy) would in fact yield the lowest likelihood of recurrence within one year.
The figure above is an example taken from a live demonstrator system based on breast cancer, fed by anonymized real-world data. More details on this project are available in the video below, taken from a presentation given by Mihaela van der Schaar at the Royal College of Physicians in 2019.
Clinical example:
LVAD implantation
The implantation of left ventricular assist devices (LVADs) in many ways demonstrates the difficulties and pitfalls that are commonplace in medical decision-making, and the importance of being able to estimate individualized treatment effects.
LVADs can serve as a “stopgap” measure for patients on heart transplant waiting lists, but the procedure is costly and invasive. On top of this, there is substantial evidence challenging the conventional assumption that all patients with LVADs will benefit from them equally: in fact, it is clear that both the outcome and the optimal timing of the LVAD implantation vary extensively from individual to individual.
There is a clear benefit, therefore, to being able to learn the individualized survival benefits of LVADs for cardiac patients waiting for a heart. Clinical trials are not well-suited to this purpose: they are expensive, rely on small data samples, and focus on short-term outcomes. In fact, in the case of LVADs it might not even be possible to conduct clinical trials, making learning survival benefits from observational data the only viable option.
This is what our lab did in 2017, in a study presented at NeurIPS entitled “Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes.” We proposed learning from observational datasets as an alternative to conducting clinical trials, using a multi-task learning framework, and demonstrated the effectiveness of this approach.
With regard to LVAD implantation, we were able to use causal multi-task Gaussian processes (CMGP) to identify patients who had received LVADs at a time that was suboptimal in terms of survivability. In a number of such cases, patients died while awaiting heart transplants, with a substantial likelihood that this could have been prevented through optimal timing of LVAD implantation.

In other cases, patients died without having received LVADs in the first place, whereas analysis using CMGP suggested a high likelihood of survival until transplant availability in the event of timely LVAD implantation.
LVAD implantation is just one example of how moving beyond assumptions regarding “average” treatment benefits and instead estimating individual effects using machine learning can lead to personalized survival predictions and a finer priority scheme.
Why is individualized treatment effect inference so complicated?
Our goal is to use machine learning to estimate the effect of a treatment on an individual using static or time-series observational data.

The problem of estimating individual-level causal effects is usually formulated within the classical potential outcomes framework, first introduced by Neyman in 1923 and subsequently expanded by Rubin into a broader causal model. The framework is based on observational data consisting of patient features, treatment assignment and outcome.
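Concretely, in the standard notation for this framework (shown here for a binary treatment), the observational dataset and the quantity we wish to estimate can be written as:

$$ \mathcal{D} = \{(x_i, t_i, y_i)\}_{i=1}^{n}, \qquad y_i = t_i\, Y_i^{(1)} + (1 - t_i)\, Y_i^{(0)}, \qquad \tau(x) = \mathbb{E}\big[\, Y^{(1)} - Y^{(0)} \mid X = x \,\big], $$

where x_i denotes the patient features, t_i the treatment assignment, and y_i the observed (factual) outcome.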

Note:
Binary treatment options vs. multiple treatment options
The discussion here is, for the purpose of simplicity, focused on binary treatments (untreated = 0, treated = 1).
It is worth bearing in mind, however, that in application the framework and associated methods for individualized treatment effect inference can be extended to any number of potential treatments.
The apparent simplicity of this framework belies the true complexity of the problem of individualized treatment effect inference; we believe there are three key reasons for this:
– we must work in the absence of counterfactual outcomes;
– bias in observational datasets must be addressed; and
– there is no single preferred way to include treatment indicators in outcome models.
Furthermore, little work has been done to develop a comprehensive theory for individualized treatment effect inference, including principles for optimal model selection.
Overcoming these challenges will require not just methodological advances but also new ways of thinking. In the sections below, we will provide an explanation of each of these issues, while highlighting some of the ways in which our lab’s projects have made progress towards their resolution.
Note:
Assumptions for individualized treatment effect inference
While this topic is not discussed in depth here, it is worth noting that performing individualized treatment effect inference requires two standard assumptions: 1) overlap, and 2) the absence of hidden/unmeasured confounders.

For reference, our most recent work on estimating treatment effects over time in the presence of hidden confounders was presented at ICML 2020 (related paper here).
Estimating response surfaces
In the potential outcomes framework outlined above, every subject (individual) in the observational dataset possesses a number of potential outcomes: the subject’s outcome under the application of various treatments, and the subject’s outcome when no treatment is applied. The treatment effect is the difference between the two potential outcomes, but since we only observe the “factual” outcome for a specific treatment assignment, and never observe the corresponding “counterfactual” outcome, we never observe any examples of the true treatment effect in an observational dataset. This is what makes the problem of individualized treatment effect inference fundamentally different from standard supervised learning (regression).

It is important, therefore, to understand from the outset that any method to estimate individualized treatment effects is limited to using the data available at hand, which is entirely composed of factuals, and not counterfactuals. For instance, note that the figure above shows us the factual outcome, but not the counterfactual.
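To make this concrete, here is a minimal synthetic sketch (our own illustration, not code from any of the papers discussed here): on paper, every subject has both potential outcomes, but the recorded dataset reveals only the factual one.

```python
# Minimal illustration of the "fundamental problem": the true individualized
# treatment effect (ITE) exists in the simulation, but the observed dataset
# contains only one potential outcome per subject.
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.normal(size=n)                                   # a single patient feature
y0 = x + rng.normal(scale=0.1, size=n)                   # potential outcome, untreated
y1 = x + 1.0 + x**2 + rng.normal(scale=0.1, size=n)      # potential outcome, treated
ite = y1 - y0                                            # true ITE: never observed

t = rng.integers(0, 2, size=n)                           # treatment assignment
y_factual = np.where(t == 1, y1, y0)                     # all the dataset ever records

print("true ITE (hidden):", np.round(ite, 2))
print("observed (t, y):  ", list(zip(t.tolist(), np.round(y_factual, 2).tolist())))
```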
The majority of existing methods for estimating individualized treatment effects from observational data focus on binary or categorical treatment settings, and very few methods consider more complex treatment scenarios. However, treatments often have an associated dosage, which requires us to estimate the causal effects of continuous-valued interventions.
Additionally, for organ transplantation, it is necessary to estimate the effect of high-dimensional, and potentially unique, organs on a patient’s survival, so that each organ can be assigned to the patient who would derive the greatest survival benefit. Our lab has handled these more complex treatment scenarios in two recent papers published at NeurIPS 2020: one on individualized dose-response estimation (SCIGAN), and one on estimating the individualized effect of transplant organs (high-dimensional treatments) on patients’ survival (OrganITE).
Research focus:
Using GANs to compensate for the absence of counterfactual outcomes
While numerous approaches to individualized treatment effect estimation (including many developed by our own lab) have produced strong results in the absence of counterfactuals, it is also possible to employ generative adversarial networks (GANs) to attempt to account for these unseen counterfactual outcomes.
This was the focus of our work on GANITE, a method first outlined in a 2018 ICLR paper.
The defining feature of the GAN framework is the existence of a generator and discriminator, trained in an adversarial fashion against each other. The generator tries to generate synthetic samples that the discriminator is incapable of distinguishing from the real samples, while the discriminator tries to identify which of the samples are the synthetic ones. This framework can be formulated as a minimax game and at the optimal point of this game, generated samples follow the real data distribution.
As a result, the GAN framework provides a powerful platform for inference based on the factual data, while allowing us to capture the uncertainty in the counterfactual distributions by attempting to learn them. GANITE consists of two blocks: a counterfactual imputation block and an individualized treatment effect block, each of which consists of a generator and a discriminator. We view the factual outcome as an observed label and consider the counterfactual outcomes to be missing labels; the counterfactual generator of GANITE attempts to generate counterfactual outcomes in such a way that, when given the combined vector of factual and generated counterfactual outcomes, the discriminator of GANITE cannot determine which of the components is the factual outcome. (After all, if the generated counterfactuals follow the underlying distribution, it should not be possible to discriminate the real outcome from the generated outcomes.)
With the complete labels (combined factual and estimated counterfactual outcomes), the individualized treatment effect estimation function can then be trained in a supervised way to infer the potential outcomes of an individual based on their feature information. By also modelling this individualized treatment effect estimation function using a GAN framework, we are able not only to predict the expected outcomes but also to quantify the uncertainty in the predictions, which is particularly important in the medical setting.
Unlike many other state-of-the-art methods, GANITE naturally extends to – and in fact is defined in the first place for – any number of treatments.
An additional feature of GANITE is its relative robustness to treatment assignment bias; this is addressed in more detail in the full ICLR 2018 paper below.
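For readers who want a more concrete picture, below is a heavily simplified sketch of the counterfactual-imputation idea for a binary treatment, assuming PyTorch. All names and architecture sizes are our own illustrative choices; the published GANITE model additionally includes the ITE block, uncertainty estimation, and support for any number of treatments.

```python
# Sketch of a GANITE-style counterfactual imputation block (binary treatment).
import torch
import torch.nn as nn

d_x = 10  # feature dimension (illustrative)

# Generator: features + treatment + factual outcome + noise -> [y0_hat, y1_hat].
G = nn.Sequential(nn.Linear(d_x + 3, 64), nn.ReLU(), nn.Linear(64, 2))
# Discriminator: features + combined outcome vector -> logit for "slot 1 is factual".
D = nn.Sequential(nn.Linear(d_x + 2, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def training_step(x, t, y_f):
    """x: (n, d_x) features; t: (n, 1) binary treatment; y_f: (n, 1) factual outcome."""
    z = torch.randn(x.size(0), 1)
    y_pot = G(torch.cat([x, t, y_f, z], dim=1))      # proposed potential outcomes
    y_bar = y_pot.clone()
    y_bar.scatter_(1, t.long(), y_f)                 # overwrite factual slot with truth

    # Discriminator tries to recover which slot holds the factual outcome.
    loss_d = bce(D(torch.cat([x, y_bar.detach()], dim=1)), t)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator tries to fool D while still fitting the factual outcome.
    y_f_hat = y_pot.gather(1, t.long())
    loss_g = -bce(D(torch.cat([x, y_bar], dim=1)), t) + ((y_f_hat - y_f) ** 2).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```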
Including treatment effects in outcome models and handling bias
When modeling individualized treatment effects, we face further issues related to handling treatment bias in observational datasets, and a multitude of choices regarding approaches to handling treatment indicators when estimating patient outcomes.
The former challenge results from the fact that, when estimating individualized treatment effects, assignment bias creates a discrepancy in the feature distributions for treated and control patient groups. Simply put: decision-making by doctors introduces bias into the data.
Modeling the treatment assignment, and its impact on the outcome, is a similarly complex proposition. Several approaches exist, the simplest being either to fit separate outcome models for the treated and untreated groups, or to use the assignment variable as an additional input feature.

A third solution, which has been adopted in a number of papers by our own lab’s researchers, is to learn shared representations, where the treatment assignment indexes these shared representations. This enables us to learn jointly across the treated and untreated populations.
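As an illustration, here is a minimal sketch of the first two strategies using off-the-shelf scikit-learn regressors (a toy simulation of our own; shared-representation approaches require purpose-built architectures like those discussed in the box below):

```python
# Toy comparison of two ways to include the treatment indicator.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
T = rng.integers(0, 2, size=n)
Y = X[:, 0] + T * (1.0 + X[:, 1]) + rng.normal(scale=0.1, size=n)  # true CATE: 1 + X[:, 1]

# Strategy 1: fit separate outcome models per treatment arm ("T-learner").
m0 = GradientBoostingRegressor().fit(X[T == 0], Y[T == 0])
m1 = GradientBoostingRegressor().fit(X[T == 1], Y[T == 1])
cate_t = m1.predict(X) - m0.predict(X)

# Strategy 2: one model with the treatment as an extra feature ("S-learner").
m = GradientBoostingRegressor().fit(np.column_stack([X, T]), Y)
cate_s = (m.predict(np.column_stack([X, np.ones(n)]))
          - m.predict(np.column_stack([X, np.zeros(n)])))

print(f"mean CATE -- true: 1.00, T-learner: {cate_t.mean():.2f}, S-learner: {cate_s.mean():.2f}")
```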
Research focus:
Building shared representations with non-stationary Gaussian processes
In an ICML 2018 paper, entitled “Limits of Estimating Heterogeneous Treatment Effects: Guidelines for Practical Algorithm Design,” we provided a characterization of the fundamental limits of estimating heterogeneous treatment effects, and established conditions under which these limits can be achieved.
Our analysis revealed that the relative importance of the different aspects of observational data varies with the sample size. For instance, we showed that assignment bias matters only in small-sample regimes, whereas with a large sample size, the way an algorithm models the control and treated outcomes is what bottlenecks its performance. Guided by our analysis, we built a practical algorithm for estimating treatment effects using non-stationary Gaussian processes (NSGP) with doubly-robust hyperparameters.
By employing NSGPs and placing a Gaussian process prior across the two response surfaces, we model those surfaces jointly. The advantage of doing this is that we obtain a shared representation that can be learned effectively in the small-sample regime. Gaussian processes are also very flexible: they enable us to learn which parameters are important, so we can learn different sparsity patterns and adapt the model to the differing smoothness of the two response surfaces.
Using a standard semi-synthetic simulation setup, we demonstrated that our algorithm outperforms the state-of-the-art, and that the behavior of existing algorithms conforms with our analysis.
Many methods have been developed that take one of the three approaches above to modeling treatments, and they also handle bias differently. Despite all of this research, however, a great deal of work remains to be done in developing a comprehensive theory (i.e. a principled guideline for building estimators of treatment effects using machine learning algorithms). Without such a theory, it is very difficult to know what types of algorithms we should be developing, or how best to deal with the twin problems of modeling treatments and handling bias.
This is why, a few years ago, our lab developed the first theory for individualized treatment effect inference. To do this, we first tried to develop a theoretical understanding of the limits of this problem. Then, guided by this, we sought to identify unique principles that can guide the development of algorithms.

Research focus:
Under the hood of our comprehensive theory for individualized treatment effect inference
In a 2018 paper entitled “Bayesian Nonparametric Causal Inference: Information Rates and Learning Algorithms,” we addressed the individualized causal effect estimation problem on the basis of the Neyman-Rubin potential outcomes model, and established the fundamental limits on the amount of information that a learning algorithm can gather about the causal effect of an intervention given an observational data sample. We also provided guidelines for building proper individualized treatment effect inference models that “do not leave any information on the table” because of poor modeling choices.
We set this in a nonparametric Bayesian estimation framework, placing a prior over the two response surfaces (treated and untreated) and then computing point estimates induced by the Bayesian posterior on the basis of the available data. We computed the errors associated with the different estimation problems using the precision in estimation of heterogeneous effects (PEHE) to assess the efficiency of these different types of individualized treatment effect models.
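For reference, PEHE measures the expected squared discrepancy between the estimated and true individualized effects. In standard notation (our rendering of the usual definition):

$$ \mathrm{PEHE} = \mathbb{E}_{x}\big[\, (\hat{\tau}(x) - \tau(x))^{2} \,\big], \qquad \tau(x) = \mathbb{E}\big[\, Y^{(1)} - Y^{(0)} \mid X = x \,\big], $$

where \hat{\tau}(x) is the model’s estimate; many papers report the square root of this quantity.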
We characterized the optimal information rate that can be achieved by any learning procedure, and showed that it depends on the dimensionality of the feature space, and the smoothness of the “rougher” of the two potential outcomes.
We also used the conclusions drawn from our analysis to design a practical Bayesian causal inference algorithm based on a multi-task Gaussian process, and showed that it significantly outperforms state-of-the-art models through experiments conducted on a standard semi-synthetic dataset.
The theory we developed guides our model design in two ways: in the small-sample regime, we need methods that effectively handle assignment bias and are hence able to share the training data effectively between the response surfaces; in the large-sample regime, we need models that can flexibly learn from the available data and tune their hyperparameters effectively.

We continue to push the boundaries of our understanding of different strategies for treatment effect estimation. More recently, we investigated the strengths and weaknesses of a number of so-called meta-learners (model-agnostic learning strategies) both theoretically and empirically, providing further guidance towards principled algorithm design. Our recent paper on this topic was accepted for publication at AISTATS 2021, and can be found here.
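To give a flavor of what a pseudo-outcome meta-learner looks like, here is a minimal sketch of the doubly robust (DR) strategy using scikit-learn base models. This is our own simplified illustration: cross-fitting of the nuisance models is omitted for brevity, and the paper itself studies these strategies with neural network base-learners.

```python
# Two-stage doubly robust (DR) pseudo-outcome regression, heavily simplified.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, d = 4000, 5
X = rng.normal(size=(n, d))
e_true = 1 / (1 + np.exp(-X[:, 0]))                     # confounded assignment
T = rng.binomial(1, e_true)
Y = X[:, 0] + T * (1.0 + X[:, 1]) + rng.normal(scale=0.1, size=n)

# Stage 1: nuisance estimates (arm-specific outcome models and propensity score).
mu0 = GradientBoostingRegressor().fit(X[T == 0], Y[T == 0]).predict(X)
mu1 = GradientBoostingRegressor().fit(X[T == 1], Y[T == 1]).predict(X)
e_hat = np.clip(LogisticRegression().fit(X, T).predict_proba(X)[:, 1], 0.05, 0.95)

# Stage 2: build the DR pseudo-outcome and regress it on the features.
pseudo = (mu1 - mu0
          + T * (Y - mu1) / e_hat
          - (1 - T) * (Y - mu0) / (1 - e_hat))
cate_model = GradientBoostingRegressor().fit(X, pseudo)
print(f"mean CATE estimate: {cate_model.predict(X).mean():.2f} (true: 1.00)")
```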
A firm theoretical foundation for individualized treatment effect inference will make it possible to carry out reliable estimation of individualized treatment effects. Such reliable estimation will have obvious implications for the treatment of patients, but it will also have less-obvious implications for clinical trials. The first is that it will enable more reliable post-hoc analysis (such as understanding which groups of patients benefit least or most from the trial treatment). The second is that it may better inform the process of sequentially recruiting patients into clinical trials, thereby enabling better design, both in terms of maximizing overall statistical power and in terms of maximizing the information learned for patients with specific covariates.
Research focus:
Estimating the effects of continuous interventions from observational data
It is highly common for treatment decisions to involve not only determining which intervention to make (e.g. whether to treat cancer with radiotherapy, chemotherapy or surgery) but also determining the value of some continuous parameter associated with intervening (e.g. the dosage of radiotherapy to be administered).
Despite this, relatively little work has been done in the setting of continuous-valued interventions, while much attention has been given to the problem of estimating the effect of discrete interventions from observational data.
Since continuous interventions arise in many practical scenarios, the impact of this problem in the healthcare setting is clear: being able to better estimate individual responses to dosages would help us select treatments that result in improved patient outcomes. Moreover, clinicians and patients will often need to consider several different outcomes (such as potential side effects); better estimates of such outcomes allow patients to make more informed decisions that are suitable for them.
In a paper accepted for publication at NeurIPS 2020, entitled “Estimating the Effects of Continuous-valued Interventions using Generative Adversarial Networks,” we proposed a novel framework called SCIGAN for estimating response curves for continuous interventions from observational data. As the name suggests, SCIGAN is built on a modified GAN framework (for an introduction to GANs, see the related section above).
To address the challenges presented by shifting to continuous interventions, we proposed a novel architecture for our discriminator: a hierarchical discriminator that leverages the structure of the continuous intervention setting. Our approach is very flexible, and can simultaneously estimate counterfactual outcomes for several different continuous interventions.
Our proposed model represents an important step forward. Nevertheless, this work is on the theoretical side and significant testing, potentially through clinical trials, will be needed before such methods can be used in practice.
Selecting optimal models for individualized treatment effect inference
In the sections above, we have introduced the challenges inherent in developing approaches to individualized treatment effect inference; we have explained how these challenges can be compensated for or handled, and have outlined a theory for building effective models.
This still leaves us, however, with a further challenge in implementation: a wide variety of models to choose from, and a potentially limitless array of application types and datasets. Choosing “one best model” is impossible, since no single method will ever outperform all others across all datasets, so the challenge becomes selecting the best-performing model for each particular task and dataset. This is further complicated by the fact that we lack access to the counterfactuals, and therefore cannot compute ground-truth individualized treatment effect estimates against which to evaluate a model’s predictions. This is in contrast to predictive models, where one can use the mean squared error between the model’s predictions and the ground-truth labels.
The answer to this problem is to use automated machine learning (AutoML) to compare models and select the best model for the task at hand. In experiments applying our own AutoML framework for individualized treatment effect inference (details of which are provided in the box below), we found that the best model selected by the framework tended to significantly outperform other commonly-used methods. This is shown below in a comparison of the performance of methods published at ICML, NeurIPS and ICLR conferences from 2016 to 2018 on 77 datasets.

Research focus:
Automated causal inference using influence functions
In an ICML 2019 paper, entitled “Validating Causal Inference Models via Influence Functions,” our lab introduced a first-of-its-kind validation procedure for estimating the performance of causal inference methods using influence functions (IFs)—the functional derivatives of a loss function.
The procedure we introduced utilizes a Taylor-like expansion to approximate the loss function of a method on a given dataset in terms of the influence functions of its loss on a “synthesized”, proximal dataset with known causal effects.
This automated and data-driven approach to model selection enables confident deployment of (black-box) machine learning-based methods, and safeguards against naïve modeling choices.
This enables practitioners such as epidemiologists and applied statisticians to use our validation procedure to select the best model for the observational study at hand.
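Glossing over the influence-function machinery itself, the overall selection loop has the following shape. This is a schematic sketch of the general idea only, with illustrative names (e.g. predict_cate), not the actual ICML 2019 procedure:

```python
# Schematic model selection for CATE estimators: score each candidate on data
# whose true effects are known (synthesized to resemble the study at hand),
# and keep the best. The real procedure corrects such scores using influence
# functions rather than relying on the synthesized data directly.
import numpy as np

def pehe(tau_hat, tau_true):
    return np.sqrt(np.mean((tau_hat - tau_true) ** 2))

def select_model(candidates, X_val, tau_true):
    # candidates: dict mapping name -> fitted model exposing predict_cate(X)
    scores = {name: pehe(m.predict_cate(X_val), tau_true)
              for name, m in candidates.items()}
    return min(scores, key=scores.get), scores
```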
Moreover, it is often the case that the observational data used to train a treatment effects model come from a setting where the distribution of patient features differs from that of the deployment (target) environment, for example when transferring models across hospitals or countries. Because of this, it is important to be able to select models that are robust to these covariate shifts across disparate patient populations.
In a recent paper from our lab, we propose leveraging the invariance of causal structures across domains to introduce a novel model selection metric specifically designed for treatment effects models under the unsupervised domain adaptation setting. Experimentally, our method selects treatment effects models that are more robust to covariate shifts on several synthetic and real healthcare datasets, including on estimating the effect of ventilation in COVID-19 patients from different geographic locations.
Individualized treatment effect estimation using time-series data
While the majority of previous work focuses on the effects of interventions at a single point in time, observational data also capture information on complex time-dependent treatment scenarios, such as where the efficacy of treatments changes over time (for example, drug resistance in cancer patients), or where patients receive multiple interventions administered at different points in time (such as joint prescriptions of chemotherapy and radiotherapy).
Estimating the effects of treatments over time therefore presents unique opportunities, such as understanding how diseases evolve under different treatment plans, how individual patients respond to medication over time, and which timings may be optimal for assigning treatments, thus providing new tools to improve clinical decision support systems.

Electronic health records provide a rich source of data for machine learning methods to learn dynamic treatment responses over time. These records, collected over time as part of regular follow-ups, provide a more cost-effective method to gather insights on the effectiveness of past treatment regimens.
Estimating counterfactual patient outcomes over time is challenging due to the presence of time-dependent confounders in observational datasets. Time-dependent confounders are patient covariates that affect the treatment assignments and are themselves affected by past treatments.
For instance, imagine a patient is given treatment A when a certain covariate (say, white blood cell count) has been outside the normal range for a while. Now, also imagine that the white blood cell count was itself affected by the past administration of a different treatment, treatment B. If this patient is more likely to die, then without adjusting for the time-dependent confounding (i.e. the changes in the white blood cell count over time) we could incorrectly conclude that treatment A is harmful to patients.
To make this even more challenging, estimating the effect of a different sequence of treatments on the patient would require not only adjusting for the bias at the current step (in treatment A), but also for the bias introduced by the previous application of treatment B.
Using standard supervised learning methods to estimate these treatment effects will be biased by the treatment assignment policy present in the observational dataset and will not be able to generalize well to changes in the treatment policy in order to generate counterfactuals.
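The toy simulation below (our own construction, collapsing the temporal structure to a single step for clarity) reproduces the white-blood-cell example in miniature: treatment A has no effect at all, yet a naive comparison makes it look harmful, while adjusting for the confounder recovers the truth.

```python
# Toy illustration of confounding: treatment A does nothing, but it is given
# mostly to patients with an abnormal white blood cell (WBC) count, and an
# abnormal WBC count itself raises mortality.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
wbc_abnormal = rng.binomial(1, 0.3, size=n)                        # confounder
treat_a = rng.binomial(1, np.where(wbc_abnormal == 1, 0.8, 0.1))   # assignment depends on WBC
death = rng.binomial(1, np.where(wbc_abnormal == 1, 0.30, 0.05))   # A has zero true effect

naive = death[treat_a == 1].mean() - death[treat_a == 0].mean()
adjusted = np.mean([death[(treat_a == 1) & (wbc_abnormal == s)].mean()
                    - death[(treat_a == 0) & (wbc_abnormal == s)].mean()
                    for s in (0, 1)])
print(f"naive risk difference: {naive:+.3f}   adjusted: {adjusted:+.3f}")  # ~+0.17 vs ~0
```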
Research focus:
Two approaches to handling time-dependent confounders
Existing methods for individualized treatment effect inference in the static setting cannot be applied in the longitudinal setting since they are designed to handle the cross-sectional set-up, where the treatment and outcome depend only on a static value of the patient covariates. By contrast, any direct estimation of individualized treatment effects using time-series observational data is hampered by the presence of time-dependent confounders (as mentioned directly above), where actions taken are dependent on time-varying variables related to the outcome of interest.
Models developed to estimate treatment effects based on static data are unable to model how changes in patient covariates over time affect the assignment of treatments, and are also unable to estimate the effect of a sequence of treatments on the patient outcome. Different models that can handle these temporal dependencies in the observational data and varying-length patient histories are therefore needed for estimating treatment effects over time.
Approach 1: Recurrent Marginal Structural Network
In a NeurIPS 2018 paper, entitled “Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks,” we proposed a new deep learning model, which we refer to as the Recurrent Marginal Structural Network (RMSN). Drawing inspiration from marginal structural models, a class of methods in epidemiology that use propensity weighting to adjust for time-dependent confounders, the RMSN adopts a sequence-to-sequence architecture to directly learn time-dependent treatment responses from observational data.
We used two sets of deep neural networks to build our RMSN: 1) a set of propensity networks to compute the treatment probabilities used for inverse probability of treatment weighting (IPTW), and 2) a prediction network used to determine the treatment response for a given set of planned interventions.
Using simulations of a state-of-the-art pharmacokinetic-pharmacodynamic (PK-PD) model of tumor growth, we demonstrated the ability of our network to accurately learn unbiased treatment responses from observational data (even under changes in the policy of treatment assignments), as well as performance gains over benchmarks.
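For intuition, here is a minimal single-step sketch of IPTW using scikit-learn (our own toy example; the RMSN’s propensity networks generalize the logistic model to recurrent networks and multiply such weights across time steps):

```python
# Single-step IPTW: weight each record by the inverse probability of the
# treatment it actually received, then compare weighted outcome averages.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=(n, 3))
t = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))          # confounded assignment
y = x[:, 0] + 0.5 * t + rng.normal(scale=0.1, size=n)    # true effect: 0.5

e_hat = LogisticRegression().fit(x, t).predict_proba(x)[:, 1]
w = np.where(t == 1, 1 / e_hat, 1 / (1 - e_hat))         # inverse probability weights
effect = (np.average(y[t == 1], weights=w[t == 1])
          - np.average(y[t == 0], weights=w[t == 0]))
print(f"IPTW-adjusted effect estimate: {effect:.2f}")    # close to 0.5
```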
Approach 2: Counterfactual Recurrent Network
In an ICLR 2020 paper, entitled “Estimating counterfactual treatment outcomes over time through adversarially balanced representations,” we introduced the Counterfactual Recurrent Network (CRN), a novel sequence-to-sequence model that leverages the increasing availability of patient observational data, as well as recent advances in representation learning and domain adversarial training, to estimate treatment effects over time.
To handle the bias from time-varying confounders, CRN uses domain adversarial training to build balancing representations of the patient history. At each timestep, CRN constructs a treatment-invariant representation which removes the association between patient history and treatment assignment, and which can thus be reliably used for making counterfactual predictions.
Using a model of tumor growth, we validated CRN in realistic medical scenarios, demonstrating that, when compared with existing state-of-the-art methods, CRN achieves lower error in estimating counterfactuals and in choosing the correct treatment and timing of treatment.
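The core building block of this kind of domain adversarial training is a gradient reversal layer. Below is a minimal PyTorch sketch (our own illustration with illustrative module names, not the CRN codebase): the layer is the identity on the forward pass but flips the gradient on the backward pass, so that training the treatment classifier pushes the encoder towards treatment-invariant representations.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity forward; negated, scaled gradient backward."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

encoder = torch.nn.GRU(input_size=8, hidden_size=32, batch_first=True)
outcome_head = torch.nn.Linear(32, 1)    # predicts the outcome from the representation
treatment_head = torch.nn.Linear(32, 1)  # adversary: tries to recover the treatment

x = torch.randn(16, 10, 8)               # (batch, time, features) patient histories
_, h = encoder(x)
rep = h[-1]                              # representation of the history so far
y_hat = outcome_head(rep)                # used in the factual outcome loss
t_logit = treatment_head(GradReverse.apply(rep, 1.0))
# Minimizing the treatment-classification loss through the reversed gradient
# *maximizes* it with respect to the encoder, balancing the representation.
```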
The ability to accurately estimate treatment effects over time using machine learning allows clinicians to determine, in a manner tailored to each individual patient, both the treatments to prescribe and the optimal time at which to administer them, given their observational history.
Both new methods and new theory are necessary to harness the full potential of observational data for learning individualized effects of complex treatment scenarios. Further work in this direction is needed to develop alternative methods for handling time-dependent confounders, to model combinations of treatments assigned over time, and to estimate the individualized effects of time-dependent treatments with associated dosages.
ML-assisted clinical trials
Understanding treatment effects can play an important role in the post-hoc analysis of clinical trials into interventions and treatments, as well as influencing the design of more effective clinical trials.
The implementation of clinical trials is a setting in which the relevant population is diverse, and different parts of the population display different reactions to treatment. In such settings, heterogeneous treatment effect (HTE) analysis, also called subgroup analysis, is used to find subgroups consisting of subjects who have similar covariates and display similar treatment responses. The identification of subgroups improves the interpretation of treatment effects across the entire population, and makes it possible to develop more effective interventions and treatments and to improve the design of further experiments. In clinical trials, HTE analysis can identify subgroups of the population for which the studied treatment is effective, even when it is found to be ineffective for the population in general.
Clinical example:
Machine learning for clinical trials in the era of COVID-19
The COVID-19 pandemic has presented enormous challenges to clinical trials in particular, given the need for expedited development, approval, and distribution.
In a 2020 paper co-authored with some of our collaborators, published in Statistics in Biopharmaceutical Research, we identified ways in which machine learning can respond to the challenges inherent in clinical trials of COVID-19 treatments and vaccines.
We identified three key areas for support: ongoing clinical trials for non-COVID-19 drugs; clinical trials for repurposing drugs to treat COVID-19; and clinical trials for new drugs to treat COVID-19. Many of the research projects outlined above feature in the paper.
Research focus:
Robust Recursive Partitioning (R2P): a method to support adaptive clinical trial design
To identify subjects who have similar covariates and display similar treatment responses, it is necessary to create reliable estimates of the treatment responses of individual subjects; i.e. of individualized treatment effects.
Most of the current methods for HTE analysis begin with a particular algorithm for estimating individualized treatment effects, and identify subgroups by maximizing the differences across subgroups of the average treatment effect in each subgroup, under the assumption that treatment effects are homogeneous within subgroups. These approaches have several weaknesses: they rely on a particular algorithm for estimating treatment effects, they ignore (in)homogeneity within identified subgroups, and they do not produce good confidence estimates.
In a 2020 NeurIPS paper, entitled “Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification,” we introduced Robust Recursive Partitioning (R2P), a new method for subgroup analysis that addresses all of these weaknesses. R2P recursively partitions the entire population by taking into account both heterogeneity across subgroups and homogeneity within subgroups, using a novel criterion of confident homogeneity that is based on quantifying the uncertainty of the individualized treatment effect estimates.
Experiments using synthetic and semi-synthetic datasets (the latter based on real data) have demonstrated that R2P constructs partitions that are simultaneously more homogeneous within groups and more heterogeneous across groups than the partitions produced by other methods. Moreover, because R2P can employ any individualized treatment effect estimator, it also produces much narrower confidence intervals than other methods, with a prescribed coverage guarantee.
An additional strength of R2P is that it can employ any method for interpretable individualized treatment effect estimation, including improved methods that will undoubtedly be developed in the future.
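For intuition only, the snippet below shows the generic flavor of effect-based partitioning: choose the split that maximizes the difference in estimated effects between the two resulting subgroups. This is not the R2P algorithm (which additionally enforces within-group homogeneity through uncertainty quantification); it is a deliberately simplified sketch with names of our own choosing.

```python
# Generic effect-based split: pick the threshold on a feature that maximizes
# the gap in mean estimated CATE between the two resulting subgroups.
import numpy as np

def best_split(feature, cate_hat):
    thresholds = np.quantile(feature, np.linspace(0.1, 0.9, 17))
    gaps = [abs(cate_hat[feature <= th].mean() - cate_hat[feature > th].mean())
            for th in thresholds]
    return thresholds[int(np.argmax(gaps))]
```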
We have created an additional page for next-generation clinical trials as a key research pillar for our lab; have a look here if you’d like to learn more.
Learn more and get involved
This page has served as an introduction to individualized treatment effect inference—from the perspective of both healthcare and machine learning.
We have demonstrated the importance of estimating individualized treatment effects in enabling “bespoke medicine” and truly moving beyond one-size-fits-all approaches. In particular, there is great potential to influence and improve the design of clinical trials, and to make effective use of observational data even in the absence of clinical trials. There are further applications to explore, such as modeling individualized treatment effects for organ transplants (as most recently highlighted in a paper accepted for presentation at NeurIPS 2020).
We have also outlined the numerous intricacies and challenges that have complicated the development of machine learning methods and techniques for individualized treatment effect inference, not only due to the lack of counterfactuals, but also due to the lack of a governing theory, the ubiquity of bias in observational data, the choice between several options for modeling treatments, and the difficulty of adapting from static to dynamic datasets. We have also summarized our own lab’s projects seeking to address these challenges.
If you would like to learn more about this topic, we would recommend reading a somewhat more detailed (but still accessible) overview of our work on individualized treatment effect inference, entitled “From Real‐World Patient Data to Individualized Treatment Effects Using Machine Learning: Current and Future Methods to Address Underlying Challenges” (published in Clinical Pharmacology & Therapeutics in 2020).
We have also created a video tutorial series on individualized treatment effect inference, which we will continue to update over time.
We would also encourage you to stay abreast of ongoing developments in this and other areas of machine learning for healthcare by signing up to take part in one of our two streams of online engagement sessions.
If you are a practicing clinician, please sign up for Revolutionizing Healthcare, which is a forum for members of the clinical community to share ideas and discuss topics that will define the future of machine learning in healthcare (no machine learning experience required).
If you are a machine learning student, you can join our Inspiration Exchange engagement sessions, in which we introduce and discuss new ideas and development of new methods, approaches, and techniques in machine learning for healthcare.
Our work so far
Individualized treatment effects have become an area of significant focus for our lab’s researchers in recent years. A selection of our recent papers is shared below.
Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations
Nabeel Seedat, Fergus Imrie, Alexis Bellot, Zhaozhi Qian, Mihaela van der Schaar
ICML 2022
Abstract
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare by assisting decision-makers to answer “what-if” questions. Existing causal inference approaches typically consider regular, discrete-time intervals between observations and treatment decisions and hence are unable to naturally model irregularly sampled data, which is the common setting in practice.
To handle arbitrary observation patterns, we interpret the data as samples from an underlying continuous-time process and propose to model its latent trajectory explicitly using the mathematics of controlled differential equations. This leads to a new approach, the Treatment Effect Neural Controlled Differential Equation (TE-CDE), that allows the potential outcomes to be evaluated at any time point. In addition, adversarial training is used to adjust for time-dependent confounding which is critical in longitudinal settings and is an added challenge not encountered in conventional time-series.
To assess solutions to this problem, we propose a controllable simulation environment based on a model of tumor growth for a range of scenarios with irregular sampling reflective of a variety of clinical scenarios. TE-CDE consistently outperforms existing approaches in all simulated scenarios with irregular sampling.
Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects
Yao Zhang, Jeroen Berrevoets, Mihaela van der Schaar
AISTATS 2022
Abstract
Conditional average treatment effects (CATEs) allow us to understand the effect heterogeneity across a large population of individuals. However, typical CATE learners assume all confounding variables are measured in order for the CATE to be identifiable. This requirement can be satisfied by collecting many variables, at the expense of increased sample complexity for estimating CATEs.
To combat this, we propose an energy-based model (EBM) that learns a low-dimensional representation of the variables by employing a noise contrastive loss function. With our EBM we introduce a preprocessing step that alleviates the dimensionality curse for any existing learner developed for estimating CATEs. We prove that our EBM keeps the representations partially identifiable up to some universal constant, as well as having universal approximation capability.
These properties enable the representations to converge and keep the CATE estimates consistent. Experiments demonstrate the convergence of the representations, as well as show that estimating CATEs on our representations performs better than on the variables or the representations obtained through other dimensionality reduction methods.
SyncTwin: Treatment Effect Estimation with Longitudinal Outcomes
Zhaozhi Qian, Yao Zhang, Ioana Bica, Angela Wood, Mihaela van der Schaar
NeurIPS 2021
Abstract
Most of the medical observational studies estimate the causal treatment effects using electronic health records (EHR), where a patient’s covariates and outcomes are both observed longitudinally. However, previous methods focus only on adjusting for the covariates while neglecting the temporal structure in the outcomes.
To bridge the gap, this paper develops a new method, SyncTwin, that learns a patient-specific time-constant representation from the pre-treatment observations. SyncTwin issues counterfactual prediction of a target patient by constructing a synthetic twin that closely matches the target in representation. The reliability of the estimated treatment effect can be assessed by comparing the observed and synthetic pre-treatment outcomes.
The medical experts can interpret the estimate by examining the most important contributing individuals to the synthetic twin. In the real-data experiment, SyncTwin successfully reproduced the findings of a randomized controlled clinical trial using observational data, which demonstrates its usability in the complex real-world EHR.
Estimating Multi-cause Treatment Effects via Single-cause Perturbation
Zhaozhi Qian, Alicia Curth, Mihaela van der Schaar
NeurIPS 2021
Abstract
Most existing methods for conditional average treatment effect estimation are designed to estimate the effect of a single cause – only one variable can be intervened on at one time. However, many applications involve simultaneous intervention on multiple variables, which leads to multi-cause treatment effect problems.
The multi-cause problem is challenging because one needs to overcome the confounding bias for a large number of treatment groups, each with a different cause combination. The combinatorial nature of the problem also leads to severe data scarcity – we only observe one factual outcome out of many potential outcomes. In this work, we propose Single-cause Perturbation (SCP), a novel two-step procedure to estimate the multi-cause treatment effect. SCP starts by augmenting the observational dataset with the estimated potential outcomes under single-cause interventions.
It then performs covariate adjustment on the augmented dataset to obtain the estimator. SCP is agnostic to the exact choice of algorithm in either step. We show formally that the procedure is valid under standard assumptions in causal inference. We demonstrate the performance gain of SCP on extensive synthetic and semi-synthetic experiments.
Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms
We continue to push the boundaries of our understanding of different strategies for treatment effect estimation. In this work, we investigated the strengths and weaknesses of a number of so-called meta-learners (model-agnostic strategies) proposed in recent years, with the ultimate goal of understanding, a priori, the conditions under which some learners can be expected to perform better than others.
Alicia Curth, Mihaela van der Schaar
AISTATS 2021
Abstract
The need to evaluate treatment effectiveness is ubiquitous in most of empirical science, and interest in flexibly investigating effect heterogeneity is growing rapidly. To do so, a multitude of model-agnostic, nonparametric meta-learners have been proposed in recent years. Such learners decompose the treatment effect estimation problem into separate sub-problems, each solvable using standard supervised learning methods.
Choosing between different meta-learners in a data-driven manner is difficult, as it requires access to counterfactual information. Therefore, with the ultimate goal of building better understanding of the conditions under which some learners can be expected to perform better than others a priori, we theoretically analyze four broad meta-learning strategies which rely on plug-in estimation and pseudo-outcome regression.
We highlight how this theoretical reasoning can be used to guide principled algorithm design and translate our analyses into practice by considering a variety of neural network architectures as base-learners for the discussed meta-learning strategies. In a simulation study, we showcase the relative strengths of the learners under different data-generating processes.
On Inductive Biases for Heterogeneous Treatment Effect Estimation
As an alternative to meta-learner strategies, which separate the treatment effect estimation task into distinct estimation stages, this work considers end-to-end learning solutions: three strategies (based on regularization, reparametrization, and a flexible multi-task architecture) that each encode an inductive bias favoring shared behavior across an individual’s potential outcomes.
Alicia Curth, Mihaela van der Schaar
NeurIPS 2021
Abstract
We investigate how to exploit structural similarities of an individual’s potential outcomes (POs) under different treatments to obtain better estimates of conditional average treatment effects in finite samples. Especially when it is unknown whether a treatment has an effect at all, it is natural to hypothesize that the POs are similar – yet, some existing strategies for treatment effect estimation employ regularization schemes that implicitly encourage heterogeneity even when it does not exist and fail to fully make use of shared structure.
In this paper, we investigate and compare three end-to-end learning strategies to overcome this problem – based on regularization, reparametrization and a flexible multi-task architecture – each encoding inductive bias favoring shared behavior across POs. To build understanding of their relative strengths, we implement all strategies using neural networks and conduct a wide range of semi-synthetic experiments.
We observe that all three approaches can lead to substantial improvements upon numerous baselines and gain insight into performance differences across various experimental settings.
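As a concrete illustration of two of these inductive biases, the sketch below combines a shared-trunk multi-task network for the two potential outcomes with a penalty that shrinks the treatment-specific heads toward each other. This is not the paper’s exact architecture; it is a generic PyTorch stand-in whose names and dimensions are all illustrative, intended only to show how shared structure across POs can be encoded.

```python
# A generic shared-representation multi-task network for potential
# outcome (PO) estimation, plus a regularizer encoding the inductive
# bias that the two POs behave similarly unless the data say otherwise.
import torch
import torch.nn as nn

class SharedPONet(nn.Module):
    def __init__(self, d_in, d_hidden=64):
        super().__init__()
        # Shared trunk: captures structure common to both POs.
        self.trunk = nn.Sequential(
            nn.Linear(d_in, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden), nn.ReLU(),
        )
        # Small treatment-specific heads: capture differential response.
        self.head0 = nn.Linear(d_hidden, 1)  # PO under control
        self.head1 = nn.Linear(d_hidden, 1)  # PO under treatment

    def forward(self, x):
        h = self.trunk(x)
        return self.head0(h).squeeze(-1), self.head1(h).squeeze(-1)

def loss_fn(model, x, w, y, lam=1e-2):
    y0_hat, y1_hat = model(x)
    # Factual loss: each unit contributes only through its observed arm.
    factual = ((1 - w) * (y0_hat - y) ** 2 + w * (y1_hat - y) ** 2).mean()
    # Shrink the heads toward each other, so heterogeneity is learned
    # only where the data demand it.
    shrink = sum(((p0 - p1) ** 2).sum()
                 for p0, p1 in zip(model.head0.parameters(),
                                   model.head1.parameters()))
    return factual + lam * shrink
```

Setting the penalty weight `lam` to zero recovers an architecture that treats the two POs as unrelated tasks beyond the shared trunk; increasing it interpolates toward a model that assumes no treatment effect at all.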
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data
In this work, we study the problem of inferring heterogeneous treatment effects from time-to-event outcome data. While both the related problems of (i) estimating treatment effects for binary or continuous outcomes and (ii) predicting survival outcomes have been well studied in the recent machine learning literature, their combination – albeit of high practical relevance – has received considerably less attention. With the ultimate goal of reliably estimating the effects of treatments on instantaneous risk and survival probabilities, we focus on the problem of learning (discrete-time) treatment-specific conditional hazard functions. We find that unique challenges arise in this context due to a variety of covariate shift issues that go beyond a mere combination of well-studied confounding and censoring biases. We theoretically analyse their effects by adapting recent generalization bounds from domain adaptation and treatment effect estimation to our setting and discuss implications for model design. We use the resulting insights to propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations. We investigate performance across a range of experimental settings and empirically confirm that our method outperforms baselines by addressing covariate shifts from various sources.
SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data
Alicia Curth, Changhee Lee, Mihaela van der Schaar
NeurIPS 2021
Abstract
We study the problem of inferring heterogeneous treatment effects from time-to-event data. While both the related problems of (i) estimating treatment effects for binary or continuous outcomes and (ii) predicting survival outcomes have been well studied in the recent machine learning literature, their combination — albeit of high practical relevance — has received considerably less attention.
With the ultimate goal of reliably estimating the effects of treatments on instantaneous risk and survival probabilities, we focus on the problem of learning (discrete-time) treatment-specific conditional hazard functions. We find that unique challenges arise in this context due to a variety of covariate shift issues that go beyond a mere combination of well-studied confounding and censoring biases. We theoretically analyse their effects by adapting recent generalization bounds from domain adaptation and treatment effect estimation to our setting and discuss implications for model design.
We use the resulting insights to propose a novel deep learning method for treatment-specific hazard estimation based on balancing representations. We investigate performance across a range of experimental settings and empirically confirm that our method outperforms baselines by addressing covariate shifts from various sources.
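To ground the setup, the sketch below shows what discrete-time treatment-specific hazard estimation looks like in code: a network outputs a hazard per time bin for each arm, trained with a standard censoring-aware discrete-time survival likelihood. The balancing-representation component that constitutes the paper’s actual contribution is omitted, and all names and conventions here are illustrative.

```python
# Sketch of discrete-time treatment-specific hazard estimation (the
# general setting SurvITE operates in; the paper's balancing loss is
# omitted). Survival is the cumulative product of (1 - hazard).
import torch
import torch.nn as nn

class HazardNet(nn.Module):
    def __init__(self, d_in, n_bins, d_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU())
        # One head per treatment arm, each predicting all time bins.
        self.heads = nn.ModuleList([nn.Linear(d_hidden, n_bins)
                                    for _ in range(2)])

    def forward(self, x, w):
        h = self.encoder(x)
        logits = torch.where(w[:, None].bool(),
                             self.heads[1](h), self.heads[0](h))
        return torch.sigmoid(logits)  # hazards in (0, 1), shape (batch, n_bins)

def neg_log_likelihood(hazards, event_bin, observed):
    # event_bin: bin index of event or censoring; observed: 1 if the
    # event was observed, 0 if censored. Convention: a unit censored in
    # bin t contributes survival through bins < t only.
    n_bins = hazards.shape[1]
    t = torch.arange(n_bins)[None, :]
    survived = (t < event_bin[:, None]).float()
    at_event = (t == event_bin[:, None]).float()
    ll = (survived * torch.log(1 - hazards + 1e-8)).sum(1)
    ll = ll + observed * (at_event * torch.log(hazards + 1e-8)).sum(1)
    return -ll.mean()
```

The covariate shift issues discussed above arise because the effective training population for each hazard differs across arms and across time bins: who is treated, and who is still at risk at bin t, both depend on the covariates.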
Really Doing Great at Estimating CATE? A Critical Look at ML Benchmarking Practices in Treatment Effect Estimation
As we have highlighted throughout this pillar, the machine learning (ML) toolbox for estimation of heterogeneous treatment effects from observational data is expanding rapidly — yet many of its algorithms have been evaluated only on a very limited set of semi-synthetic benchmark datasets. In this paper, we investigate current benchmarking practices for ML-based conditional average treatment effect (CATE) estimators, with special focus on empirical evaluation based on the popular semi-synthetic IHDP benchmark. We identify problems with current practice and highlight that semi-synthetic benchmark datasets, which (unlike real-world benchmarks used elsewhere in ML) do not necessarily reflect properties of real data, can systematically favor some algorithms over others – a fact that is rarely acknowledged but of immense relevance for interpretation of empirical results. Further, we argue that current evaluation metrics evaluate performance only for a small subset of possible use cases of CATE estimators, and discuss alternative metrics relevant for applications in personalized medicine. Additionally, we discuss alternatives for current benchmark datasets, and implications of our findings for benchmarking in CATE estimation.
Really Doing Great at Estimating CATE? A Critical Look at ML Benchmarking Practices in Treatment Effect Estimation
Alicia Curth, David Svensson, Jim Weatherall, Mihaela van der Schaar
NeurIPS 2021
Abstract
The machine learning (ML) toolbox for estimation of heterogeneous treatment effects from observational data is expanding rapidly, yet many of its algorithms have been evaluated only on a very limited set of semi-synthetic benchmark datasets. In this paper, we investigate current benchmarking practices for ML-based conditional average treatment effect (CATE) estimators, with special focus on empirical evaluation based on the popular semi-synthetic IHDP benchmark.
We identify problems with current practice and highlight that semi-synthetic benchmark datasets, which (unlike real-world benchmarks used elsewhere in ML) do not necessarily reflect properties of real data, can systematically favor some algorithms over others — a fact that is rarely acknowledged but of immense relevance for interpretation of empirical results. Further, we argue that current evaluation metrics evaluate performance only for a small subset of possible use cases of CATE estimators, and discuss alternative metrics relevant for applications in personalized medicine.
Additionally, we discuss alternatives for current benchmark datasets, and implications of our findings for benchmarking in CATE estimation.
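For readers unfamiliar with how such estimators are scored, the snippet below computes PEHE, the metric that dominates current benchmarking, alongside one simple example of a decision-focused alternative of the kind the paper argues for (sign agreement with the true effect, i.e., whether the estimator would recommend the right arm; this particular metric is an illustration, not a proposal from the paper). Note that the ground-truth CATE is available only because the benchmarks are (semi-)synthetic, which is precisely why their design matters so much.

```python
# PEHE (precision in estimating heterogeneous effects) is computable
# only on (semi-)synthetic benchmarks, where the true CATE is known.
import numpy as np

def pehe(tau_true, tau_hat):
    """Root mean squared error between true and estimated CATE."""
    tau_true, tau_hat = np.asarray(tau_true), np.asarray(tau_hat)
    return np.sqrt(np.mean((tau_true - tau_hat) ** 2))

def policy_error_rate(tau_true, tau_hat):
    """Illustrative decision-focused metric: how often the estimated
    effect has the wrong sign, i.e., would imply the wrong treatment."""
    return np.mean(np.sign(tau_true) != np.sign(tau_hat))
```

Two estimators can have similar PEHE yet differ sharply on a decision-focused metric like the one above, which is one reason evaluating only PEHE covers just a small subset of the use cases of a CATE estimator.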
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability
Recent developments in the machine learning (ML) literature on heterogeneous treatment effect estimation — many of which we have discussed throughout this pillar — have given rise to many sophisticated, but opaque, tools: due to their flexibility, modularity and ability to learn constrained representations, neural networks in particular have become central to this literature. Unfortunately, the assets of such black boxes come at a cost: models typically involve countless nontrivial operations, making it difficult to understand what they have learned. Yet, understanding these models can be crucial – in a medical context, for example, discovered knowledge on treatment effect heterogeneity could inform treatment prescription in clinical practice. In this work, we therefore use post-hoc feature importance methods to identify features that influence the model’s predictions. This allows us to evaluate treatment effect estimators along a new and important dimension that has been overlooked in previous work: We construct a benchmarking environment to empirically investigate the ability of personalized treatment effect models to identify predictive covariates – covariates that determine differential responses to treatment. Our benchmarking environment then enables us to provide new insight into the strengths and weaknesses of different types of treatment effect models as we modulate different challenges specific to treatment effect estimation – e.g. the ratio of prognostic to predictive information, the possible nonlinearity of potential outcomes and the presence and type of confounding.
Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability
Jonathan Crabbé, Alicia Curth, Ioana Bica, Mihaela van der Schaar
NeurIPS 2022
Abstract
Estimating personalized effects of treatments is a complex, yet pervasive problem. To tackle it, recent developments in the machine learning (ML) literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools: due to their flexibility, modularity and ability to learn constrained representations, neural networks in particular have become central to this literature.
Unfortunately, the assets of such black boxes come at a cost: models typically involve countless nontrivial operations, making it difficult to understand what they have learned. Yet, understanding these models can be crucial — in a medical context, for example, discovered knowledge on treatment effect heterogeneity could inform treatment prescription in clinical practice. In this work, we therefore use post-hoc feature importance methods to identify features that influence the model’s predictions. This allows us to evaluate treatment effect estimators along a new and important dimension that has been overlooked in previous work: We construct a benchmarking environment to empirically investigate the ability of personalized treatment effect models to identify predictive covariates — covariates that determine differential responses to treatment.
Our benchmarking environment then enables us to provide new insight into the strengths and weaknesses of different types of treatment effect models as we modulate different challenges specific to treatment effect estimation — e.g. the ratio of prognostic to predictive information, the possible nonlinearity of potential outcomes and the presence and type of confounding.
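The sketch below illustrates the general recipe in its simplest form: permutation-based importance of each covariate for a fitted CATE model’s effect predictions. The paper’s benchmarking environment uses proper post-hoc attribution methods; this generic version is included only to make the idea concrete, and `predict_cate` stands in for any fitted estimator.

```python
# Generic permutation importance for a fitted CATE model: shuffle one
# covariate at a time and measure how much the model's effect
# predictions change. Large values flag predictive covariates, i.e.,
# drivers of differential response to treatment.
import numpy as np

def cate_permutation_importance(predict_cate, X, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    base = predict_cate(X)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])  # break the j-th covariate's link to the effect
            deltas.append(np.mean((predict_cate(Xp) - base) ** 2))
        importances[j] = np.mean(deltas)
    return importances
```

The key point of the benchmark is that a model can achieve low estimation error while attributing its predictions to the wrong covariates, and only an interpretability-based evaluation exposes this failure mode.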
Policy Analysis using Synthetic Controls in Continuous-Time
Policy Analysis using Synthetic Controls in Continuous-Time
Alexis Bellot, Mihaela van der Schaar
PMLR 2021
Abstract
Counterfactual estimation using synthetic controls is one of the most successful recent methodological developments in causal inference. Despite its popularity, the current description only considers time series aligned across units and synthetic controls expressed as linear combinations of observed control units. We propose a continuous-time alternative that models the latent counterfactual path explicitly using the formalism of controlled differential equations. This model is directly applicable to the general setting of irregularly-aligned multivariate time series and may be optimized in rich function spaces — thereby improving on some limitations of existing approaches.
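For context, the sketch below implements the classic discrete-time synthetic control construction that this work generalizes: the treated unit’s pre-treatment trajectory is matched by a convex combination of control units, and the fitted weights then project the counterfactual forward. The continuous-time, controlled-differential-equation model in the paper replaces exactly this linear, aligned-time-series construction; the function and variable names here are illustrative.

```python
# Classic (discrete-time) synthetic control: find convex weights over
# control units that reproduce the treated unit's pre-treatment path.
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(Y_controls_pre, y_treated_pre):
    """Y_controls_pre: (n_controls, T_pre) outcome matrix of controls;
    y_treated_pre: (T_pre,) pre-treatment outcomes of the treated unit."""
    n = Y_controls_pre.shape[0]

    def loss(w):
        return np.sum((w @ Y_controls_pre - y_treated_pre) ** 2)

    res = minimize(loss, x0=np.full(n, 1.0 / n),
                   bounds=[(0.0, 1.0)] * n,  # non-negative weights
                   constraints={"type": "eq",
                                "fun": lambda w: w.sum() - 1.0})  # sum to 1
    return res.x

# The counterfactual post-treatment path is then weights @ Y_controls_post;
# its gap to the treated unit's observed path estimates the policy effect.
```

This construction requires all units to be observed on a common, regular time grid; modeling the latent counterfactual path with controlled differential equations is what lets the proposed method handle irregularly-aligned multivariate time series.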
Further resources and papers cited on this page, in order of first appearance
- A. M. Alaa, M. van der Schaar, “Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes,” Neural Information Processing Systems (NeurIPS), 2017.
- T. Kyono, Y. Zhang, M. van der Schaar, “CASTLE: Regularization via Auxiliary Causal Graph Discovery,” Neural Information Processing Systems (NeurIPS), 2020.
- I. Bica, A. Alaa, M. van der Schaar, “Time Series Deconfounder: Estimating Treatment Effects over Time in the Presence of Hidden Confounders,” International Conference on Machine Learning (ICML), 2020.
- I. Bica, J. Jordon, M. van der Schaar, “Estimating the Effects of Continuous-valued Interventions using Generative Adversarial Networks,” Neural Information Processing Systems (NeurIPS), 2020.
- J. Berrevoets, J. Jordon, I. Bica, A. Gimson, M. van der Schaar, “OrganITE: Optimal transplant donor organ offering using an individual treatment effect,” Neural Information Processing Systems (NeurIPS), 2020.
- J. Yoon, J. Jordon, M. van der Schaar, “GANITE: Estimation of Individualized Treatment Effects using Generative Adversarial Nets,” International Conference on Learning Representations (ICLR), 2018.
- A. M. Alaa, M. van der Schaar, “Limits of Estimating Heterogeneous Treatment Effects: Guidelines for Practical Algorithm Design,” International Conference on Machine Learning (ICML), 2018.
- A. Curth, M. van der Schaar, “Nonparametric Estimation of Heterogeneous Treatment Effects: From Theory to Learning Algorithms,” International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.
- A. M. Alaa, M. van der Schaar, “Bayesian Nonparametric Causal Inference: Information Rates and Learning Algorithms,” IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2018.
- A. M. Alaa, M. van der Schaar, “Validating Causal Inference Models via Influence Functions,” International Conference on Machine Learning (ICML), 2019.
- B. Lim, A. Alaa, M. van der Schaar, “Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks,” Neural Information Processing Systems (NeurIPS), 2018.
- I. Bica, A. M. Alaa, J. Jordon, M. van der Schaar, “Estimating counterfactual treatment outcomes over time through adversarially balanced representations,” International Conference on Learning Representations (ICLR), 2020.
- W. R. Zame, I. Bica, C. Shen, A. Curth, H.-S. Lee, S. Bailey, J. Weatherall, D. Wright, F. Bretz, M. van der Schaar, “Machine learning for clinical trials in the era of COVID-19,” Statistics in Biopharmaceutical Research – Special Issue on Covid-19, 2020.
- H.-S. Lee, Y. Zhang, W. Zame, C. Shen, J.-W. Lee, M. van der Schaar, “Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification,” Neural Information Processing Systems (NeurIPS), 2020.
- I. Bica, A. M. Alaa, C. Lambert, M. van der Schaar, “From Real‐World Patient Data to Individualized Treatment Effects Using Machine Learning: Current and Future Methods to Address Underlying Challenges,” Statistics in Biopharmaceutical Research, 2020.
- N. Seedat, F. Imrie, A. Bellot, Z. Qian, M. van der Schaar, “Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations,” International Conference on Machine Learning (ICML), 2022.
- Y. Zhang, J. Berrevoets, M. van der Schaar, “Identifiable Energy-based Representations: An Application to Estimating Heterogeneous Causal Effects,” International Conference on Artificial Intelligence and Statistics (AISTATS), 2022.
- Z. Qian, Y. Zhang, I. Bica, A. Wood, M. van der Schaar, “SyncTwin: Treatment Effect Estimation with Longitudinal Outcomes,” Neural Information Processing Systems (NeurIPS), 2021.
- Z. Qian, A. Curth, M. van der Schaar, “Estimating Multi-cause Treatment Effects via Single-cause Perturbation,” Neural Information Processing Systems (NeurIPS), 2021.
- A. Curth, M. van der Schaar, “On Inductive Biases for Heterogeneous Treatment Effect Estimation,” Neural Information Processing Systems (NeurIPS), 2021.
- A. Curth, C. Lee, M. van der Schaar, “SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data,” Neural Information Processing Systems (NeurIPS), 2021.
- A. Curth, D. Svensson, J. Weatherall, M. van der Schaar, “Really Doing Great at Estimating CATE? A Critical Look at ML Benchmarking Practices in Treatment Effect Estimation,” Neural Information Processing Systems (NeurIPS), 2021.
- J. Crabbé, A. Curth, I. Bica, M. van der Schaar, “Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability,” Neural Information Processing Systems (NeurIPS), 2022.
- A. Bellot, M. van der Schaar, “Policy Analysis using Synthetic Controls in Continuous-Time,” Proceedings of Machine Learning Research (PMLR), 2021.
- A. Curth, M. van der Schaar, “Understanding the Impact of Competing Events on Heterogeneous Treatment Effect Estimation from Time-to-Event Data,” International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
- J. Berrevoets, F. Imrie, T. Kyono, J. Jordon, M. van der Schaar, “To Impute or not to Impute? Missing Data in Treatment Effect Estimation,” International Conference on Artificial Intelligence and Statistics (AISTATS), 2023.
A full list of our papers on causal inference, individualized treatment effect inference, and related topics, can be found here.