*Machine learning is capable of enabling truly personalized healthcare; this is what our lab calls “bespoke medicine.”*

More info on bespoke medicine can be found here.

*Bespoke medicine entails far more than providing predictions for individual patients: we also need to understand the effect of specific treatments on specific patients at specific times. This is what we call individualized treatment effect inference. It is a substantially more complex undertaking than prediction, and every bit as important.*

*Our lab has built a position of leadership in this area. We have defined the research agenda by outlining and addressing key complexities and challenges, and by laying the theoretical groundwork for model development. In our development of algorithms, we have identified and targeted an extensive range of potential clinical applications using both clinical trials and observational data as inputs.*

*The page below provides an introduction to individualized treatment effect inference, as well as an overview of some key projects that have driven the entire research area forward.*

*This page is one of several introductions to areas that we see as “research pillars” for our lab. It is a living document, and the content here will evolve as we continue to reach out to the machine learning and healthcare communities, building a shared vision for the future of healthcare. Our primary means of building this shared vision is through two groups of online engagement sessions: Inspiration Exchange (for machine learning students) and Revolutionizing Healthcare (for the healthcare community). If you would like to get involved, please visit the page below.*

*This page is authored and maintained by Mihaela van der Schaar and Nick Maxfield.*

## Individualized treatment effect inference: a brief introduction

This page introduces **individualized treatment effect inference** — which we could also refer to as **causal inference of individualized treatment effects** — as one of our lab’s key research areas, and offers an overview of a range of relevant projects we have undertaken.

The broader area of “causal inference” in machine learning can be broken down into two sub-fields: (i) causal discovery and (ii) individualized treatment effect inference. While (i) is concerned with discovering which variables affect which others, and in what direction, (ii) is concerned with quantifying those relationships: given variables that (i) has identified as related, it estimates the effect of one (or more) variables on another. Here we focus exclusively on (ii).

In creating this page, we aim to raise and discuss issues related to both the static (cross‐sectional) setting and the longitudinal setting (where patient history and treatment timing are taken into account). We describe the challenges associated with learning from observational data, such as confounding bias, as well as the modeling choices used by machine learning methods to handle them in both settings.

Our lab is also deeply interested in what we call “causal machine learning,” a related but distinct area where the focus is on using causal graphs to improve the robustness of machine learning for prediction, domain adaptation, transfer learning, and more. For an example of our work in this area, please take a look at CASTLE, a NeurIPS 2020 paper.

Causal machine learning will also form the basis of a new piece of content in the future.

## Treatment effects: from the average to the individual

A major challenge in the domain of healthcare is ascertaining whether a given treatment influences or determines an outcome—for instance, whether there is a survival benefit to prescribing a certain medication, such as the ability of a statin to lower the risk of cardiovascular disease.

Current treatment guidelines have been developed with the “average” patient in mind (on the basis of randomized controlled trials), but there is ample evidence that different treatments result in different effects and outcomes from one individual to another: for any given treatment, it is quite likely that only a small proportion of people will actually respond in a manner that resembles the “average” patient.

Since the advent of precision medicine and the availability of large amounts of observational data from electronic health records, the research community has started to explore more quantitative individual-level problems, such as the *magnitude* of the effect of a treatment on a condition *for an individual* (one example might be the survival benefit of weight loss for a 60-year-old cardiovascular patient with diabetes). Rather than making treatment decisions based on blanket assumptions about “average” patients, the goal of clinical decision-makers is now to determine the optimal treatment course for any given patient at any given time. Methods for doing so in a quantitative fashion based on insights from machine learning are in the formative stages of development (our lab’s work in this area will be covered below).

There are two ways to determine whether a treatment works: observational datasets, and post-hoc analysis of clinical trials. Each method has its own strengths and weaknesses.

**Observational datasets**

At the moment, doctors can learn from experience and time which treatments work for each individual, but there is no mechanism for sharing this knowledge on a population level in a way that allows the extraction of valuable insights on treatment effects. The increasing availability of observational data has, however, encouraged the development of various machine learning algorithms tailored for inferring treatment effects. It is worth noting that observational datasets are prone to treatment assignment bias, as explained in more detail later on.

**Clinical trials**

Randomized Controlled Trials (RCTs) are the gold standard for comparing the effectiveness of a new treatment to the current one. Clinical trials may, however, not always be the most practical option for evaluating certain treatments, since they are costly and time-consuming to implement, and they do not always recruit representative patients.

This makes external validity an issue for RCTs, as findings sometimes fail to generalize beyond the study population. This may be due to the narrow inclusion criteria of RCTs compared with the real world: historically, trial populations have been restricted with respect to disease severity and comorbidities, and elderly patients and ethnic minorities have been under-represented. By contrast, once drugs are approved by the US Food and Drug Administration (FDA) after the clinical trials stage, they start being administered to a much larger and more varied population of patients.

Although there is increasing awareness of this issue and global regulatory authorities are encouraging wider inclusion criteria in clinical trials, it remains an issue that is unlikely to be solved by RCTs and associated integrated and model‐based analyses alone. There is scope to add an adaptive element to clinical trials through the use of machine learning.

To summarize the above: our goal is to support a shift **from a focus on average treatment effects to individualized treatment effects** by optimizing the use of **observational datasets** and **clinical trial design**. Estimating individualized treatment effects from EHR data represents a thriving area of research, in which machine learning methods are primed to take center stage.

*Clinical example:*

Breast cancer treatment outcomes

When deciding on a treatment for a given form of cancer, clinical decisions are often made on the basis of results from randomized controlled trials of treatments involving that cancer.

As explained above, this approach assumes a response to treatment based on the response of the “average patient,” rather than taking into account the health history and specific features of the individual.

The figure above shows a range of recommendations for a specific cancer patient. On the left-hand side, we see the risk of recurrence within one year for several treatment options based on population-level data. Based on these recommendations, the optimal treatment choice would be a combination of chemotherapy and radiotherapy.

This stands in contrast to the chart in the middle, which is an individualized recommendation based on observational data using a range of machine learning techniques (many of which are outlined later on this page). This individualized recommendation shows that chemotherapy (without radiotherapy) would in fact yield the lowest likelihood of recurrence within one year.

The figure above is an example taken from a live demonstrator system based on breast cancer, fed by anonymized real-world data. More details on this project are available in the video below, taken from a presentation given by Mihaela van der Schaar at the Royal College of Physicians in 2019.

*Clinical example:*

LVAD implantation

The implantation of left ventricular assist devices (LVADs) in many ways demonstrates the difficulties and pitfalls that are commonplace in medical decision-making, and the importance of being able to estimate individualized treatment effects.

LVADs can serve as a “stopgap” measure for patients on heart transplant waiting lists, but the procedure is costly and invasive. On top of this, there is substantial evidence challenging the conventional assumption that all patients with LVADs will benefit from them equally: in fact, it is clear that both the outcome and the optimal timing of the LVAD implantation vary extensively from individual to individual.

There is a clear benefit, therefore, to being able to learn the individualized survival benefits of LVADs for cardiac patients waiting for a heart. Clinical trials are not well-suited to this purpose: they are expensive, rely on small data samples, and focus on short-term outcomes. In fact, in the case of LVADs it might not even be possible to conduct clinical trials, making learning survival benefits from observational data the only viable option.

This is what our lab did in 2017, in a study featured at NeurIPS entitled “Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes.” We proposed learning from observational datasets as an alternative to conducting clinical trials using a multi-task learning framework, and demonstrated the effectiveness of this approach.

With regard to LVAD implantation, we were able to use causal multi-task Gaussian processes (CMGP) to identify patients who had received LVADs at a time that was suboptimal in terms of survivability. In a number of such cases, patients died while awaiting heart transplants, with a substantial likelihood that this could have been prevented through optimal timing of LVAD implantation.

In other cases, patients died without having received LVADs in the first place, whereas analysis using CMGP suggested a high likelihood of survival until transplant availability in the event of timely LVAD implantation.

LVAD implantation is just one example of how moving beyond assumptions regarding “average” treatment benefits and instead estimating individual effects using machine learning can lead to personalized survival predictions and a finer priority scheme.

## Why is individualized treatment effect inference so complicated?

Our goal is to use machine learning to estimate the effect of a treatment on an individual using static or time-series observational data.

The problem of estimating individual-level causal effects is usually formulated within the classical potential outcomes framework, first introduced by Neyman in 1923 and subsequently expanded by Rubin into a broader causal model. The framework is based on observational data consisting of patient features, treatment assignment and outcome.

*Note:*

Binary treatment options vs. multiple treatment options

The discussion here is, for the purpose of simplicity, focused on binary treatments (untreated = 0, treated = 1).

It is worth bearing in mind, however, that in application the framework and associated methods for individualized treatment effect inference can be extended to any number of potential treatments.

The apparent simplicity of this framework belies the true complexity of the problem of individualized treatment effect inference; we believe there are three key reasons for this:

– we must work in the absence of counterfactual outcomes;

– bias in observational datasets must be addressed; and

– there is no single preferred way to include treatment indicators in outcome models.

Furthermore, little work has been done to develop a comprehensive theory for individualized treatment effect inference, including principles for optimal model selection.

Overcoming these challenges will require not just methodological advances but also new ways of thinking. In the sections below, we will provide an explanation of each of these issues, while highlighting some of the ways in which our lab’s projects have made progress towards their resolution.

*Note:* Assumptions for individualized treatment effect inference

While this topic is not discussed in depth here, it is worth noting that performing individualized treatment effect inference requires us to make two assumptions: (1) overlap and (2) lack of hidden/unmeasured confounders.

*For reference, our most recent work on estimating treatment effects over time in the presence of hidden confounders was presented at ICML 2020 (related paper here).*
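As a hedged sketch, the two assumptions named in this note are commonly written as follows in standard potential-outcomes notation. The symbols are notational assumptions introduced here for illustration (this page does not fix any notation): $X$ denotes patient features, $T \in \{0, 1\}$ the binary treatment assignment, and $Y(0)$, $Y(1)$ the potential outcomes without and with treatment.

$$
0 < P(T = 1 \mid X = x) < 1 \quad \text{for all } x \qquad \text{(overlap)}
$$

$$
\big(Y(0),\, Y(1)\big) \perp\!\!\!\perp T \mid X \qquad \text{(no hidden confounders)}
$$

Informally, overlap says that every type of patient has some chance of being treated and some chance of being untreated, and the second condition says that, once the recorded features are taken into account, treatment assignment carries no further information about the potential outcomes.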

## Estimating response surfaces

In the potential outcomes framework outlined above, every subject (individual) in the observational dataset possesses a number of potential outcomes: the subject’s outcome under the application of various treatments, and the subject’s outcome when no treatment is applied. The treatment effect is the difference between the two potential outcomes, but since we only observe the “factual” outcome for a specific treatment assignment, and never observe the corresponding “counterfactual” outcome, we never observe any examples of the true treatment effect in an observational dataset. This is what makes the problem of individualized treatment effect inference fundamentally different from standard supervised learning (regression).
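The missing-counterfactual problem can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the single covariate `x`, the outcome equations, and the coin-flip treatment assignment are all hypothetical and chosen only to make the point that the true effect is computable solely because both potential outcomes were simulated.

```python
import random

random.seed(0)


def simulate_patient():
    """Return one synthetic patient with BOTH potential outcomes.

    Knowing both outcomes is only possible in simulation; the outcome
    equations below are hypothetical.
    """
    x = random.uniform(0.0, 1.0)            # a single illustrative covariate
    y0 = 2.0 * x + random.gauss(0.0, 0.1)   # potential outcome if untreated
    y1 = y0 + 1.0 - x                       # potential outcome if treated
    t = 1 if random.random() < 0.5 else 0   # treatment actually received
    return {"x": x, "t": t, "y0": y0, "y1": y1}


patients = [simulate_patient() for _ in range(5)]

# What an observational dataset records: features, treatment assignment,
# and the factual outcome only. The counterfactual outcome is dropped.
observed = [
    {"x": p["x"], "t": p["t"], "y": p["y1"] if p["t"] == 1 else p["y0"]}
    for p in patients
]

# The true individual effect needs both potential outcomes, so it can be
# computed from `patients` but never from `observed`.
true_effects = [p["y1"] - p["y0"] for p in patients]

for row, effect in zip(observed, true_effects):
    print(row, "true effect (hidden in practice):", round(effect, 3))
```

No row of `observed` contains a label for the treatment effect itself, which is why the problem cannot be treated as ordinary regression on the effect.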

It is important, therefore, to understand from the outset that any method to estimate individualized treatment effects is limited to using the data available at hand, which is entirely composed of factuals, and not counterfactuals. For instance, note that the figure above shows us the factual outcome, but not the counterfactual.

The majority of existing methods for estimating individualized treatment effects from observational data focus on the binary or categorical treatment settings and very few methods consider more complex treatment scenarios. However, it is often the case that treatments have an associated dosage which requires us to estimate the causal effects of continuous-valued interventions.

Additionally, for organ transplantation, it is necessary to estimate the effect of high dimensional, and potentially unique, organs on the patient’s survival such that we assign the organ to the patient that would have the highest survival benefit. Our lab has done work to handle these more complex treatment scenarios in two recent papers published at NeurIPS 2020 on individualized dose-response estimation (SCIGAN) and estimating the individualized effect of transplant-organs (high-dimensional treatments) on patients’ survival (OrganITE).

*Research focus:*

Using GANs to compensate for the absence of counterfactual outcomes

While numerous approaches to individualized treatment effect estimation have produced strong results (including many developed by our own lab) in the absence of counterfactuals, it is also possible to employ generative adversarial networks (GANs) to attempt to account for these unseen counterfactual outcomes.

This was the focus of our work on GANITE, a method first outlined in a 2018 ICLR paper.

The defining feature of the GAN framework is the existence of a generator and discriminator, trained in an adversarial fashion against each other. The generator tries to generate synthetic samples that the discriminator is incapable of distinguishing from the real samples, while the discriminator tries to identify which of the samples are the synthetic ones. This framework can be formulated as a minimax game and at the optimal point of this game, generated samples follow the real data distribution.
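For reference, the minimax game mentioned above is usually written in the following generic form. This is the standard GAN objective from the general literature, shown as background only; it is not the specific loss used in GANITE, and the symbols ($G$ for the generator, $D$ for the discriminator, $p_{\text{data}}$ for the real data distribution, $p_z$ for the noise distribution) are notation introduced here.

$$
\min_{G} \max_{D} \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
$$

The discriminator maximizes this quantity by assigning high probability to real samples and low probability to generated ones, while the generator minimizes it by producing samples that the discriminator scores as real.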

As a result, the GAN framework provides a powerful platform for inference based on the factual data while allowing us to capture the uncertainty in the counterfactual distributions by attempting to learn them. GANITE consists of two blocks: a counterfactual imputation block and an individualized treatment effect block, each of which consists of a generator and a discriminator. We view the factual outcome as an observed label and consider the counterfactual outcomes to be missing labels; the counterfactual generator of GANITE attempts to generate counterfactual outcomes in such a way that when given the combined vector of factual and generated counterfactual outcomes the discriminator of GANITE cannot determine which of the components is the factual outcome. (After all, if the generated counterfactuals follow the underlying distribution, it should not be possible to discriminate the real outcome from the generated outcomes.)

With the complete labels (combined factual and estimated counterfactual outcomes), the individualized treatment effect estimation function can then be trained for inferring the potential outcomes of the individual based on the feature information in a supervised way. By also modelling this individualized treatment effect estimation function using a GAN framework, we are able not only to predict the expected outcomes but also quantify the uncertainty in the predictions, which is particularly important in the medical setting.

Unlike many other state-of-the-art methods, GANITE naturally extends to – and in fact is defined in the first place for – any number of treatments.

An additional feature of GANITE is its relative robustness to treatment assignment bias; this is addressed in more detail in the full ICLR 2018 paper below.

## Including treatment effects in outcome models and handling bias

When modeling individualized treatment effects, we face further issues related to handling treatment bias in observational datasets, and a multitude of choices regarding approaches to handling treatment indicators when estimating patient outcomes.

The former challenge results from the fact that, when estimating individualized treatment effects, assignment bias creates a discrepancy in the feature distributions for treated and control patient groups. Simply put: decision-making by doctors introduces bias into the data.
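A toy simulation can make the effect of assignment bias concrete. The sketch below rests on invented assumptions: a hypothetical severity score `x`, a hypothetical assignment policy `propensity` in which sicker patients are treated more often, and a constant true effect. The correction shown is inverse propensity weighting, a standard textbook technique used here only for illustration (with the propensity assumed known); it is not one of the methods described on this page.

```python
import random

random.seed(1)

TRUE_EFFECT = 1.0  # assumed constant treatment benefit in this toy example


def propensity(x):
    """Hypothetical policy: sicker patients (higher x) are treated more often."""
    return 0.1 + 0.8 * x


rows = []
for _ in range(20000):
    x = random.random()                                   # severity in [0, 1)
    t = 1 if random.random() < propensity(x) else 0       # biased assignment
    y = -2.0 * x + TRUE_EFFECT * t + random.gauss(0.0, 0.1)
    rows.append((x, t, y))

treated = [(x, y) for x, t, y in rows if t == 1]
control = [(x, y) for x, t, y in rows if t == 0]

# The feature distributions of the two groups differ: treated patients are sicker.
mean_x_treated = sum(x for x, _ in treated) / len(treated)
mean_x_control = sum(x for x, _ in control) / len(control)

# Naive comparison of group means ignores that discrepancy.
naive = (sum(y for _, y in treated) / len(treated)
         - sum(y for _, y in control) / len(control))

# Inverse propensity weighting: weight each patient by the inverse
# probability of the assignment they actually received.
n = len(rows)
ipw = (sum(t * y / propensity(x) for x, t, y in rows) / n
       - sum((1 - t) * y / (1.0 - propensity(x)) for x, t, y in rows) / n)

print("mean severity, treated vs control:", round(mean_x_treated, 2), round(mean_x_control, 2))
print("naive estimate:", round(naive, 2), "| weighted estimate:", round(ipw, 2))
```

In this setup the naive estimate understates the benefit because the treated group starts out sicker, whereas the weighted estimate lands close to the assumed true effect.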

Modeling the treatment assignment, and its impact on the outcome, is a similarly complex proposition: several approaches exist, with the simplest being to split data into separate models (treated and untreated), or to use the assignment variable as a feature to augment the feature dimension.

A third solution, which has been adopted in a number of papers by our own lab’s researchers, is to learn shared representations, where the treatment assignment indexes these shared representations. This enables us to learn jointly across the treated and untreated populations.
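The two simplest approaches described above can be sketched with plain linear regression. Everything in this example is a hypothetical illustration: the data-generating process, the helper names `fit_linear` and `predict`, and the choice of a linear model are assumptions made for brevity, and treatment is randomized so that only the modeling choice is in play. The shared-representation approach is not shown.

```python
import random


def fit_linear(features, targets):
    """Ordinary least squares via the normal equations (Gauss-Jordan elimination)."""
    k = len(features[0])
    # Augmented matrix [X^T X | X^T y].
    a = [[sum(f[i] * f[j] for f in features) for j in range(k)]
         + [sum(f[i] * y for f, y in zip(features, targets))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


def predict(weights, feature_row):
    return sum(w * f for w, f in zip(weights, feature_row))


random.seed(0)
data = []
for _ in range(2000):
    x = random.uniform(0.0, 1.0)
    t = random.randint(0, 1)                       # randomized assignment
    y = 2.0 * x + t * (1.0 - x) + random.gauss(0.0, 0.1)  # true effect is 1 - x
    data.append((x, t, y))

# Approach 1: split the data and fit one model per treatment group.
treated = [(x, y) for x, t, y in data if t == 1]
control = [(x, y) for x, t, y in data if t == 0]
w_treated = fit_linear([[1.0, x] for x, _ in treated], [y for _, y in treated])
w_control = fit_linear([[1.0, x] for x, _ in control], [y for _, y in control])


def effect_two_models(x):
    return predict(w_treated, [1.0, x]) - predict(w_control, [1.0, x])


# Approach 2: one model, with the assignment variable as an extra feature.
w_single = fit_linear([[1.0, x, float(t)] for x, t, _ in data],
                      [y for _, _, y in data])


def effect_single_model(x):
    return predict(w_single, [1.0, x, 1.0]) - predict(w_single, [1.0, x, 0.0])


for x in (0.2, 0.8):
    print(x, round(effect_two_models(x), 2), round(effect_single_model(x), 2))
```

With this particular additive linear model, the single-model approach can only express a constant effect (the coefficient on the treatment feature), while the two-model approach recovers an effect that varies with `x`. That is a property of the model class chosen here, not of the single-model strategy in general: a more flexible learner could capture interactions between the treatment feature and the covariates.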

*Research focus:*

Building shared representations with non-stationary Gaussian processes

In an ICML 2018 paper, entitled “Limits of Estimating Heterogeneous Treatment Effects: Guidelines for Practical Algorithm Design,” we provided a characterization of the fundamental limits of estimating heterogeneous treatment effects, and established conditions under which these limits can be achieved.

Our analysis revealed that the relative importance of the different aspects of observational data varies with the sample size. For instance, we showed that assignment bias matters only in small-sample regimes, whereas with a large sample size, the way an algorithm models the control and treated outcomes is what bottlenecks its performance. Guided by our analysis, we built a practical algorithm for estimating treatment effects using non-stationary Gaussian processes (NSGP) with doubly-robust hyperparameters.

By employing NSGPs and using a Gaussian process prior across the two response surfaces, we are modeling these two response surfaces jointly. The advantage of doing this is that now we have a shared representation that is learned effectively in the small sample regime. Also, Gaussian processes are very flexible, and they enable us to learn the parameters that are important (so we are able to learn different sparsity and also adapt the model to the different smoothness of the response surfaces).

Using a standard semi-synthetic simulation setup, we demonstrated that our algorithm outperforms the state-of-the-art, and that the behavior of existing algorithms conforms with our analysis.

Many methods have been developed that take one of the three approaches above to modeling treatments, and they also handle bias differently. Despite all of this research, however, a great deal of work remains to be done in developing a comprehensive theory (i.e. a principled guideline for building estimators of treatment effects using machine learning algorithms). This makes it very difficult to determine which types of algorithms we should be developing, or how to deal with the twin problems of modeling treatments and handling bias.

This is why, a few years ago, our lab developed the first theory for individualized treatment effect inference. To do this, we first tried to develop a theoretical understanding of the limits of this problem. Then, guided by this, we sought to identify unique principles that can guide the development of algorithms.

*Research focus:*

Under the hood of our comprehensive theory for individualized treatment effect inference

In a 2018 paper entitled “Bayesian Nonparametric Causal Inference: Information Rates and Learning Algorithms,” we addressed the individualized causal effect estimation problem on the basis of the Neyman-Rubin potential outcomes model, and established the fundamental limits on the amount of information that a learning algorithm can gather about the causal effect of an intervention given an observational data sample. We also provided guidelines for building proper individualized treatment effect inference models that “do not leave any information on the table” because of poor modeling choices.

We set this in a non-parametric Bayesian estimation framework, placing a prior over the two response surfaces (treated and untreated), and then computed point estimates induced by the Bayesian posterior on the basis of the available data. To assess the efficiency of different individualized treatment effect models, we measured the error of each estimation problem using the precision in estimation of heterogeneous effects (PEHE).
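For readers unfamiliar with PEHE, the sketch below uses its common definition as the mean squared difference between true and estimated individual effects (some papers report the square root instead). The function name `pehe` and the example numbers are hypothetical. Because the true effects are required, the metric can only be computed on synthetic or semi-synthetic data.

```python
def pehe(true_effects, estimated_effects):
    """Mean squared error between true and estimated individual treatment effects."""
    if len(true_effects) != len(estimated_effects):
        raise ValueError("effect lists must have the same length")
    squared_errors = [(tau - tau_hat) ** 2
                      for tau, tau_hat in zip(true_effects, estimated_effects)]
    return sum(squared_errors) / len(squared_errors)


# Illustrative values only: three patients with known (simulated) effects.
true_tau = [1.0, 0.5, 0.0]
estimated_tau = [0.8, 0.5, 0.4]
score = pehe(true_tau, estimated_tau)
print(score)
```

A lower value means the estimated effects track the true individual effects more closely; a perfect estimator scores zero.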

We characterized the optimal information rate that can be achieved by any learning procedure, and showed that it depends on the dimensionality of the feature space, and the smoothness of the “rougher” of the two potential outcomes.

We also used the conclusions drawn from our analysis and designed a practical Bayesian causal inference algorithm with a multi-task Gaussian process, and showed that it significantly outperforms the state-of-the-art models through experiments conducted on a standard semisynthetic dataset.

The theory we developed guides our model design in two ways: in the small sample regime, we need to have methods that are effectively handling assignment bias, and are hence able to share the training data effectively between the response surfaces. In the large sample regime, we need models that are able to flexibly learn from the available data and do hyperparameter tuning effectively.

We continue to push the boundaries of our understanding of different strategies for treatment effect estimation. More recently, we investigated the strengths and weaknesses of a number of so-called meta-learners (model-agnostic learning strategies) both theoretically and empirically, providing further guidance towards principled algorithm design. Our recent paper on this topic was accepted for publication at AISTATS 2021, and can be found here.

A firm theoretical foundation for individualized treatment effect inference will make it possible to carry out reliable estimation of individualized treatment effects. Such reliable estimation will have obvious implications for the treatment of patients, but it will also have less-obvious implications for clinical trials. The first is that it will enable more reliable post-hoc analysis (such as understanding which groups of patients benefit least or most from the trial treatment). The second is that it may better inform the process of sequentially recruiting patients into clinical trials, thereby enabling better design, both in terms of maximizing overall statistical power and in terms of maximizing the information learned for patients with specific covariates.

*Research focus:*

Estimating the effects of continuous interventions from observational data

It is highly common for treatment decisions to involve not only determining which intervention to make (e.g. whether to treat cancer with radiotherapy, chemotherapy or surgery) but also determining the value of some continuous parameter associated with intervening (e.g. the dosage of radiotherapy to be administered).

Despite this, relatively little work has been done in the setting of continuous-valued interventions, while much attention has been given to the problem of estimating the effect of discrete interventions from observational data.

Since continuous interventions arise in many practical scenarios, the impact of this problem in the healthcare setting is clear: being able to better estimate individual responses to dosages would help us select treatments that result in improved patient outcomes. Moreover, clinicians and patients will often need to consider several different outcomes (such as potential side effects); better estimates of such outcomes allow the patients to make a more informed decision that is suitable for them.

In a paper accepted for publication at NeurIPS 2020, entitled “Estimating the Effects of Continuous-valued Interventions using Generative Adversarial Networks,” we proposed a novel framework called SCIGAN for estimating response curves for continuous interventions from observational data. As the name suggests, SCIGAN is built on a modified GAN framework (for an introduction to GANs, see the related section above).

To address the challenges presented by shifting to continuous interventions, we proposed a novel architecture for our discriminator: a hierarchical discriminator that leverages the structure of the continuous intervention setting. Our approach is very flexible, and can simultaneously estimate counterfactual outcomes for several different continuous interventions.

Our proposed model represents an important step forward. Nevertheless, this work is on the theoretical side and significant testing, potentially through clinical trials, will be needed before such methods can be used in practice.

## Selecting optimal models for individualized treatment effect inference

In the sections above, we have introduced the challenges inherent in developing approaches to individualized treatment effect inference; we have explained how these challenges can be compensated for or handled, and have outlined a theory for building effective models.

This still leaves us, however, with a further challenge in implementation: a wide variety of models to choose from, and a potentially limitless array of application types and datasets. Choosing “one best model” is impossible, since no single method will ever outperform all others across all datasets, so the challenge becomes selecting the best-performing model for each particular task and dataset. This is further complicated by the fact that, because we lack access to the counterfactuals, we cannot compute ground-truth individualized treatment effects against which to evaluate a model’s predictions. This is in contrast to predictive models, where one can use the mean squared error between the model’s predictions and the ground truth label.

The answer to this problem is to use automated machine learning (AutoML) to compare models and select the best model for the task at hand. In experiments applying our own AutoML framework for individualized treatment effect inference (details of which are provided in the box below), we found that the best model selected by the framework tended to significantly outperform other commonly-used methods. This is shown below in a comparison of the performance of methods published at ICML, NeurIPS and ICLR conferences from 2016 to 2018 on 77 datasets.

*Research focus:*

Automated causal inference using influence functions

In an ICML 2019 paper, entitled “Validating Causal Inference Models via Influence Functions,” our lab introduced a first-of-its-kind validation procedure for estimating the performance of causal inference methods using influence functions (IFs)—the functional derivatives of a loss function.

The procedure we introduced utilizes a Taylor-like expansion to approximate the loss function of a method on a given dataset in terms of the influence functions of its loss on a “synthesized”, proximal dataset with known causal effects.

This automated and data-driven approach to model selection enables confident deployment of (black-box) machine learning-based methods, and safeguards against naïve modeling choices.

Using AutoML enables practitioners such as epidemiologists and applied statisticians to use our validation procedure to select the best model for the observational study at hand.

Moreover, it is often the case that the observational data used to train a treatment effects model may come from a setting where the distribution of patient features is different from the one in the deployment (target) environment, for example, when transferring models across hospitals or countries. Because of this, it is important to be able to also select models that are robust to these covariate shifts across disparate patient populations.

In a recent paper from our lab, we propose leveraging the invariance of causal structures across domains to introduce a novel model selection metric specifically designed for treatment effects models under the unsupervised domain adaptation setting. Experimentally, our method selects treatment effects models that are more robust to covariate shifts on several synthetic and real healthcare datasets, including on estimating the effect of ventilation in COVID-19 patients from different geographic locations.

## Individualized treatment effect estimation using time-series data

While the majority of previous work focuses on the effects of interventions at a single point in time, observational data also capture information on complex time-dependent treatment scenarios, such as where the efficacy of treatments changes over time (for example, drug resistance in cancer patients), or where patients receive multiple interventions administered at different points in time (such as joint prescriptions of chemotherapy and radiotherapy).

Estimating the effects of treatments over time therefore presents unique opportunities, such as understanding how diseases evolve under different treatment plans, how individual patients respond to medication over time, and which timings may be optimal for assigning treatments, thus providing new tools to improve clinical decision support systems.

Electronic health records provide a rich source of data for machine learning methods to learn dynamic treatment responses over time. These records, collected over time as part of regular follow-ups, provide a more cost-effective method to gather insights on the effectiveness of past treatment regimens.

Estimating counterfactual patient outcomes over time is challenging due to the presence of time-dependent confounders in observational datasets. Time-dependent confounders are patient covariates that affect the treatment assignments and are themselves affected by past treatments.

For instance, imagine a patient is given treatment A when a certain covariate (let’s say, white blood cell count) has been outside of normal range values for a while. Now, also imagine that the white blood cell count was itself affected by the past administration of a different treatment, treatment B. If this patient is more likely to die, without adjusting for the time-dependent confounding (e.g. the changes in the white blood cell count over time), we could incorrectly conclude that treatment A is harmful to patients.
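The scenario above can be made concrete with a small simulation. The sketch below is a toy, single-time-step illustration with made-up probabilities (none of the numbers come from real data): treatment B makes an abnormal white blood cell count more likely, an abnormal count makes both treatment A and death more likely, and treatment A itself lowers the risk of death by 10 percentage points. All function names are hypothetical.

```python
import random


def simulate(n=50_000, seed=0):
    """Generate (abnormal_wbc, treated_a, died) records from a toy model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        treated_b = rng.random() < 0.5
        # Past treatment B raises the chance of an abnormal white blood cell count.
        abnormal_wbc = rng.random() < (0.7 if treated_b else 0.2)
        # Clinicians give treatment A mostly to patients with an abnormal count.
        treated_a = rng.random() < (0.8 if abnormal_wbc else 0.2)
        # An abnormal count raises the risk of death; treatment A lowers it by 0.1.
        p_death = 0.2 + 0.4 * abnormal_wbc - 0.1 * treated_a
        died = rng.random() < p_death
        rows.append((int(abnormal_wbc), int(treated_a), int(died)))
    return rows


def naive_effect(rows):
    """Difference in death rates between treated and untreated patients."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def adjusted_effect(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        total += naive_effect(stratum) * len(stratum) / len(rows)
    return total


rows = simulate()
print(f"naive:    {naive_effect(rows):+.3f}")     # positive: A looks harmful
print(f"adjusted: {adjusted_effect(rows):+.3f}")  # close to the true -0.1
```

Stratifying on the covariate is enough here only because the toy has a single decision point. With sequences of treatments, the covariate is both a confounder for later treatments and a consequence of earlier ones, which is why the longitudinal setting calls for the dedicated methods discussed on this page.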

To make this even more challenging, estimating the effect of a different sequence of treatments on the patient would require not only adjusting for the bias at the current step (in treatment A), but also for the bias introduced by the previous application of treatment B.

Standard supervised learning methods used to estimate these treatment effects will be biased by the treatment assignment policy present in the observational dataset. As a result, they will not generalize well to changes in the treatment policy, which is exactly what is required in order to generate counterfactuals.

*Research focus:*

Two approaches to handling time-dependent confounders

Existing methods for individualized treatment effect inference in the static setting cannot be applied in the longitudinal setting since they are designed to handle the cross-sectional set-up, where the treatment and outcome depend only on a static value of the patient covariates. By contrast, any direct estimation of individualized treatment effects using time-series observational data is hampered by the presence of time-dependent confounders (as mentioned directly above), where actions taken are dependent on time-varying variables related to the outcome of interest.

Models developed to estimate treatment effects based on static data cannot model how changes in patient covariates over time affect the assignment of treatments, nor can they estimate the effect of a sequence of treatments on the patient outcome. Estimating treatment effects over time therefore requires different models, which can handle these temporal dependencies in the observational data as well as patient histories of varying length.

**Approach 1: Recurrent Marginal Structural Network**

In a NeurIPS 2018 paper, entitled “Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks,” we proposed a new deep learning model, which we refer to as Recurrent Marginal Structural Networks (RMSN). Drawing inspiration from marginal structural models, a class of methods in epidemiology which use propensity weighting to adjust for time-dependent confounders, the RMSN adopts a sequence-to-sequence architecture to directly learn time-dependent treatment responses from observational data.

We used two sets of deep neural networks to build our RMSN: 1) a set of propensity networks to compute the treatment probabilities used for inverse probability of treatment weighting (IPTW), and 2) a prediction network used to determine the treatment response for a given set of planned interventions.
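As a rough illustration of how propensities turn into IPTW weights, the sketch below computes stabilized weights in the form commonly used with marginal structural models. At each time step, the probability of the observed treatment given treatment history alone is divided by its probability given the full covariate and treatment history, and the ratios are multiplied over time. The function names and example probabilities are hypothetical; in RMSN the probabilities would come from the propensity networks, and this is not the paper's implementation.

```python
from itertools import accumulate
from operator import mul


def observed_prob(p_treat, treated):
    """Probability of the treatment decision that was actually observed."""
    return p_treat if treated else 1.0 - p_treat


def stabilized_weights(treatments, p_marginal, p_conditional):
    """Cumulative stabilized IPTW weights for one patient trajectory.

    treatments    : 0/1 treatment indicator at each time step
    p_marginal    : P(treated at t | past treatments only)        (numerator)
    p_conditional : P(treated at t | past treatments, covariates) (denominator)
    """
    ratios = [
        observed_prob(pm, a) / observed_prob(pc, a)
        for a, pm, pc in zip(treatments, p_marginal, p_conditional)
    ]
    # Weight at time t is the running product of the per-step ratios.
    return list(accumulate(ratios, mul))


# Hypothetical three-step trajectory: treated, untreated, treated.
weights = stabilized_weights(
    treatments=[1, 0, 1],
    p_marginal=[0.5, 0.5, 0.5],
    p_conditional=[0.8, 0.4, 0.25],
)
print([round(w, 3) for w in weights])
```

Trajectories whose treatment decisions were unlikely given the patient's covariates receive larger weights, so the weighted dataset behaves more like one in which treatment did not depend on those covariates.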

Using simulations of a state-of-the-art pharmacokinetic-pharmacodynamic (PK-PD) model of tumor growth, we demonstrated that our network accurately learns unbiased treatment responses from observational data, even under changes in the treatment assignment policy, and that it outperforms benchmarks.

**Approach 2: Counterfactual Recurrent Network**

In an ICLR 2020 paper, entitled “Estimating counterfactual treatment outcomes over time through adversarially balanced representations,” we introduced the Counterfactual Recurrent Network (CRN), a novel sequence-to-sequence model that leverages the increasing availability of patient observational data, as well as recent advances in representation learning and domain adversarial training, to estimate treatment effects over time.

To handle the bias from time-varying confounders, CRN uses domain adversarial training to build balancing representations of the patient history. At each timestep, CRN constructs a treatment-invariant representation which removes the association between patient history and treatment assignments, and which can therefore be used reliably for making counterfactual predictions.
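The core idea of domain adversarial training can be sketched with a deliberately tiny model. The example below is an assumption-laden toy, not CRN's architecture: a scalar "encoder" weight `w` maps a history feature `x` to a representation `r`, an outcome head `u` predicts the outcome from `r`, and a treatment head `v` tries to predict the assigned treatment from `r`. The heads are updated with their ordinary gradients, while the encoder receives the treatment-loss gradient with its sign reversed (scaled by a hypothetical trade-off parameter `lam`), pushing the representation to be predictive of the outcome but uninformative about treatment.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def adversarial_gradients(x, y, a, w, u, v, lam):
    """Gradients for one example under gradient reversal (toy scalar model).

    Outcome loss:   0.5 * (u * r - y) ** 2, with r = w * x
    Treatment loss: binary cross-entropy of sigmoid(v * r) against a
    """
    r = w * x                      # representation of the patient history
    y_hat = u * r                  # outcome prediction
    p = sigmoid(v * r)             # predicted probability of treatment

    d_outcome_d_r = (y_hat - y) * u
    d_treat_d_r = (p - a) * v

    return {
        "u": (y_hat - y) * r,      # outcome head minimizes the outcome loss
        "v": (p - a) * r,          # treatment head minimizes the treatment loss
        # Encoder: minimize the outcome loss, but *maximize* the treatment loss.
        "w": (d_outcome_d_r - lam * d_treat_d_r) * x,
    }


grads = adversarial_gradients(x=1.0, y=0.0, a=1, w=1.0, u=1.0, v=1.0, lam=0.5)
print(grads)
```

Setting `lam` to zero recovers ordinary supervised training of the encoder; larger values trade outcome accuracy for a representation that is harder to use for predicting treatment.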

Using a model of tumor growth, we validated CRN in realistic medical scenarios, demonstrating that, when compared with existing state-of-the-art methods, CRN achieves lower error in estimating counterfactuals and in choosing the correct treatment and timing of treatment.

The ability to accurately estimate treatment effects over time using machine learning allows clinicians to determine, in a manner tailored to each individual patient, both the treatments to prescribe and the optimal time at which to administer them, given their observational history.

Both new methods and theory are necessary to be able to harness the full potential of observational data for learning individualized effects of complex treatment scenarios. Further work in this direction is needed for proposing alternative methods for handling time-dependent confounders, for modelling combinations of treatments assigned over time or for estimating the individualized effects of time-dependent treatments with associated dosage.

## ML-assisted clinical trials

Understanding treatment effects can play an important role in the post-hoc analysis of clinical trials into interventions and treatments, as well as influencing the design of more effective clinical trials.

The implementation of clinical trials is a setting in which the relevant population is diverse, and different parts of the population display different reactions to treatment. In such settings, heterogeneous treatment effect (HTE) analysis, also called subgroup analysis, is used to find subgroups consisting of subjects who have similar covariates and display similar treatment responses. The identification of subgroups improves the interpretation of treatment effects across the entire population, and makes it possible to develop more effective interventions and treatments and to improve the design of further experiments. In clinical trials, HTE analysis can identify subgroups of the population for which the studied treatment is effective, even when it is found to be ineffective for the population in general.
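A minimal sketch of this last point, using a tiny made-up randomized trial in which the subgroups are defined in advance by a hypothetical biomarker: the difference in mean outcomes between arms is zero for the population as a whole, yet clearly positive in one subgroup and negative in the other. Real HTE analysis has to discover the subgroups from data and quantify uncertainty, which this example does not attempt.

```python
from collections import defaultdict

# (subgroup, treated, outcome) records; higher outcomes are better.
# All values are invented purely for illustration.
TRIAL = [
    ("biomarker_pos", 1, 8), ("biomarker_pos", 1, 9), ("biomarker_pos", 1, 10),
    ("biomarker_pos", 0, 5), ("biomarker_pos", 0, 6), ("biomarker_pos", 0, 7),
    ("biomarker_neg", 1, 4), ("biomarker_neg", 1, 5), ("biomarker_neg", 1, 3),
    ("biomarker_neg", 0, 7), ("biomarker_neg", 0, 8), ("biomarker_neg", 0, 6),
]


def average_treatment_effect(records):
    """Difference in mean outcome between treated and control arms."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def subgroup_effects(records):
    """Average treatment effect within each predefined subgroup."""
    groups = defaultdict(list)
    for record in records:
        groups[record[0]].append(record)
    return {name: average_treatment_effect(recs) for name, recs in groups.items()}


print(average_treatment_effect(TRIAL))  # 0.0
print(subgroup_effects(TRIAL))          # {'biomarker_pos': 3.0, 'biomarker_neg': -3.0}
```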

*Clinical example:*

Machine learning for clinical trials in the era of COVID-19

The COVID-19 pandemic has presented enormous challenges to clinical trials in particular, given the need for expedited development, approval, and distribution.

In a 2020 paper co-authored with some of our collaborators, published in *Statistics in Biopharmaceutical Research*, we identified ways in which machine learning can respond to the challenges inherent in clinical trials of COVID-19 treatments and vaccines.

We identified three key areas for support: ongoing clinical trials for non-COVID-19 drugs, clinical trials for repurposing drugs to treat COVID-19, and clinical trials for new drugs to treat COVID-19. Many of the research projects outlined above feature in the paper.

*Research focus:*

Robust Recursive Partitioning (R2P): a method to support adaptive clinical trial design

To identify subjects who have similar covariates and display similar treatment responses, it is necessary to create reliable estimates of the treatment responses of individual subjects; i.e. of individualized treatment effects.

Most of the current methods for HTE analysis begin with a particular algorithm for estimating individualized treatment effects, and identify subgroups by maximizing the differences across subgroups of the average treatment effect in each subgroup, under the assumption that treatment effects are homogeneous within subgroups. These approaches have several weaknesses: they rely on a particular algorithm for estimating treatment effects, they ignore (in)homogeneity within identified subgroups, and they do not produce good confidence estimates.

In a NeurIPS 2020 paper, entitled “Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification,” we introduced Robust Recursive Partitioning (R2P), a new method for subgroup analysis that addresses all of these weaknesses. R2P recursively partitions the entire population by taking into account both heterogeneity across subgroups and homogeneity within subgroups, using the novel criterion of confident homogeneity, which is based on quantifying the uncertainty of the individualized treatment effect estimates.

Experiments using synthetic and semi-synthetic datasets (the latter based on real data) demonstrate that R2P outperforms state-of-the-art baseline algorithms in every dimension. It constructs partitions that are simultaneously more homogeneous within subgroups and more heterogeneous across subgroups than those produced by other methods, and it produces much narrower confidence intervals with a prescribed coverage guarantee.

An additional strength of R2P is that it can employ any method for interpretable individualized treatment effect estimation, including improved methods that will undoubtedly be developed in the future.

We plan to create an additional page for adaptive clinical trials as a key research pillar in the near future, but in the meantime you can review our related publications if you’d like to learn more.

## Learn more and get involved

This page has served as an introduction to individualized treatment effect inference—from the perspective of both healthcare and machine learning.

We have demonstrated the importance of estimating individualized treatment effects in enabling “bespoke medicine” and truly moving beyond one-size-fits-all approaches. In particular, there is great potential to influence and improve the design of clinical trials, and to make effective use of observational data even in the absence of clinical trials. There are further applications to explore, such as modeling individualized treatment effects for organ transplants (as most recently highlighted in a paper accepted for presentation at NeurIPS 2020).

We have also outlined the numerous intricacies and challenges that have complicated the development of machine learning methods and techniques for individualized treatment effect inference, not only due to the lack of counterfactuals, but also due to the lack of a governing theory, the ubiquity of bias in observational data, the choice between several options for modeling treatments, and the difficulty of adapting from static to dynamic datasets. We have also summarized our own lab’s projects seeking to address these challenges.

If you would like to learn more about this topic, we would recommend reading a somewhat more detailed (but still accessible) overview of our work on individualized treatment effect inference, entitled “From Real‐World Patient Data to Individualized Treatment Effects Using Machine Learning: Current and Future Methods to Address Underlying Challenges” (published in *Clinical Pharmacology & Therapeutics* in 2020).

We have also created a video tutorial series on individualized treatment effect inference, which we will continue to update over time.

We would also encourage you to stay abreast of ongoing developments in this and other areas of machine learning for healthcare by signing up to take part in one of our two streams of online engagement sessions.

If you are a practicing clinician, please sign up for Revolutionizing Healthcare, which is a forum for members of the clinical community to share ideas and discuss topics that will define the future of machine learning in healthcare (no machine learning experience required).

If you are a machine learning student, you can join our Inspiration Exchange engagement sessions, in which we introduce and discuss new ideas and development of new methods, approaches, and techniques in machine learning for healthcare.

**Resources and papers cited on this page, in order of first appearance**

A full list of our papers on causal inference, individualized treatment effect inference, and related topics, can be found here.