van der Schaar Lab

van der Schaar Lab at ICML 2020: seven papers and a tutorial

The van der Schaar Lab’s diverse and pioneering research will be on full display at the 2020 International Conference on Machine Learning (ICML). Seven papers have been accepted for publication at the event, and Mihaela van der Schaar will be giving a tutorial on machine learning for healthcare as well as two keynote presentations in two different workshops.

Having seven papers accepted for publication is a clear demonstration of the lab’s pre-eminent position among academic teams in the United Kingdom and Europe. Additionally, Mihaela is ranked among the top 10 academics by papers accepted for the conference, and is the top-ranked female academic.

As the premier gathering of professionals dedicated to the advancement of machine learning, ICML is renowned for presenting and publishing cutting-edge research. Participants span a wide range of backgrounds, from academic and industrial researchers to entrepreneurs and engineers.

Mihaela’s tutorial at ICML will be on the topic of “Machine Learning for Healthcare: Challenges, Formalisms and Methods, and Research Frontiers”, while her keynotes will be in workshops on AutoML and missing data imputation.

Each of the seven papers authored by the lab’s researchers presents a novel solution to an important problem, with a substantial potential impact on the application of machine learning in medicine. The papers cover diverse topics, including new techniques for providing uncertainty estimates, new methods for temporal phenotyping using deep predictive clustering, and new methods for finding the optimal doses for clinical trials with safety constraints.

Titles, authors and abstracts for all seven selected papers are given below.

Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions

Ahmed Alaa, Mihaela van der Schaar

Recurrent neural networks (RNNs) are instrumental in modelling sequential and time series data. Yet, when using RNNs to inform decision-making, predictions by themselves are not sufficient—we also need estimates of predictive uncertainty. Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods; these are computationally prohibitive, and require major alterations to the RNN architecture and training. Capitalizing on ideas from classical jackknife resampling, we develop a frequentist alternative that: (a) is computationally efficient, (b) does not interfere with model training or compromise its accuracy, (c) applies to any RNN architecture, and (d) provides theoretical coverage guarantees on the estimated uncertainty intervals. Our method derives predictive uncertainty from the variability of the (jackknife) sampling distribution of the RNN outputs, which is estimated by repeatedly deleting “blocks” of (temporally-correlated) training data, and collecting the predictions of the RNN re-trained on the remaining data. To avoid computationally expensive re-training, we utilize influence functions to estimate the effect of removing training data blocks on the learned RNN parameters. Using data from a critical care medical setting, we demonstrate the utility of uncertainty quantification in sequential decision-making.
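The resampling idea in this abstract can be illustrated with a minimal sketch of a delete-one-block jackknife. This is not the paper's method: it assumes a toy one-parameter autoregressive model in place of an RNN, and it refits the model exactly after each deletion instead of approximating the refit with influence functions. The function names `fit_ar1` and `block_jackknife` are hypothetical.

```python
import math
import random


def fit_ar1(pairs):
    """Least-squares slope a for the toy model y ~ a * x (no intercept)."""
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den


def block_jackknife(series, block_size, x_new):
    """Return (prediction, jackknife standard error) for the input x_new.

    The (previous value, next value) training pairs are split into
    contiguous blocks, so temporally correlated points are deleted together.
    Assumes the number of pairs is a multiple of block_size.
    """
    pairs = list(zip(series[:-1], series[1:]))
    blocks = [pairs[i:i + block_size] for i in range(0, len(pairs), block_size)]
    full_pred = fit_ar1(pairs) * x_new

    # Refit with each block removed and collect the resulting predictions.
    loo_preds = []
    for k in range(len(blocks)):
        kept = [p for j, block in enumerate(blocks) if j != k for p in block]
        loo_preds.append(fit_ar1(kept) * x_new)

    # Delete-a-group jackknife variance for equally sized blocks.
    g = len(loo_preds)
    mean_pred = sum(loo_preds) / g
    variance = (g - 1) / g * sum((p - mean_pred) ** 2 for p in loo_preds)
    return full_pred, math.sqrt(variance)


# Synthetic AR(1) series (illustrative data, not from the paper).
random.seed(0)
series = [0.0]
for _ in range(200):
    series.append(0.8 * series[-1] + random.gauss(0.0, 1.0))

pred, se = block_jackknife(series, block_size=20, x_new=1.0)
print(f"prediction {pred:.3f}, jackknife standard error {se:.3f}")
```

The spread of the leave-one-block-out predictions is what the uncertainty estimate is built from. For a real RNN, exact refitting would be far more expensive, which is the cost the abstract says influence functions are used to avoid.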

Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions

Ahmed Alaa, Mihaela van der Schaar

Deep learning models achieve high predictive accuracy across a broad spectrum of tasks, but rigorously quantifying their predictive uncertainty remains challenging. Usable estimates of predictive uncertainty should (1) cover the true prediction targets with high probability, and (2) discriminate between high- and low-confidence prediction instances. Existing methods for uncertainty quantification are based predominantly on Bayesian neural networks; these may fall short of (1) and (2), i.e., Bayesian credible intervals do not guarantee frequentist coverage, and approximate posterior inference undermines discriminative accuracy. In this paper, we develop the discriminative jackknife (DJ), a frequentist procedure that utilizes higher-order influence functions (HOIFs) of a model’s parameters to construct a jackknife (or leave-one-out) estimator of predictive confidence intervals. The DJ satisfies (1) and (2), is applicable to a wide range of deep learning models, is easy to implement, and can be applied in a post-hoc fashion without interfering with model training or compromising its accuracy. Experiments demonstrate that DJ performs competitively compared to existing Bayesian and non-Bayesian baselines.
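As a rough illustration of what a leave-one-out (jackknife) predictive interval looks like, the sketch below builds one for a simple linear regression. It is the classical jackknife with exact refitting, not the discriminative jackknife itself: there are no influence functions, and the interval width is the same for every input, so it does not address requirement (2). The names `fit_line` and `jackknife_interval` and the synthetic data are assumptions made for this example.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_interval(xs, ys, x_new, alpha=0.1):
    """Return (lower, upper) bounds of a jackknife interval at x_new."""
    slope, intercept = fit_line(xs, ys)
    center = slope * x_new + intercept

    # Leave-one-out residuals: refit without point i, then predict point i.
    residuals = []
    for i in range(len(xs)):
        s, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(abs(ys[i] - (s * xs[i] + b)))
    residuals.sort()

    # Use the ceil((1 - alpha) * (n + 1))-th smallest residual as the radius.
    # The rank is clamped to n here for simplicity when it would exceed n.
    n = len(residuals)
    rank = min(n, math.ceil((1 - alpha) * (n + 1)))
    radius = residuals[rank - 1]
    return center - radius, center + radius


# Synthetic noisy line (illustrative data only).
random.seed(1)
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + random.gauss(0.0, 0.5) for x in xs]

lower, upper = jackknife_interval(xs, ys, x_new=10.0)
print(f"90% jackknife interval at x=10: [{lower:.2f}, {upper:.2f}]")
```

The interval is applied after the model is fitted and leaves the fitted model unchanged, which mirrors the post-hoc property described in the abstract.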

Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift

Alex Chan, Ahmed Alaa, Zhaozhi Qian, Mihaela van der Schaar

Modern neural networks have proven to be powerful function approximators, providing state-of-the-art performance in a multitude of applications. They, however, fall short in their ability to quantify confidence in their predictions, which is crucial in high-stakes applications that involve critical decision-making. Bayesian neural networks (BNNs) aim at solving this problem by placing a prior distribution over the network’s parameters, thereby inducing a posterior distribution that encapsulates predictive uncertainty. While existing variants of BNNs based on Monte Carlo dropout produce reliable (albeit approximate) uncertainty estimates over in-distribution data, they tend to exhibit overconfidence in predictions made on target data whose feature distribution differs from the training data, i.e., the covariate shift setup. In this paper, we develop an approximate Bayesian inference scheme based on posterior regularisation, wherein unlabelled target data are used as “pseudo-labels” of model confidence that are used to regularise the model’s loss on labelled source data. We show that this approach significantly improves the accuracy of uncertainty quantification on covariate-shifted data sets, with minimal modification to the underlying model architecture. We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.

Temporal Phenotyping using Deep Predictive Clustering of Disease Progression

Changhee Lee, Mihaela van der Schaar

Due to the wider availability of modern electronic health records (EHR), patient care data is often being stored in the form of time-series. Clustering such time-series data is crucial for patient phenotyping, anticipating patients’ prognoses by identifying “similar” patients, and designing treatment guidelines that are tailored to homogeneous patient subgroups. In this paper, we develop a deep learning approach for clustering time-series data, where each cluster comprises patients who share similar future outcomes of interest (e.g., adverse events, the onset of comorbidities, etc.). The clustering is carried out by using our novel loss functions that encourage each cluster to have homogeneous future outcomes. We adopt actor-critic models to allow “back-propagation” through the sampling process that is required for assigning clusters to time-series inputs. Experiments on two real-world datasets show that our model achieves superior clustering performance over state-of-the-art benchmarks and identifies meaningful clusters that can be translated into actionable information for clinical decision-making.

Learning for Dose Allocation in Adaptive Clinical Trials with Safety Constraints

Cong Shen, Sofia Villar, Zhiyang Wang, Mihaela van der Schaar

Phase I dose-finding trials are increasingly challenging as the relationship between efficacy and toxicity of new compounds (or combination of them) becomes more complex. Despite this, most commonly used methods in practice focus on identifying a Maximum Tolerated Dose (MTD) by learning only from toxicity events. We present a novel adaptive clinical trial methodology, called Safe Efficacy Exploration Dose Allocation (SEEDA), that aims at maximizing the cumulative efficacies while satisfying the toxicity safety constraint with high probability. We evaluate performance objectives that have operational meanings in practical clinical trials, including cumulative efficacy, recommendation/allocation success probabilities, toxicity violation probability, and sample efficiency. An extended SEEDA-Plateau algorithm that is tailored for the increase-then-plateau efficacy behavior of molecularly targeted agents (MTA) is also presented. Through numerical experiments with both synthetic and real-world datasets, we show that SEEDA outperforms state-of-the-art clinical trial designs by finding the optimal dose with higher success rate and fewer patients.

Inverse Active Sensing: Modeling and Understanding Timely Decision-Making

Dan Jarrett, Mihaela van der Schaar

Evidence-based decision-making entails collecting (costly) observations about an underlying phenomenon of interest, and subsequently committing to an (informed) decision on the basis of accumulated evidence. In this setting, active sensing is the goal-oriented problem of efficiently selecting which acquisitions to make, and when and what decision to settle on. As its complement, inverse active sensing seeks to uncover an agent’s preferences and strategy given their observable decision-making behavior. In this paper, we develop an expressive, unified framework for the general setting of evidence-based decision-making under endogenous, context-dependent time pressure, which requires negotiating (subjective) tradeoffs between accuracy, speediness, and cost of information. Using this language, we demonstrate how it enables modeling intuitive notions of surprise, suspense, and optimality in decision strategies (the forward problem). Finally, we illustrate how this formulation enables understanding decision-making behavior by quantifying preferences implicit in observed decision strategies (the inverse problem).

Time Series Deconfounder: Estimating Treatment Effects over Time in the Presence of Hidden Confounders

Ioana Bica, Ahmed Alaa, Mihaela van der Schaar

The estimation of treatment effects is a pervasive problem in medicine. Existing methods for estimating treatment effects from longitudinal observational data assume that there are no hidden confounders. This assumption is not testable in practice and, if it does not hold, leads to biased estimates. In this paper, we develop the Time Series Deconfounder, a method that leverages the assignment of multiple treatments over time to enable the estimation of treatment effects in the presence of multi-cause hidden confounders. The Time Series Deconfounder uses a novel recurrent neural network architecture with multitask output to build a factor model over time and infer substitute confounders that render the assigned treatments conditionally independent. Then it performs causal inference using the substitute confounders. We provide a theoretical analysis for obtaining unbiased causal effects of time-varying exposures using the Time Series Deconfounder. Using both simulations and real data, we show the effectiveness of our method in deconfounding the estimation of treatment responses in longitudinal data.

For a full list of the van der Schaar Lab’s publications, see the lab’s publications page.

Nick Maxfield

From 2020 to 2022, Nick oversaw the van der Schaar Lab’s communications, including media relations, content creation, and maintenance of the lab’s online presence.