Please note: this page is a work in progress. Please treat it as a “stub” containing only basic information, rather than a full-fledged summary of our lab’s vision for ML for uncertainty quantification and our research to date.
This page is authored and maintained by Mihaela van der Schaar and Nick Maxfield.
The successful application of machine learning models to real-world prediction problems requires us to be able to limit and quantify the uncertainty in model predictions by providing valid and accurate prediction intervals. Simply put: in addition to making a prediction, we need to know how confident we can be in this prediction. This is particularly crucial in high-stakes applications where machine learning outputs will inform critical decision-making, such as healthcare.
While machine learning models may achieve high predictive accuracy across a broad spectrum of tasks, rigorously quantifying their predictive uncertainty remains challenging. Usable estimates of predictive uncertainty should cover the true prediction targets with high probability, while discriminating between high and low confidence prediction instances.
Existing approaches to uncertainty quantification suffer from a range of problems and drawbacks: many are computationally prohibitive, difficult to calibrate, or incur high sample complexity, and some require major alterations to the model architecture and training. Additionally, most uncertainty quantification approaches are poorly suited to the time-series setting.
Bayesian neural networks, which are frequently used in methods for uncertainty quantification, tend to exhibit over-confidence in predictions made on target data whose feature distribution differs from the training data; furthermore, they do not provide frequentist coverage guarantees, cannot be applied post-hoc, and their approximate posterior inference undermines discriminative accuracy.
In addition to ensuring that uncertainty quantification is fully incorporated into our own AI and machine learning tools for healthcare, our lab treats the problem of uncertainty quantification itself as an important research pillar in its own right. To that end, we have developed a range of robust and powerful approaches for the healthcare setting and beyond.
As detailed below, our methods (most of which have been introduced in papers published in top-tier AI and machine learning conferences) address the shortcomings of existing approaches to uncertainty quantification, and significantly outperform benchmark methods.
Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift
Alex Chan, Ahmed M. Alaa, Zhaozhi Qian, Mihaela van der Schaar
Modern neural networks have proven to be powerful function approximators, providing state-of-the-art performance in a multitude of applications. However, they fall short in their ability to quantify confidence in their predictions, which is crucial in high-stakes applications that involve critical decision-making.
Bayesian neural networks (BNNs) aim at solving this problem by placing a prior distribution over the network’s parameters, thereby inducing a posterior distribution that encapsulates predictive uncertainty. While existing variants of BNNs based on Monte Carlo dropout produce reliable (albeit approximate) uncertainty estimates over in-distribution data, they tend to exhibit over-confidence in predictions made on target data whose feature distribution differs from the training data, i.e., the covariate shift setup.
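The Monte Carlo dropout estimates mentioned above can be illustrated with a toy sketch: dropout stays active at test time, and the spread of repeated stochastic forward passes serves as the uncertainty estimate. The tiny network and all names below are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network with fixed ("pretrained") weights.
W1 = rng.normal(size=(1, 32))
W2 = rng.normal(size=(32, 1)) / 32

def forward(x, drop_rate=0.2):
    """One stochastic forward pass: dropout stays ON at test time."""
    h = np.maximum(x @ W1, 0.0)              # ReLU hidden layer
    mask = rng.random(h.shape) > drop_rate   # Bernoulli dropout mask
    h = h * mask / (1.0 - drop_rate)         # inverted-dropout scaling
    return h @ W2

def mc_dropout_predict(x, n_samples=200):
    """Predictive mean and spread from repeated stochastic passes."""
    samples = np.stack([forward(x) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

x = np.array([[0.5]])
mean, std = mc_dropout_predict(x)
```

The standard deviation across passes is the (approximate) predictive uncertainty; under covariate shift it tends to be too small, which is the over-confidence the paper targets.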
In this paper, we develop an approximate Bayesian inference scheme based on posterior regularisation, wherein unlabelled target data are used as “pseudo-labels” of model confidence that are used to regularise the model’s loss on labelled source data. We show that this approach significantly improves the accuracy of uncertainty quantification on covariate-shifted data sets, with minimal modification to the underlying model architecture.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions
Ahmed M. Alaa, Mihaela van der Schaar
Deep learning models achieve high predictive accuracy across a broad spectrum of tasks, but rigorously quantifying their predictive uncertainty remains challenging. Usable estimates of predictive uncertainty should (1) cover the true prediction targets with high probability, and (2) discriminate between high- and low-confidence prediction instances.
Existing methods for uncertainty quantification are based predominantly on Bayesian neural networks; these may fall short of (1) and (2) — i.e., Bayesian credible intervals do not guarantee frequentist coverage, and approximate posterior inference undermines discriminative accuracy.
In this paper, we develop the discriminative jackknife (DJ), a frequentist procedure that utilizes influence functions of a model’s loss functional to construct a jackknife (or leave-one-out) estimator of predictive confidence intervals. The DJ satisfies (1) and (2), is applicable to a wide range of deep learning models, is easy to implement, and can be applied in a post-hoc fashion without interfering with model training or compromising its accuracy.
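The leave-one-out construction that the DJ approximates can be sketched naively, with exhaustive retraining instead of influence functions. The simple linear model here merely stands in for an arbitrary learner:

```python
import numpy as np

def fit_linear(X, y):
    # Least-squares fit; stands in for any model's training step.
    coef, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.c_[X, np.ones(len(X))] @ coef

def naive_jackknife_interval(X, y, x_new, alpha=0.1):
    """Leave one point out, retrain, and collect the leave-one-out
    predictions and residuals; form an interval from their spread."""
    n = len(X)
    loo_preds, loo_resid = [], []
    for i in range(n):
        keep = np.arange(n) != i
        coef_i = fit_linear(X[keep], y[keep])
        loo_preds.append(predict_linear(coef_i, x_new)[0])
        loo_resid.append(abs(y[i] - predict_linear(coef_i, X[i:i+1])[0]))
    q = np.quantile(loo_resid, 1 - alpha)    # residual quantile
    center = np.mean(loo_preds)
    return center - q, center + q

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=40)
lo, hi = naive_jackknife_interval(X, y, np.array([[0.5]]))
```

The point of the DJ is that, for deep models, the n retraining runs above are replaced by influence-function estimates of each point's effect on the learned parameters, making the procedure post-hoc and cheap.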
Experiments demonstrate that DJ performs competitively compared to existing Bayesian and non-Bayesian regression baselines.
Frequentist Uncertainty in Recurrent Neural Networks via Blockwise Influence Functions
Ahmed M. Alaa, Mihaela van der Schaar
Recurrent neural networks (RNNs) are instrumental in modelling sequential and time-series data. Yet, when using RNNs to inform decision-making, predictions by themselves are not sufficient — we also need estimates of predictive uncertainty. Existing approaches for uncertainty quantification in RNNs are based predominantly on Bayesian methods; these are computationally prohibitive, and require major alterations to the RNN architecture and training.
Capitalizing on ideas from classical jackknife resampling, we develop a frequentist alternative that: (a) does not interfere with model training or compromise its accuracy, (b) applies to any RNN architecture, and (c) provides theoretical coverage guarantees on the estimated uncertainty intervals.
Our method derives predictive uncertainty from the variability of the (jackknife) sampling distribution of the RNN outputs, which is estimated by repeatedly deleting “blocks” of (temporally-correlated) training data, and collecting the predictions of the RNN re-trained on the remaining data. To avoid exhaustive re-training, we utilize influence functions to estimate the effect of removing training data blocks on the learned RNN parameters.
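The block-deletion idea can be sketched on a toy series, with a one-parameter autoregressive fit standing in for RNN training and exhaustive refitting standing in for the influence-function shortcut. Everything below is illustrative, not the paper's procedure:

```python
import numpy as np

def fit_ar1(series):
    # One-parameter AR(1) fit by least squares; stands in for RNN training.
    x, y = series[:-1], series[1:]
    return float(x @ y / (x @ x))

def blockwise_jackknife(series, n_blocks=8):
    """Delete one contiguous block of (temporally correlated) points at a
    time, refit, and collect the one-step-ahead forecasts. For simplicity
    this sketch ignores the seam left where a block was removed."""
    blocks = np.array_split(np.arange(len(series)), n_blocks)
    preds = []
    for block in blocks:
        keep = np.setdiff1d(np.arange(len(series)), block)
        phi = fit_ar1(series[keep])
        preds.append(phi * series[-1])       # one-step-ahead forecast
    preds = np.array(preds)
    return preds.mean(), preds.std()         # centre and spread

rng = np.random.default_rng(2)
t = np.arange(200)
series = np.sin(0.1 * t) + rng.normal(scale=0.05, size=200)
mean, spread = blockwise_jackknife(series)
```

The spread across block-deleted refits is the jackknife sampling variability from which the uncertainty intervals are derived; blocks (rather than single points) are deleted because adjacent time steps are correlated.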
Using data from a critical care setting, we demonstrate the utility of uncertainty quantification in sequential decision-making.
Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification
Hyun-Suk Lee, Yao Zhang, William R. Zame, Cong Shen, Jang-Won Lee, Mihaela van der Schaar
Subgroup analysis of treatment effects plays an important role in applications from medicine to public policy to recommender systems. It allows physicians (for example) to identify groups of patients for whom a given drug or treatment is likely to be effective and groups for whom it is not.
Most of the current methods of subgroup analysis begin with a particular algorithm for estimating individualized treatment effects (ITE) and identify subgroups by maximizing the difference across subgroups of the average treatment effect in each subgroup. These approaches have several weaknesses: they rely on a particular algorithm for estimating ITE, they ignore (in)homogeneity within identified subgroups, and they do not produce good confidence estimates.
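The baseline idea of splitting to maximize the across-group difference in average treatment effect can be sketched as a single-feature threshold search. The ITE estimates are taken as given, and the data and names are purely illustrative:

```python
import numpy as np

def best_split(feature, ite, min_size=10):
    """Search thresholds on one covariate, maximizing the difference in
    average estimated treatment effect between the two subgroups."""
    best_t, best_gap = None, -np.inf
    for t in np.unique(feature):
        left, right = ite[feature <= t], ite[feature > t]
        if len(left) < min_size or len(right) < min_size:
            continue
        gap = abs(left.mean() - right.mean())
        if gap > best_gap:
            best_t, best_gap = t, gap
    return best_t, best_gap

rng = np.random.default_rng(3)
age = rng.uniform(20, 80, size=200)
# Synthetic ITEs: the treatment helps mainly patients under 50.
ite = np.where(age < 50, 1.0, 0.1) + rng.normal(scale=0.1, size=200)
threshold, gap = best_split(age, ite)
```

This sketch exhibits exactly the weaknesses listed above: it trusts a single ITE estimator, ignores within-group heterogeneity, and attaches no confidence statement to the split, which is what R2P's uncertainty-quantified construction addresses.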
This paper develops a new method for subgroup analysis, R2P, that addresses all these weaknesses. R2P uses an arbitrary, exogenously prescribed algorithm for estimating ITE and quantifies the uncertainty of the ITE estimation, using a construction that is more robust than other methods.
Experiments using synthetic and semi-synthetic datasets (based on real data) demonstrate that R2P constructs partitions that are simultaneously more homogeneous within groups and more heterogeneous across groups than the partitions produced by other methods. Moreover, because R2P can employ any ITE estimator, it also produces much narrower confidence intervals with a prescribed coverage guarantee than other methods.
Conformal Time-Series Forecasting
Kamilė Stankevičiūtė, Ahmed M. Alaa, Mihaela van der Schaar
Current approaches for (multi-horizon) time-series forecasting using recurrent neural networks (RNNs) focus on issuing point estimates, which are insufficient for informing decision-making in critical application domains wherein uncertainty estimates are also required.
Existing methods for uncertainty quantification in RNN-based time-series forecasts are limited as they may require significant alterations to the underlying architecture, may be computationally complex, may be difficult to calibrate, may incur high sample complexity, and may not provide theoretical validity guarantees for the issued uncertainty intervals.
In this work, we extend the inductive conformal prediction framework to the time-series forecasting setup, and propose a lightweight uncertainty estimation procedure to address the above limitations. With minimal exchangeability assumptions, our approach provides uncertainty intervals with theoretical guarantees on frequentist coverage for any multi-horizon forecast predictor and dataset.
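The inductive conformal construction can be sketched for the multi-horizon case: per-horizon calibration residuals yield per-horizon interval widths, with a Bonferroni-style correction so coverage holds jointly across horizons. This is a simplified sketch under those assumptions, not the paper's exact procedure:

```python
import numpy as np

def conformal_forecast_intervals(cal_preds, cal_targets, test_pred, alpha=0.1):
    """cal_preds, cal_targets: (n_cal, H) calibration forecasts and truths.
    Returns per-horizon intervals at a Bonferroni-corrected level so the
    joint coverage over all H horizons is at least 1 - alpha."""
    n_cal, H = cal_preds.shape
    resid = np.abs(cal_preds - cal_targets)  # nonconformity scores
    level = 1 - alpha / H                    # Bonferroni correction
    # Finite-sample conformal quantile, per horizon.
    q = np.quantile(resid, min(1.0, np.ceil((n_cal + 1) * level) / n_cal), axis=0)
    return test_pred - q, test_pred + q

rng = np.random.default_rng(4)
n_cal, H = 100, 5
cal_targets = rng.normal(size=(n_cal, H))
cal_preds = cal_targets + rng.normal(scale=0.2, size=(n_cal, H))
test_pred = np.zeros(H)
lo, hi = conformal_forecast_intervals(cal_preds, cal_targets, test_pred)
```

Because calibration happens entirely after training, the underlying forecaster can be any RNN (or other model), used as a black box.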
We demonstrate the effectiveness of the conformal forecasting framework by comparing it with existing baselines on a variety of synthetic and real-world datasets.
AutoCP: Automated Pipelines for Accurate Prediction Intervals
Yao Zhang, William R. Zame, Mihaela van der Schaar
Submitted for publication, 2020
Successful application of machine learning models to real-world prediction problems, e.g. financial forecasting and personalized medicine, has proved challenging because such settings require limiting and quantifying the uncertainty in model predictions, i.e. providing valid and accurate prediction intervals.
Conformal Prediction is a distribution-free approach to construct valid prediction intervals in finite samples. However, the prediction intervals constructed by Conformal Prediction are often (because of over-fitting, inappropriate measures of nonconformity, or other issues) overly conservative and hence inadequate for the application(s) at hand.
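The basic split-conformal construction that AutoCP builds on can be sketched as follows: residuals on a held-out calibration set determine a single interval half-width with finite-sample validity. The simple model and names are illustrative only:

```python
import numpy as np

def split_conformal(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal regression: absolute residuals on a held-out
    calibration set give a finite-sample-valid interval half-width."""
    scores = np.abs(y_cal - predict(X_cal))  # nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample rank
    q = np.sort(scores)[min(k, n) - 1]
    preds = predict(X_test)
    return preds - q, preds + q

rng = np.random.default_rng(5)
X_cal = rng.normal(size=100)
y_cal = 3.0 * X_cal + rng.normal(scale=0.3, size=100)
predict = lambda X: 3.0 * X                  # a "pretrained" model
lo, hi = split_conformal(predict, X_cal, y_cal, rng.normal(size=20))
```

Validity holds regardless of the model, but the interval width depends heavily on the model, the data split, and the nonconformity measure — the pipeline choices AutoCP searches over to avoid the over-conservatism described above.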
This paper proposes an AutoML framework called Automatic Machine Learning for Conformal Prediction (AutoCP). Unlike familiar AutoML frameworks that attempt to select the best prediction model, AutoCP constructs prediction intervals that achieve a user-specified target coverage rate while optimizing interval length, yielding intervals that are accurate without being overly conservative.
We tested AutoCP on a variety of datasets and found that it significantly outperforms benchmark algorithms.
Improving Adaptive Conformal Prediction Using Self-Supervised Learning
Nabeel Seedat*, Alan Jeffares*, Fergus Imrie, Mihaela van der Schaar
Conformal prediction is a powerful distribution-free tool for uncertainty quantification, establishing valid prediction intervals with finite-sample guarantees. To produce valid intervals which are also adaptive to the difficulty of each instance, a common approach is to compute normalized nonconformity scores on a separate calibration set.
Self-supervised learning has been effectively utilized in many domains to learn general representations for downstream predictors. However, the use of self-supervision beyond model pretraining and representation learning has been largely unexplored.
In this work, we investigate how self-supervised pretext tasks can improve the quality of conformal regressors, specifically by improving the adaptability of conformal intervals. We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data, measuring the efficiency (width), deficit, and excess of conformal prediction intervals.
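Normalized nonconformity scores — the adaptive construction this work improves — can be sketched as follows. An auxiliary per-instance difficulty estimate scales the residuals, so the resulting intervals widen where the data are harder; here a simple known difficulty function stands in for the paper's self-supervised error signal, and all names are illustrative:

```python
import numpy as np

def normalized_conformal(predict, difficulty, X_cal, y_cal, X_test,
                         alpha=0.1, eps=1e-6):
    """Normalized split conformal: scores |y - yhat| / sigma(x), so the
    interval width adapts per instance. `difficulty` stands in for any
    auxiliary signal (e.g. a self-supervised error estimate)."""
    sigma_cal = difficulty(X_cal) + eps
    scores = np.abs(y_cal - predict(X_cal)) / sigma_cal
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # finite-sample rank
    q = np.sort(scores)[min(k, n) - 1]
    preds, sigma = predict(X_test), difficulty(X_test) + eps
    return preds - q * sigma, preds + q * sigma  # adaptive widths

rng = np.random.default_rng(6)
X_cal = rng.uniform(0.1, 2.0, size=200)
y_cal = X_cal + rng.normal(scale=1.0, size=200) * X_cal  # heteroscedastic noise
predict = lambda X: X
difficulty = lambda X: X                     # wider intervals where noisier
lo, hi = normalized_conformal(predict, difficulty, X_cal, y_cal,
                              np.array([0.2, 1.8]))
```

The coverage guarantee is unchanged by the normalization; what the auxiliary signal buys is better per-instance adaptivity, i.e. narrower intervals on easy instances and wider ones on hard instances.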
Learn more and get involved
Our research related to uncertainty quantification is closely linked to a number of our lab’s other core areas of focus. If you’re interested in branching out, we’d recommend reviewing our summaries on interpretable machine learning and time series in healthcare.
We would also encourage you to stay up-to-date with ongoing developments in this and other areas of machine learning for healthcare by signing up to take part in one of our two streams of online engagement sessions.
If you are a practicing clinician, please sign up for Revolutionizing Healthcare, which is a forum for members of the clinical community to share ideas and discuss topics that will define the future of machine learning in healthcare (no machine learning experience required).
If you are a machine learning student, you can join our Inspiration Exchange engagement sessions, in which we introduce and discuss new ideas and the development of new methods, approaches, and techniques in machine learning for healthcare.
A full list of our papers on uncertainty quantification and related topics can be found here.