van der Schaar Lab

van der Schaar Lab at ICML 2022: 7 papers and 3 workshops 

Note: this post originally appeared on 17 May, but was updated and republished on 18 July with details regarding poster sessions and workshops.

When ICML 2022 takes place from 17 – 23 July, the van der Schaar lab will be well-represented with 7 accepted papers at this leading international academic conference in machine learning.

Along with NeurIPS and ICLR, ICML is one of the 3 primary conferences of high impact in machine learning and artificial intelligence research.

During our 29 June Inspiration Exchange session, we brought together our students to give an introduction to their ICML papers. You can watch this here:


All 7 papers accepted for publication at ICML 2022 highlight our lab’s continuing and impactful research. Our researchers will be able to present on topics such as discovering diverse classes of differential equations from data, machine learning interpretability, synthetic data, data-centric AI, AutoML, data imputation, individualised treatment effects, and, last but not least, augmenting human skills using machine learning.

Based on the work at the van der Schaar lab on cutting-edge machine learning methods for complex and important problems, all of the papers offer substantial technical contributions in areas that we believe are particularly promising and that further the lab’s research agenda for healthcare.

Titles, authors, and abstracts for all 7 papers are given below. We have now also added presentation and poster-session times.

For an introduction to some of the highlights of this year’s contributions, we have produced in-a-nutshell videos for our papers on How Faithful is your Synthetic Data?, Neural Laplace, Data-SUITE, and HyperImpute.

Neural Laplace: Learning diverse classes of differential equations in the Laplace domain

Samuel Holt, Zhaozhi Qian, Mihaela van der Schaar

Neural Ordinary Differential Equations model dynamical systems with ODEs learned by neural networks. However, ODEs are fundamentally inadequate to model systems with long-range dependencies or discontinuities, which are common in engineering and biological systems.

Broader classes of differential equations (DE) have been proposed as remedies, including delay differential equations and integro-differential equations. Furthermore, Neural ODE suffers from numerical instability when modelling stiff ODEs and ODEs with piecewise forcing functions. In this work, we propose Neural Laplace, a unified framework for learning diverse classes of DEs including all the aforementioned ones. Instead of modelling the dynamics in the time domain, we model it in the Laplace domain, where the history-dependencies and discontinuities in time can be represented as summations of complex exponentials. To make learning more efficient, we use the geometrical stereographic map of a Riemann sphere to induce more smoothness in the Laplace domain.

In the experiments, Neural Laplace shows superior performance in modelling and extrapolating the trajectories of diverse classes of DEs, including the ones with complex history dependency and abrupt changes.

Neural Laplace goes beyond Neural ODE and provides a unified framework for learning diverse classes of differential equations including ODE, delay DE, integro DE and more. Instead of modeling the dynamics in the time domain, it models the system in the Laplace domain, where the history-dependencies and discontinuities in time can be represented as summations of complex exponentials.

Learning differential equations that govern dynamical systems is of great practical interest in the natural and social sciences. Experimentally, Neural Laplace shows superior performance in modeling and extrapolating the trajectories of diverse classes of DEs, including ones with complex history dependency and abrupt changes.
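To make the "stereographic map of a Riemann sphere" mentioned in the abstract more concrete, here is a minimal sketch of the textbook stereographic projection between the complex plane and the unit sphere. The function names are our own for illustration, and the exact parametrisation used in the paper may differ; the point is only that an unbounded Laplace-domain variable s maps to bounded coordinates on the sphere.

```python
import math

def to_sphere(s):
    """Map a complex number s onto the unit (Riemann) sphere.

    Standard stereographic projection: s = 0 lands on the south pole,
    and |s| -> infinity approaches the north pole (0, 0, 1).
    """
    r2 = abs(s) ** 2
    return (2 * s.real / (r2 + 1), 2 * s.imag / (r2 + 1), (r2 - 1) / (r2 + 1))

def from_sphere(point):
    """Inverse map (undefined at the north pole, which represents infinity)."""
    x, y, z = point
    return complex(x / (1 - z), y / (1 - z))

def to_angles(s):
    """Express the sphere point as bounded (longitude, latitude) angles."""
    x, y, z = to_sphere(s)
    return math.atan2(y, x), math.asin(z)

s = complex(3.0, -4.0)
point = to_sphere(s)
recovered = from_sphere(point)
theta, phi = to_angles(s)
```

Both angles stay within fixed ranges however large |s| becomes, which is one intuition for why a representation on the sphere can be smoother to learn than one on the raw complex plane.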

Label-Free Explainability for Unsupervised Models

Jonathan Crabbé, Mihaela van der Schaar

Unsupervised black-box models are challenging to interpret. Indeed, most existing explainability methods require labels to select which component(s) of the black-box’s output to interpret. In the absence of labels, black-box outputs often are representation vectors whose components do not correspond to any meaningful quantity. Hence, choosing which component(s) to interpret in a label-free unsupervised/self-supervised setting is an important, yet unsolved problem.

To bridge this gap in the literature, we introduce two crucial extensions of post-hoc explanation techniques: (1) label-free feature importance and (2) label-free example importance that respectively highlight influential features and training examples for a black-box to construct representations at inference time.

We demonstrate that our extensions can be successfully implemented as simple wrappers around many existing feature and example importance methods.

We illustrate the utility of our label-free explainability paradigm through a qualitative and quantitative comparison of representation spaces learned by various autoencoders trained on distinct unsupervised tasks.

Label-Free Explainability extends feature and example importance explanations to unsupervised models. This makes it possible to interpret unsupervised models, which was not possible before.

Explainability allows us to have a deeper understanding of how machine learning models work. This is crucial if we want to predict their limitations, improve them and even extract knowledge from them.
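The "simple wrapper" idea from the abstract can be sketched as follows. We assume here that the vector-valued representation is reduced to a scalar via its inner product with the representation of the input being explained, after which any existing attribution method can be applied. The toy encoder and the finite-difference gradient below are hypothetical stand-ins chosen for illustration, not the lab's implementation.

```python
def encoder(x):
    """Hypothetical toy encoder: a fixed linear map from 3 features to 2 dimensions.

    The third input feature is deliberately ignored by the encoder.
    """
    weights = [[1.0, 0.0, 0.0],
               [0.0, 2.0, 0.0]]
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def gradient_saliency(scalar_fn, x, eps=1e-4):
    """Stand-in attribution method: central finite-difference gradient."""
    grads = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((scalar_fn(up) - scalar_fn(down)) / (2 * eps))
    return grads

def label_free_importance(enc, x, attribution=gradient_saliency):
    """Wrap an attribution method so that it needs no label.

    The representation of x is held fixed and used to turn the encoder's
    vector output into a scalar (an inner product), which is then explained.
    """
    anchor = enc(x)

    def surrogate(x_tilde):
        return sum(a * b for a, b in zip(anchor, enc(x_tilde)))

    return attribution(surrogate, x)

scores = label_free_importance(encoder, [1.0, 1.0, 5.0])
```

In this toy case the ignored third feature receives a score of approximately zero, while the second feature, which the encoder weights most heavily, receives the largest score.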

Inverse Contextual Bandits: Learning How Behavior Evolves over Time

Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar

Understanding a decision-maker’s priorities by observing their behavior is critical for transparency and accountability in decision processes, such as in healthcare. Though conventional approaches to policy learning almost invariably assume stationarity in behavior, this is hardly true in practice: Medical practice is constantly evolving as clinical professionals fine-tune their knowledge over time.

For instance, as the medical community’s understanding of organ transplantations has progressed over the years, a pertinent question is: How have actual organ allocation policies been evolving? To give an answer, we desire a policy learning method that provides interpretable representations of decision-making, in particular capturing an agent’s non-stationary knowledge of the world, as well as operating in an offline manner.

First, we model the evolving behavior of decision-makers in terms of contextual bandits, and formalize the problem of Inverse Contextual Bandits (ICB).

Second, we propose two concrete algorithms as solutions, learning parametric and nonparametric representations of an agent’s behavior.

Finally, using both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method, as well as benchmarking and validating its accuracy.

ICB learns interpretable representations of time-varying behavior. It contrasts with conventional approaches to policy learning that almost invariably assume stationarity in behavior.

Learning a quantitative and interpretable account of how clinical practice has evolved over time in response to new medical knowledge is crucial: It would enable policy-makers to objectively evaluate if the policies they introduced have had the intended impact on practice. This would play a substantial role in designing better guidelines going forward.

HyperImpute: Generalized Iterative Imputation with Automatic Model Selection

Bogdan Cebere, Daniel Jarrett, Tennison Liu, Alicia Curth, Mihaela van der Schaar

Consider the problem of imputing missing values in a dataset. On the one hand, conventional approaches using iterative imputation benefit from the simplicity and customizability of learning conditional distributions directly, but suffer from the practical requirement for appropriate model specification of each and every variable. On the other hand, recent methods using deep generative modelling benefit from the capacity and efficiency of learning with neural network function approximators, but are often difficult to optimize and rely on stronger data assumptions.

In this work, we study an approach that marries the advantages of both: We propose HyperImpute, a generalized iterative imputation framework for adaptively and automatically configuring column-wise models and their hyperparameters. Practically, we provide a concrete implementation with out-of-the-box learners, optimizers, simulators, and extensible interfaces.

Empirically, we investigate this framework via comprehensive experiments and sensitivities on a variety of public datasets, and demonstrate its ability to generate accurate imputations relative to a strong suite of benchmarks. Contrary to recent work, we believe our findings constitute a strong defense of the iterative imputation paradigm.

HyperImpute is a generalized iterative imputation framework for adaptively and automatically configuring column-wise imputation models and their hyperparameters.

Missing data is a ubiquitous problem in practical scenarios. We present a practical software package with out-of-the-box tools that can be used for imputation on real-world tabular datasets.
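As a rough illustration of iterative imputation with per-column model selection, the sketch below works on a two-column table, offers two candidate learners (a column mean and a one-feature least-squares line), and picks between them by leave-one-out error on the observed rows. All names are hypothetical and this is not the HyperImpute package's API; the real framework searches over far richer learners and hyperparameters.

```python
from statistics import mean

def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        return lambda x: my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return lambda x: my + slope * (x - mx)

CANDIDATES = {"mean": fit_mean, "linear": fit_linear}

def loo_error(fit, xs, ys):
    """Leave-one-out squared error of a candidate learner."""
    total = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (model(xs[i]) - ys[i]) ** 2
    return total / len(xs)

def impute(rows, n_iter=3):
    """Iteratively impute a two-column table; None marks a missing value."""
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]
    for j in range(2):  # initial fill with the observed column mean
        col_mean = mean(r[j] for r in rows if r[j] is not None)
        for r in data:
            if r[j] is None:
                r[j] = col_mean
    chosen = {}
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j  # the other column acts as the predictor
            obs = [i for i in range(len(data)) if not missing[i][j]]
            mis = [i for i in range(len(data)) if missing[i][j]]
            if not mis:
                continue
            xs = [data[i][k] for i in obs]
            ys = [data[i][j] for i in obs]
            # Select the candidate with the lowest leave-one-out error.
            name = min(CANDIDATES, key=lambda c: loo_error(CANDIDATES[c], xs, ys))
            model = CANDIDATES[name](xs, ys)
            for i in mis:
                data[i][j] = model(data[i][k])
            chosen[j] = name
    return data, chosen

rows = [[0, 1], [1, 3], [2, None], [3, 7], [4, 9], [5, None]]
filled, chosen = impute(rows)
```

Because the second column follows a straight line in the first, the selection step prefers the linear learner for that column and the missing entries are filled accordingly.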

How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models

Ahmed Alaa, Boris van Breugel, Evgeny Saveliev, Mihaela van der Schaar

Devising domain- and model-agnostic evaluation metrics for generative models is an important and as yet unresolved problem. Most existing metrics, which were tailored solely to the image synthesis application, exhibit a limited capacity for diagnosing modes of failure of generative models across broader application domains. In this paper, we introduce a 3-dimensional metric, (α-Precision, β-Recall, Authenticity), that characterizes the fidelity, diversity and generalization performance of any generative model in a wide variety of application domains.

Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity. We introduce generalization as an additional dimension for model performance that quantifies the extent to which a model copies training data—a crucial performance indicator when modeling sensitive and private data.

The three metric components are interpretable probabilistic quantities, and can be estimated via sample-level binary classification. The sample-level nature of our metric inspires a novel use case which we call model auditing, wherein we judge the quality of individual samples generated by a (black-box) model, discarding low-quality samples and hence improving the overall model performance in a post-hoc manner.

We propose metrics for quantifying the fidelity, diversity and privacy of synthetic data at the sample level. To compute these metrics, we use OneClass embeddings, which can be used for multiple data modalities (tabular, image, time-series).

Data is fundamental to research, yet privacy concerns usually prohibit sharing medical data. Synthetic data (data that is generated from scratch but looks like real data) is a solution. The proposed metrics allow practitioners to verify that their generated synthetic data meets quality and privacy requirements. As we show, the proposed sample-level metrics also allow auditing of synthetic data by discarding samples with insufficient fidelity.
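To give a feel for a sample-level fidelity score, the sketch below computes a simplified precision-style quantity: the fraction of synthetic samples that fall inside a ball containing roughly a fraction alpha of the real samples. It assumes raw Euclidean feature space and a ball centred on the real-data mean, whereas the paper works in a learned OneClass embedding space, so treat this as an illustration of the idea rather than the published estimator. The function name and data are hypothetical.

```python
import math

def alpha_precision_sketch(real, synthetic, alpha):
    """Fraction of synthetic points inside an alpha-support ball of the real data.

    Returns the overall score and a per-sample list of booleans, which is
    what makes sample-level auditing (dropping poor samples) possible.
    """
    dim = len(real[0])
    centre = tuple(sum(p[d] for p in real) / len(real) for d in range(dim))
    dists = sorted(math.dist(p, centre) for p in real)
    k = max(1, round(alpha * len(real)))
    radius = dists[k - 1]  # radius that covers about alpha of the real samples
    inside = [math.dist(p, centre) <= radius for p in synthetic]
    return sum(inside) / len(synthetic), inside

real = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
synthetic = [(0.5, 0.0), (0.0, -0.5), (8.0, 0.0), (0.0, 1.0)]
score, inside = alpha_precision_sketch(real, synthetic, alpha=0.6)

# Auditing: keep only the synthetic samples that pass the check.
audited = [p for p, ok in zip(synthetic, inside) if ok]
```

Here the outlying synthetic point at (8, 0) is the only one rejected, so the audited set keeps three of the four samples.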

Data-SUITE: Data-centric identification of in-distribution incongruous examples

Nabeel Seedat, Jonathan Crabbé, Mihaela van der Schaar

Systematic quantification of data quality is critical for consistent model performance. Prior works have focused on out-of-distribution data. Instead, we tackle an understudied yet equally important problem of characterizing incongruous regions of in-distribution (ID) data, which may arise from feature space heterogeneity. To this end, we propose a paradigm shift with Data-SUITE: a data-centric framework to identify these regions, independent of a task-specific model.

Data-SUITE leverages copula modeling, representation learning, and conformal prediction to build feature-wise confidence interval estimators based on a set of training instances. These estimators can be used to evaluate the congruence of test instances with respect to the training set, to answer two practically useful questions: (1) which test instances will be reliably predicted by a model trained with the training instances? and (2) can we identify incongruous regions of the feature space so that data owners understand the data’s limitations or guide future data collection?

We empirically validate Data-SUITE’s performance and coverage guarantees and demonstrate on cross-site medical data, biased data, and data with concept drift, that Data-SUITE best identifies ID regions where a downstream model may be reliable (independent of said model). We also illustrate how these identified regions can provide insights into datasets and highlight their limitations.

Data-SUITE is a new data-centric AI framework tackling an understudied data quality problem: characterizing incongruous regions of in-distribution data. It identifies impactful data instances in a model-independent manner.

Data-SUITE helps practitioners systematically probe their data and, in doing so, answer two practically useful questions: (1) Reliable Model Deployment: which test instances will be reliably predicted by a downstream model trained with the training instances? and (2) Insightful Data Exploration: which regions of the feature space are incongruous (e.g., sub-population bias or under-representation), so that data owners can understand the data’s limitations or guide future data collection?
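The conformal prediction ingredient can be illustrated in isolation. The sketch below uses standard split conformal prediction to build an interval for one feature around a prediction made from another feature, then flags a test instance as incongruous when its observed value falls outside that interval. The fixed predictor and the data are hypothetical, and Data-SUITE's copula modelling and representation learning stages are not shown.

```python
import math

def predict_b(a):
    """Hypothetical fixed predictor of feature b from feature a.

    It stands in for a model learned on the training instances.
    """
    return 2.0 * a

def conformal_radius(calibration, alpha):
    """Split-conformal interval half-width at miscoverage level alpha."""
    residuals = sorted(abs(b - predict_b(a)) for a, b in calibration)
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    return residuals[k - 1] if k <= n else math.inf

def is_congruous(instance, radius):
    """True when the observed feature lies inside its conformal interval."""
    a, b = instance
    return abs(b - predict_b(a)) <= radius

calibration = [(1, 2.1), (2, 3.8), (3, 6.3), (4, 8.4),
               (5, 9.5), (6, 12.6), (7, 13.3)]
radius = conformal_radius(calibration, alpha=0.25)

typical = is_congruous((10, 20.3), radius)
unusual = is_congruous((10, 25.0), radius)
```

With seven calibration points and alpha = 0.25, the radius is the sixth-smallest calibration residual, so the first test instance is accepted and the second is flagged.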

Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations

Nabeel Seedat, Fergus Imrie, Alexis Bellot, Zhaozhi Qian, Mihaela van der Schaar

Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare by assisting decision-makers to answer “what-if” questions. Existing causal inference approaches typically consider regular, discrete-time intervals between observations and treatment decisions and hence are unable to naturally model irregularly sampled data, which is the common setting in practice.

To handle arbitrary observation patterns, we interpret the data as samples from an underlying continuous-time process and propose to model its latent trajectory explicitly using the mathematics of controlled differential equations. This leads to a new approach, the Treatment Effect Neural Controlled Differential Equation (TE-CDE), that allows the potential outcomes to be evaluated at any time point. In addition, adversarial training is used to adjust for time-dependent confounding which is critical in longitudinal settings and is an added challenge not encountered in conventional time-series.

To assess solutions to this problem, we propose a controllable simulation environment based on a model of tumor growth for a range of scenarios with irregular sampling reflective of a variety of clinical scenarios. TE-CDE consistently outperforms existing approaches in all simulated scenarios with irregular sampling.

TE-CDE leverages neural controlled differential equations (CDEs) to model counterfactual outcomes in continuous time, framing the evolution of a patient’s latent state as the solution to a CDE.

To answer what treatment should be administered and when it should be given, we must reliably estimate the effect of a treatment or sequence of treatments. Often, we must do this from observational data; however, typically observational data is irregularly sampled, with inconsistent sampling times across patients. TE-CDE can accurately estimate counterfactuals in this scenario, outperforming previous approaches which rely on discretization schemes.
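For readers unfamiliar with controlled differential equations, the sketch below integrates a toy CDE, dz = f(z) dX, with an Euler scheme over irregularly spaced observation times. The control path X has two channels, time and cumulative treatment dose, and the vector field is a hand-written linear function standing in for the neural network that TE-CDE learns. The names, coefficients and dosing schedule are all hypothetical, and adversarial training is not shown.

```python
import math

def vector_field(z):
    """Toy stand-in for a learned vector field, one entry per control channel.

    The first entry multiplies dt (natural decay of the latent state);
    the second multiplies the change in cumulative treatment dose.
    """
    return [-1.0 * z, 0.5]

def solve_cde(z0, path):
    """Euler scheme for dz = f(z) dX along a path of (time, cumulative_dose)."""
    z = z0
    trajectory = [z]
    for prev, cur in zip(path, path[1:]):
        increments = [c - p for p, c in zip(prev, cur)]
        z = z + sum(f * dx for f, dx in zip(vector_field(z), increments))
        trajectory.append(z)
    return trajectory

# Irregular observation times: the step sizes vary from one visit to the next.
steps = [0.01, 0.03, 0.02, 0.04] * 10
times = [0.0]
for h in steps:
    times.append(times[-1] + h)

# Two treatment plans for the same patient: no dose, or one dose mid-way.
untreated = [(t, 0.0) for t in times]
treated = [(t, 1.0 if i >= 20 else 0.0) for i, t in enumerate(times)]

z_untreated = solve_cde(1.0, untreated)
z_treated = solve_cde(1.0, treated)
```

Because the update uses increments of the path rather than a fixed time step, unevenly spaced visits need no resampling, and swapping the treatment channel yields the trajectory under an alternative plan.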


The van der Schaar lab will be involved in three workshops at ICML. On Friday 22 July, Mihaela van der Schaar will give a keynote, answering the question: ‘did machine learning make any difference in the COVID-19 pandemic?’ for the first-ever workshop on Healthcare AI and Covid-19.

Also on Friday 22 July, the Adaptive Experimental Design and Active Learning in the Real World (ReALML) workshop will include a Spotlight presentation about ‘Adaptively Identifying Good Patient Populations in Clinical Trials’. Furthermore, said paper will be presented as a poster alongside ‘Identifying Good Arms Fast and with Confidence: Strategies and Empirical Insights’, both authored by Alicia Curth, Alihan Hüyük, and Mihaela van der Schaar.

The workshop trifecta will be concluded on Saturday 23 July, with Hao Sun, Boris van Breugel, Jonathan Crabbé, Nabeel Seedat, and Mihaela van der Schaar presenting a poster about ‘DAUX: a Density-based Approach for Uncertainty Explanations’ at the Distribution-Free Uncertainty Quantification Workshop.

The International Conference on Machine Learning (ICML) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence known as machine learning.

ICML is globally renowned for presenting and publishing cutting-edge research on all aspects of machine learning used in closely related areas like artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, and robotics.

ICML is one of the fastest growing artificial intelligence conferences in the world. Participants at ICML span a wide range of backgrounds, from academic and industrial researchers, to entrepreneurs and engineers, to graduate students and postdocs.


The full ICML 2022 schedule will be available here.

For a full list of the van der Schaar Lab’s publications, click here.

Andreas Bedorf