van der Schaar Lab

van der Schaar Lab at NeurIPS 2022: 7 papers, hosting 2 workshops, and presenting at 3 others

Note: this post originally appeared on 21 September, but was updated and republished on 31 October with details regarding papers and workshops.

The van der Schaar Lab will make another strong showing at NeurIPS, widely considered the world’s largest and most prestigious AI and machine learning research conference, with 7 papers accepted for publication and 2 workshops organised by the lab (on Synthetic Data and Causal Deep Learning) this year.

This year’s NeurIPS conference will run from 28 November to 9 December.


All 7 papers accepted for publication at NeurIPS 2022 highlight our lab’s continuing and impactful research on topics such as machine learning interpretability, data-centric AI, unsupervised ensemble learning, feature selection, transfer learning, treatment effect estimation, and, last but not least, augmenting human skills using machine learning. This research furthers the lab’s research agenda on developing cutting-edge machine learning for transforming healthcare.

Titles, authors and abstracts for all 7 accepted papers are given below.

Concept Activation Regions:
A Generalized Framework for Concept-Based Explanations

Jonathan Crabbé, Mihaela van der Schaar

Concept-based explanations make it possible to understand the predictions of a deep neural network (DNN) through the lens of concepts specified by users. Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the DNN’s latent space. When this holds true, the concept can be represented by a concept activation vector (CAV) pointing in that direction.

In this work, we propose to relax this assumption by allowing concept examples to be scattered across different clusters in the DNN’s latent space. Each concept is then represented by a region of the DNN’s latent space that includes these clusters and that we call a concept activation region (CAR). To formalize this idea, we introduce an extension of the CAV formalism that is based on the kernel trick and support vector classifiers. This CAR formalism yields global concept-based explanations and local concept-based feature importance. We prove that CAR explanations built with radial kernels are invariant under latent space isometries.

In this way, CAR assigns the same explanations to latent spaces that have the same geometry. We further demonstrate empirically that CARs offer (1) more accurate descriptions of how concepts are scattered in the DNN’s latent space; (2) global explanations that are closer to human concept annotations; and (3) concept-based feature importance that meaningfully relates concepts with each other. Finally, we use CARs to show that DNNs can autonomously rediscover known scientific concepts, such as the prostate cancer grading system.
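The isometry-invariance claim can be checked numerically on a toy example. The sketch below is an illustration only, not the authors' implementation: it assumes a Gaussian radial kernel with a hypothetical bandwidth `gamma` and a made-up 2-D latent space. It shows that the kernel values, which are all a kernel-based support vector classifier sees of the data, are unchanged when the latent points are rotated and translated.

```python
import math

def rbf_kernel(h1, h2, gamma=0.5):
    """Gaussian radial kernel: depends only on the distance between h1 and h2."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(h1, h2))
    return math.exp(-gamma * sq_dist)

def isometry(h, angle=0.7, shift=(3.0, -1.0)):
    """Rotate a 2-D latent vector by `angle` radians, then translate it by `shift`."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * h[0] - s * h[1] + shift[0], s * h[0] + c * h[1] + shift[1])

def gram(points):
    """Matrix of kernel values between every pair of points."""
    return [[rbf_kernel(p, q) for q in points] for p in points]

# Hypothetical latent representations of four concept examples.
latent_points = [(0.0, 0.0), (1.0, 2.0), (-1.5, 0.5), (2.0, -1.0)]

gram_original = gram(latent_points)
gram_transformed = gram([isometry(p) for p in latent_points])

# Largest difference between corresponding kernel values (floating-point noise only).
max_gap = max(
    abs(a - b)
    for row_a, row_b in zip(gram_original, gram_transformed)
    for a, b in zip(row_a, row_b)
)
print(f"largest change in any kernel value: {max_gap:.2e}")
```

Because a classifier trained on these kernel values receives identical inputs before and after the transformation, any explanation derived from it is the same for both latent spaces. A linear kernel would not share this property under translation.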

Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability

Jonathan Crabbé, Alicia Curth, Ioana Bica, Mihaela van der Schaar

Estimating personalized effects of treatments is a complex, yet pervasive problem. To tackle it, recent developments in the machine learning (ML) literature on heterogeneous treatment effect estimation gave rise to many sophisticated, but opaque, tools: due to their flexibility, modularity and ability to learn constrained representations, neural networks in particular have become central to this literature.

Unfortunately, the assets of such black boxes come at a cost: models typically involve countless nontrivial operations, making it difficult to understand what they have learned. Yet understanding these models can be crucial: in a medical context, for example, discovered knowledge on treatment effect heterogeneity could inform treatment prescription in clinical practice. In this work, we therefore use post-hoc feature importance methods to identify features that influence the model’s predictions. This allows us to evaluate treatment effect estimators along a new and important dimension that has been overlooked in previous work: we construct a benchmarking environment to empirically investigate the ability of personalized treatment effect models to identify predictive covariates, that is, covariates that determine differential responses to treatment.

Our benchmarking environment then enables us to provide new insight into the strengths and weaknesses of different types of treatment effect models as we modulate different challenges specific to treatment effect estimation, e.g. the ratio of prognostic to predictive information, the possible nonlinearity of potential outcomes, and the presence and type of confounding.
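To make the distinction between prognostic and predictive covariates concrete, here is a toy data-generating process. It is a hypothetical illustration, not the benchmarking environment from the paper: the function names and coefficients are made up. The first covariate is prognostic (it shifts the outcome regardless of treatment), while the second is predictive (it changes how much the treatment helps).

```python
def mu0(x):
    """Expected outcome without treatment: driven by the prognostic covariate x[0]."""
    return 2.0 * x[0]

def mu1(x):
    """Expected outcome with treatment: adds an effect driven by the predictive covariate x[1]."""
    return mu0(x) + 1.5 * x[1]

def treatment_effect(x):
    """Individual-level effect: the difference between the two expected outcomes."""
    return mu1(x) - mu0(x)

patient_a = [1.0, 2.0]
patient_b = [5.0, 2.0]  # different prognostic value, same predictive value
patient_c = [1.0, 4.0]  # same prognostic value, different predictive value

for name, x in [("a", patient_a), ("b", patient_b), ("c", patient_c)]:
    print(name, "untreated outcome:", mu0(x), "treatment effect:", treatment_effect(x))
```

Patients a and b have very different untreated outcomes but the same treatment effect, whereas patients a and c differ only in their response to treatment. A treatment effect model with good feature importance should attribute effect heterogeneity to the second covariate alone.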

Online Decision Mediation from Scratch

Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar

Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior: At each time, the algorithm observes an action chosen by a fallible agent, and decides whether to accept that agent’s decision, intervene with an alternative, or request the expert’s opinion.

For instance, in clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances, thus real-world decision support is often limited to monitoring and forecasting. Instead, such an intermediary would strike a prudent balance between the former (purely prescriptive) and latter (purely descriptive) approaches, while providing an efficient interface between human mistakes and expert feedback. In this work, we first formalize the sequential problem of online decision mediation—that is, of simultaneously learning and evaluating mediator policies from scratch with abstentive feedback: In each round, deferring to the oracle obviates the risk of error, but incurs an upfront penalty, and reveals the otherwise hidden expert action as a new training data point. Second, we motivate and propose a solution that seeks to trade off (immediate) loss terms against (future) improvements in generalization error; in doing so, we identify why conventional bandit algorithms may fail.

Finally, through experiments and sensitivities on a variety of datasets, we illustrate consistent gains over applicable benchmarks on performance measures with respect to the mediator policy, the learned model, and the decision-making system as a whole.

Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data

Nabeel Seedat, Jonathan Crabbé, Ioana Bica, Mihaela van der Schaar

High model performance, on average, can hide that models may systematically underperform on subgroups of the data. We consider the tabular setting, which surfaces the unique issue of outcome heterogeneity – this is prevalent in areas such as healthcare, where patients with similar features can have different outcomes, thus making reliable predictions challenging.

To tackle this, we propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes. We do this by analyzing the behavior of individual examples during training, based on their predictive confidence and, importantly, the aleatoric (data) uncertainty. Capturing the aleatoric uncertainty permits a principled characterization and subsequent stratification of data examples into three distinct subgroups (Easy, Ambiguous, Hard). We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets. We show that Data-IQ’s characterization of examples is most robust to variation across similarly performant (yet different) models, compared to baselines.

Since Data-IQ can be used with any ML model (including neural networks, gradient boosting, etc.), this property ensures consistency of data characterization, while allowing flexible model selection. Taking this a step further, we demonstrate that the subgroups enable us to construct new approaches to both feature acquisition and dataset selection. Furthermore, we highlight how the subgroups can inform reliable model usage, noting the significant impact of the Ambiguous subgroup on model generalization.
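The stratification idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the released Data-IQ code: it assumes that, for each example, we have recorded the model's predicted probability of the true label at several training checkpoints, that confidence is the average of those probabilities, and that aleatoric uncertainty is the average of p(1 - p). The function name and all three thresholds are hypothetical.

```python
from statistics import mean

def characterize(true_label_probs, conf_upper=0.75, conf_lower=0.25, aleatoric_cut=0.2):
    """Assign one example to Easy, Ambiguous, or Hard.

    true_label_probs: predicted probability of the true label at each
    training checkpoint (assumed input format).
    """
    confidence = mean(true_label_probs)
    # p * (1 - p) peaks at 0.25 when p = 0.5, i.e. when the outcome looks like a coin flip.
    aleatoric = mean([p * (1.0 - p) for p in true_label_probs])

    if confidence >= conf_upper and aleatoric < aleatoric_cut:
        group = "Easy"
    elif confidence <= conf_lower and aleatoric < aleatoric_cut:
        group = "Hard"
    else:
        group = "Ambiguous"
    return confidence, aleatoric, group

# Made-up training trajectories for three examples.
trajectories = {
    "consistently right": [0.90, 0.95, 0.97],
    "coin flip": [0.50, 0.55, 0.45],
    "consistently wrong": [0.05, 0.10, 0.08],
}

for name, probs in trajectories.items():
    confidence, aleatoric, group = characterize(probs)
    print(f"{name}: confidence={confidence:.3f}, aleatoric={aleatoric:.3f}, group={group}")
```

Because the scores only require per-checkpoint probabilities, the same computation applies to any model that can be evaluated at intermediate stages of training.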

Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning

Alex Chan, Mihaela van der Schaar

Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data – instead given access to a set of expert models and their predictions alongside some limited information about the dataset used to train them.

In scenarios from finance to the medical sciences, and even consumer practice, stakeholders have developed models on private data they either cannot, or do not want to, share. Given the value of, and legislation surrounding, personal information, it is not surprising that only the models, and not the data, will be released – the pertinent question becoming: how best to use these models? Previous work has focused on global model selection or ensembling, with the result of a single final model across the feature space. However, machine learning models perform notoriously poorly on data outside their training domain, so we argue that, when ensembling models, the weightings for individual instances must reflect their respective domains – in other words, models that are more likely to have seen information on that instance should have more attention paid to them.

We introduce a method for such an instance-wise ensembling of models, including a novel representation learning step for handling sparse high-dimensional domains.

Finally, we demonstrate the need for, and generalisability of, our method on classical machine learning tasks, and highlight a real-world use case in the pharmacological setting of vancomycin precision dosing.
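The core intuition of instance-wise ensembling can be sketched as follows. This is a simplified illustration, not the method from the paper (which includes a representation learning step): it assumes each released model comes with only the mean and standard deviation of a single training feature, and it weights models by a Gaussian density of the test instance under each training domain. All names and numbers are hypothetical.

```python
import math

def gaussian_density(x, mean, std):
    """Density of x under a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def instance_wise_ensemble(x, models):
    """Weight each model by how plausible x is under its training domain."""
    densities = [gaussian_density(x, m["mean"], m["std"]) for m in models]
    total = sum(densities)
    weights = [d / total for d in densities]
    prediction = sum(w * m["predict"](x) for w, m in zip(weights, models))
    return prediction, weights

# Two hypothetical expert models, each trained on a different region of the feature space.
models = [
    {"mean": 0.0, "std": 1.0, "predict": lambda x: 2.0 * x},          # saw small x
    {"mean": 10.0, "std": 1.0, "predict": lambda x: 0.5 * x + 15.0},  # saw large x
]

for x in (0.5, 9.5):
    prediction, weights = instance_wise_ensemble(x, models)
    print(x, round(prediction, 3), [round(w, 3) for w in weights])
```

For a test point near 0 almost all the weight goes to the first model, and for a test point near 10 it goes to the second, in contrast to a global ensemble that would apply one fixed weighting everywhere.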

Transfer Learning on Heterogeneous Feature Spaces for Treatment Effects Estimation

Ioana Bica, Mihaela van der Schaar

Consider the problem of improving the estimation of conditional average treatment effects (CATE) for a target domain of interest by leveraging related information from a source domain with a different feature space.

This heterogeneous transfer learning problem for CATE estimation is ubiquitous in areas such as healthcare where we may wish to evaluate the effectiveness of a treatment for a new patient population for which different clinical covariates and limited data are available. In this paper, we address this problem by introducing several building blocks that use representation learning to handle the heterogeneous feature spaces and a flexible multi-task architecture with shared and private layers to transfer information between potential outcome functions across domains.

Then, we show how these building blocks can be used to recover transfer learning equivalents of the standard CATE learners. On a new semi-synthetic data simulation benchmark for heterogeneous transfer learning, we not only demonstrate performance improvements of our heterogeneous transfer causal effect learners across datasets, but also provide insights into the differences between these learners from a transfer perspective.

Composite Feature Selection Using Deep Ensembles

Fergus Imrie, Alexander Norcliffe, Pietro Lio, Mihaela van der Schaar

Workshops hosted by our lab

The van der Schaar Lab will be organising two workshops at NeurIPS.

On Friday 2 December, for the SyntheticData4ML workshop, the van der Schaar Lab will bring together research communities in generative models, privacy, and fairness as well as industry leaders in a joint effort to develop the theory, methodology, and algorithms to generate synthetic benchmark datasets with the goal of enabling ethical and reproducible ML research.

Also on Friday 2 December, the van der Schaar Lab will present a workshop on Causality for Real-world Impact: CML4Impact 22. Real-world problems aren’t granted the luxury of making strict assumptions, yet still require causal thinking to solve. Armed with the rigour of causality and the can-do attitude of machine learning, the lab believes the time is ripe to start working towards solving real-world problems.

Our researchers presenting at other workshops

On Friday 2 December, Krzysztof Kacprzyk will present his paper “D-Cipher: Discovery of Closed-form Partial Differential Equations”, co-authored with Zhaozhi Qian and Mihaela van der Schaar, at the AI4Science Workshop.

At the Algorithmic Fairness through the Lens of Causality and Privacy workshop on Saturday 3 December, Tennison Liu will present his paper “Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes”, co-authored with Alex Chan, Boris van Breugel, and Mihaela van der Schaar.

On Monday 5 December, Alicia Curth will present a poster at the virtual component of the Women in Machine Learning (WiML) Workshop. The poster is titled “Adaptively Identifying Patient Populations with Treatment Benefit in Clinical Trials” and is authored by Alicia, Alihan Hüyük, and Mihaela van der Schaar.

The conference on Neural Information Processing Systems (NeurIPS) is the largest and most prestigious conference in AI and machine learning.

The purpose of NeurIPS is to foster the exchange of research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects. The core focus is peer-reviewed novel research which is presented and discussed in the general session, along with invited talks by leaders in their field.

The conference was founded in 1987 and is now a multi-track interdisciplinary annual meeting that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers. Along with the conference is a professional exposition focusing on machine learning in practice, a series of tutorials, and topical workshops that provide a less formal setting for the exchange of ideas.

The full NeurIPS 2022 schedule will be available here.

For a full list of the van der Schaar Lab’s publications, click here.

Andreas Bedorf