The van der Schaar Lab’s work will be highly visible at ICML 2021 (July 18-24), the leading international academic conference in machine learning. Mihaela van der Schaar and postdoc Ahmed Alaa will deliver a tutorial on synthetic data, and the lab’s researchers will publish 4 papers, give talks at 3 workshops held as part of the conference, and co-organize a fourth workshop.
Along with NeurIPS and ICLR, ICML is one of the 3 primary high-impact conferences in machine learning and artificial intelligence research.
Tutorial on synthetic data (July 19)
On July 19 at 17:00 CEST (16:00 BST), Mihaela van der Schaar and postdoc Ahmed Alaa will deliver a tutorial entitled “Synthetic healthcare data generation and assessment: challenges, methods, and impact on machine learning.”
Covering both high-level theory and specific methods and approaches, the tutorial will advance and explain a vision for synthetic data to help catalyze a revolution in healthcare by breaking the current logjam in data availability. In addition, Mihaela and Ahmed will discuss how to evaluate the quality of synthetic data and the performance of generative models; they will highlight the challenges of evaluating generative models as compared to discriminative models, and present various metrics that can be used to quantify different aspects of synthetic data quality.
Papers accepted for publication
All 4 papers accepted for publication at ICML 2021 present cutting-edge machine learning methods for complex and important problems, including understanding human decision-making, individualized treatment effects, time series analysis, and machine learning interpretability. All make substantial technical contributions in areas that the van der Schaar Lab believes to be particularly promising and that further the lab’s research agenda for healthcare.
Titles and abstracts for all 4 papers are given below.
Inverse Decision Modeling: Learning Interpretable Representations of Behavior
Decision analysis deals with modeling and enhancing decision processes. A principal challenge in improving behavior is in obtaining a transparent description of existing behavior in the first place.
In this paper, we develop an expressive, unifying perspective on inverse decision modeling: a framework for learning parameterized representations of sequential decision behavior.
First, we formalize the forward problem (as a normative standard), subsuming common classes of control behavior.
Second, we use this to formalize the inverse problem (as a descriptive model), generalizing existing work on imitation/reward learning—while opening up a much broader class of research problems in behavior representation.
Finally, we instantiate this approach with an example (inverse bounded rational control), illustrating how this structure enables learning (interpretable) representations of (bounded) rationality—while naturally capturing intuitive notions of suboptimal actions, biased beliefs, and imperfect knowledge of environments.
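The abstract describes learning interpretable representations of bounded rationality. As a hedged illustration only, the sketch below uses one common information-theoretic formalization of bounded rationality, in which an agent tilts a prior policy toward higher-value actions with an inverse temperature `beta`, and then "inverts" it by choosing the `beta` that best explains observed actions. The function names, the grid search, and this specific parameterization are assumptions for illustration; the paper's inverse bounded rational control model is richer (it also captures biased beliefs and imperfect knowledge of the environment).

```python
import numpy as np


def bounded_rational_policy(q_values, prior, beta):
    """Hypothetical forward model: pi(a) proportional to prior(a) * exp(beta * Q(a)).

    beta = 0 recovers the prior (no deliberation);
    large beta approaches the greedy, fully rational choice.
    """
    logits = np.log(prior) + beta * np.asarray(q_values, dtype=float)
    logits -= logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def fit_beta(q_values, prior, observed_actions, betas=np.linspace(0.0, 10.0, 101)):
    """Toy inverse step: grid-search the beta that maximizes the log-likelihood
    of the observed action indices under the forward model above."""
    counts = np.bincount(observed_actions, minlength=len(q_values))
    log_liks = [counts @ np.log(bounded_rational_policy(q_values, prior, b)) for b in betas]
    return betas[int(np.argmax(log_liks))]
```

For instance, with values `[1.0, 2.0]`, a uniform prior, and the higher-value action observed in three of four decisions, the fitted `beta` is about 1.1 (close to ln 3): a decision-maker who favors the better action, but not deterministically.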
Explaining Time Series Predictions with Dynamic Masks
How can we explain the predictions of a machine learning model?
When the data is structured as a multivariate time series, this question induces additional difficulties such as the necessity for the explanation to embody the time dependency and the large number of inputs.
To address these challenges, we propose dynamic masks (Dynamask). This method produces instance-wise importance scores for each feature at each time step by fitting a perturbation mask to the input sequence. In order to incorporate the time dependency of the data, Dynamask studies the effects of dynamic perturbation operators. In order to tackle the large number of inputs, we propose a scheme to make the feature selection parsimonious (to select no more features than necessary) and legible (a notion that we detail by drawing a parallel with information theory).
With synthetic and real-world data, we demonstrate that the dynamic underpinning of Dynamask, together with its parsimony, offers a clear improvement in the identification of feature importance over time. The modularity of Dynamask makes it ideal as a plug-in to increase the transparency of a wide range of machine learning models in areas such as medicine and finance, where time series are abundant.
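Dynamask's central object is a mask with one value per feature per time step that decides how much of each input entry is kept versus replaced by a time-aware perturbation. The sketch below shows one such perturbation, assuming a "fade to trailing moving average" operator; the function name, the `window` parameter, and this choice of operator are illustrative assumptions, and the sketch omits the mask optimization and the parsimony and legibility terms described in the abstract.

```python
import numpy as np


def moving_average_perturbation(x: np.ndarray, mask: np.ndarray, window: int = 3) -> np.ndarray:
    """Blend each entry toward its trailing moving average where the mask is low.

    x, mask: arrays of shape (T, d), mask values in [0, 1].
    mask[t, i] = 1 keeps x[t, i] unchanged; mask[t, i] = 0 replaces it with
    the mean of x[max(0, t - window + 1) : t + 1, i] (a hypothetical operator).
    """
    if mask.shape != x.shape:
        raise ValueError("mask must have the same shape as x")
    T, _ = x.shape
    avg = np.empty_like(x, dtype=float)
    for t in range(T):
        start = max(0, t - window + 1)
        avg[t] = x[start:t + 1].mean(axis=0)  # average over the trailing window
    return mask * x + (1.0 - mask) * avg
```

In a method of this kind, the mask would be optimized so that the model's prediction on the perturbed input stays close to its prediction on the original input while as few entries as possible stay near 1; the entries that remain near 1 are the ones flagged as important at that time step.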
Policy Analysis using Synthetic Controls in Continuous-Time
Counterfactual estimation using synthetic controls is one of the most successful recent methodological developments in causal inference. Despite its popularity, the current description only considers time series aligned across units and synthetic controls expressed as linear combinations of observed control units.
We propose a continuous-time alternative that models the latent counterfactual path explicitly using the formalism of controlled differential equations.
This model is directly applicable to the general setting of irregularly-aligned multivariate time series and may be optimized in rich function spaces – thereby substantially improving on some limitations of existing approaches.
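For contrast, the "current description" the abstract refers to can be sketched as a linear synthetic control: weights over control units are fitted on the pre-intervention period, then applied to the controls' post-intervention outcomes to estimate the treated unit's counterfactual. This is a minimal sketch under stated assumptions: it uses unconstrained least squares (the classical method additionally restricts weights to be nonnegative and sum to one), the function name is hypothetical, and it requires every unit to be observed on the same time grid, which is exactly the limitation the continuous-time model is designed to remove.

```python
import numpy as np


def synthetic_control(treated_pre, controls_pre, controls_post):
    """Linear synthetic control via unconstrained least squares (illustrative sketch).

    treated_pre:   (T0,)    treated unit's outcomes before the intervention
    controls_pre:  (T0, J)  J control units' outcomes before the intervention
    controls_post: (T1, J)  the same control units after the intervention
    Returns (weights, counterfactual), with counterfactual of shape (T1,).
    """
    # Fit weights so that controls_pre @ weights approximates treated_pre.
    weights, *_ = np.linalg.lstsq(controls_pre, treated_pre, rcond=None)
    # Counterfactual: what the treated unit would look like without the intervention.
    counterfactual = controls_post @ weights
    return weights, counterfactual
```

If the treated unit's pre-intervention path is exactly 0.3 times one control plus 0.7 times another, the fitted weights recover `[0.3, 0.7]`, and the treatment effect is estimated as the observed post-intervention outcome minus `counterfactual`.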
Learning Queueing Policies for Organ Transplantation Allocation using Interpretable Counterfactual Survival Analysis
Organ transplantation is often the last resort for treating end-stage illnesses, but managing transplant wait-lists is challenging because of organ scarcity and the complexity of assessing donor-recipient compatibility.
In this paper, we develop a data-driven model for (real-time) organ allocation using observational data for transplant outcomes. Our model integrates a queuing-theoretic framework with unsupervised learning to cluster the organs into “organ types”, and then constructs priority queues (one for each organ type) to which incoming patients are assigned. To reason about organ allocations, the model uses synthetic controls to infer a patient’s survival outcomes under counterfactual allocations to the different organ types; the model is trained end-to-end to optimize the trade-off between patient waiting time and expected survival time. The use of synthetic controls enables patient-level interpretations of allocation decisions that can be presented to and understood by clinicians.
We test our model on multiple data sets, and show that it outperforms other organ-allocation policies in terms of added life-years and death count. Furthermore, we introduce a novel organ-allocation simulator to accurately test new policies.
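The queueing structure described above (one priority queue per organ type, with each arriving organ going to the top patient in its type's queue) can be sketched with Python's `heapq`. This is a hypothetical illustration: the class and method names are invented, and `organ_type` and `priority` are supplied by the caller as placeholders, whereas in the paper the organ types come from unsupervised clustering and the assignment is learned end-to-end from counterfactual survival estimates.

```python
import heapq
import itertools
from collections import defaultdict


class OrganTypeQueues:
    """One priority queue per organ type (hypothetical sketch).

    Patients are pushed onto the queue of the organ type they are assigned to,
    with a priority score (higher = served first). When an organ of a given
    type arrives, the highest-priority patient in that queue receives it.
    """

    def __init__(self):
        self._queues = defaultdict(list)
        self._counter = itertools.count()  # tie-breaker: earlier arrivals first

    def add_patient(self, patient_id, organ_type, priority):
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._queues[organ_type], (-priority, next(self._counter), patient_id))

    def allocate(self, organ_type):
        """Return the patient who receives an organ of this type, or None if the queue is empty."""
        queue = self._queues[organ_type]
        if not queue:
            return None
        _, _, patient_id = heapq.heappop(queue)
        return patient_id
```

Keeping a separate heap per organ type makes each allocation a logarithmic-time pop, and the arrival counter keeps the order deterministic when two patients share the same priority score.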
Keynote at Interpretable Machine Learning in Healthcare Workshop (July 23)
On July 23 at 06:30 PDT (14:30 BST), Mihaela van der Schaar will give a keynote as part of the Interpretable Machine Learning in Healthcare (IMLH) Workshop.
The talk, entitled “Quantitative epistemology: conceiving a new human-machine partnership,” will introduce and explain a brand new area of research pioneered by the van der Schaar Lab with the aim of using AI and machine learning to understand and empower human learning and decision-making.
Keynote at Time Series Workshop (July 24)
On July 24 at 09:00 PDT (17:00 BST), Mihaela van der Schaar will deliver a keynote as part of the Time Series Workshop.
Her talk will be entitled “Time-series in healthcare: challenges and solutions,” and will explore new approaches to building dynamic models that incorporate time series datasets available in healthcare.
Presentation of paper at Neglected Assumptions in Causal Inference Workshop (July 23)
On July 23, Ph.D. student Alicia Curth will present a paper entitled “Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators” as part of the Workshop on The Neglected Assumptions in Causal Inference (NACI).
Self-Supervised Learning for Reasoning and Perception Workshop (July 24)
Mihaela van der Schaar is on the organizing team for the Self-Supervised Learning for Reasoning and Perception Workshop, which will take place on July 24. The workshop will bring together researchers from various domains who are interested in self-supervised learning (SSL) to discuss how to develop SSL methods for reasoning tasks: for example, how to design pretext tasks for symbolic reasoning, how to develop contrastive learning methods for relational reasoning, and how to develop SSL approaches that bridge reasoning and perception.
About ICML 2021
The International Conference on Machine Learning (ICML) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence known as machine learning.
ICML is globally renowned for presenting and publishing cutting-edge research on all aspects of machine learning used in closely related areas like artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, and robotics.
ICML is one of the fastest growing artificial intelligence conferences in the world. Participants at ICML span a wide range of backgrounds, from academic and industrial researchers, to entrepreneurs and engineers, to graduate students and postdocs.