van der Schaar Lab

clairvoyance alpha: the first unified end-to-end autoML pipeline for time-series data

The van der Schaar Lab has released an alpha version of clairvoyance, a ground-breaking autoML package that represents the culmination of years of research, development, and real-world testing. As a unified, end-to-end pipeline for time-series data, clairvoyance is unmatched in its versatility and capability. While primarily developed to aid clinical research and decision support, clairvoyance can also be used to great effect in non-medical contexts, thanks to its ability to facilitate complex inference workflows in a transparent, reproducible, and efficient manner.

clairvoyance is an enormously important project for our team, and is the result of years of work across a number of different areas. It’s the first of its kind: an automated end-to-end pipeline that can produce personalized and interpretable predictions and recommendations using time-series data. I have no doubt that clairvoyance will prove useful in driving clinical decision-making research, and I also believe it can offer a lot of benefits to the machine learning community.

– Prof. Mihaela van der Schaar

A crucial breakthrough in the use of time-series data

Time-series data is the bread and butter of evidence-based clinical decision support, as it offers significantly more valuable insight than the “snapshots” presented by static data. With the increasing availability of electronic patient records, there is enormous untapped potential to apply machine learning to time-series data, providing accurate and actionable predictive models for real-world concerns.

At the same time, medical time-series problems in the wild are challenging due to their highly composite nature. Existing applications of machine learning to such problems have treated these component tasks as separate problems, leading to a siloed and stylized development approach that often fails to account for complexities and interdependencies within the real-world machine learning lifecycle. Regrettably, this has resulted in a remarkable gap between the inherent capabilities of machine learning methods and their actual effectiveness in clinical research and decision support.

clairvoyance offers a systematic and automated approach to personalized dynamic predictions, personalized information acquisition, personalized monitoring, and personalized treatment plans, while also offering interpretations.

clairvoyance pipeline

Under a single, consistent interface, clairvoyance encapsulates all major modeling steps for time-series data, including: (i) loading and (ii) preprocessing patient records, (iii) configuring problem definitions, (iv) handling missing or irregular samples in both static and temporal contexts, (v) conducting feature selection, (vi) fitting prediction models, performing (vii) calibration and (viii) uncertainty estimation of model outputs, (ix) applying global or instance-wise methods for interpreting learned models, and (x) computing evaluation metrics.
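To make the idea of a single interface over several modeling steps concrete, here is a minimal sketch in plain Python. It does not use the clairvoyance API: the `Pipeline` class and the stage functions (`forward_fill`, `min_max_scale`, `last_value_predictor`) are hypothetical names invented for illustration, and they cover only toy versions of imputation, preprocessing, and prediction.

```python
from typing import Callable, List, Optional, Sequence

# One patient's measurements over time; None marks a missing sample.
Series = List[Optional[float]]


def forward_fill(series: Series) -> List[float]:
    """Impute missing samples with the last observed value (0.0 if none yet)."""
    filled, last = [], 0.0
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def min_max_scale(series: List[float]) -> List[float]:
    """Rescale a series to the [0, 1] range; a constant series maps to 0.0."""
    low, high = min(series), max(series)
    if high == low:
        return [0.0 for _ in series]
    return [(v - low) / (high - low) for v in series]


def last_value_predictor(series: List[float]) -> float:
    """Toy prediction model: forecast the next value as the latest one."""
    return series[-1]


class Pipeline:
    """Chains preprocessing stages and a final predictor behind one call."""

    def __init__(self, stages: Sequence[Callable], predictor: Callable):
        self.stages = list(stages)
        self.predictor = predictor

    def predict(self, series):
        # Each stage consumes the output of the previous one.
        for stage in self.stages:
            series = stage(series)
        return self.predictor(series)


pipeline = Pipeline(stages=[forward_fill, min_max_scale], predictor=last_value_predictor)
prediction = pipeline.predict([2.0, None, 4.0, None, 6.0])
```

Because every stage is just a callable with the same shape, swapping one imputation or preprocessing method for another does not require touching the rest of the workflow, which is the property the paragraph above describes.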

Existing time-series packages typically concentrate on implementing algorithms for specific problems, such as imputation, dynamic forecasting or feature extraction. By contrast, clairvoyance enables end-to-end development along every step of the inference workflow, including components critical to medical problems.

– Dr. Jinsung Yoon (van der Schaar Lab; graduated 2020)

One solution to 3 problems

Designing real-world project lifecycles poses challenges in terms of engineering (difficult to build), evaluation (difficult to assess), and efficiency (difficult to optimize). clairvoyance was conceived to tackle all of these at once.

The engineering problem is that building inference procedures involves significant investment, yet few resources are available for clinical practitioners and domain experts to easily develop and validate complete workflows. As a software toolkit, clairvoyance enables workflow development within a single unified interface: having modular and composable structures enables rapid experimentation and deployment by clinical practitioners and engineers alike, while simplifying collaboration and code-sharing.

The evaluation problem is that, while the performance of each component in the workflow may depend on the broader context, current empirical practices tend to examine the merits of each component individually; surrounding steps are largely configured per convenience to enforce “all else equal” conditions for assessment. This prevents meaningful assessment of the performance of the workflow as a whole, and does not meaningfully support integrated workflow development. As an empirical standard, clairvoyance serves as a complete experimental benchmarking environment: having a standardized, extensible pipeline paradigm provides realistic and systematic context for evaluating novel component models, ensuring that comparisons are fair, transparent, and reproducible.

The efficiency problem is that sophisticated models are resource-intensive to optimize: state-of-the-art deep learning approaches require many knobs to be tuned, a computational difficulty compounded by the combination of multiple models, as well as by potential temporal distribution shifts in time series. Through automated machine learning, clairvoyance takes care of model configuration and stepwise selection, efficiently considering interdependencies among components, algorithms, and time steps. (Note: “automated machine learning” or “autoML” refers to the process of selecting algorithms and building pipelines using machine learning. AutoML is essential in order to enable machine learning to be applied effectively and at scale: given the huge number of different diseases, different variables, and different needs, it’s not possible to hand-craft a model for each disease. See this recent post for details.)
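As a rough illustration of why components should be selected jointly rather than one at a time, the sketch below scores every combination of imputer and predictor on a small validation set and keeps the best pair. It is an exhaustive search over a toy space, not the optimization method clairvoyance uses, and all function names and data here are hypothetical.

```python
from itertools import product
from statistics import mean


def forward_fill(series):
    """Replace each missing sample (None) with the last observed value."""
    filled, last = [], 0.0
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def mean_impute(series):
    """Replace each missing sample (None) with the mean of the observed ones."""
    observed = [v for v in series if v is not None]
    fill = mean(observed) if observed else 0.0
    return [fill if v is None else v for v in series]


def last_value(history):
    """Forecast the next value as the most recent one."""
    return history[-1]


def moving_average(history):
    """Forecast the next value as the mean of the last three samples."""
    return mean(history[-3:])


def validation_error(imputer, predictor, dataset):
    """Mean absolute error of the imputer + predictor pair on (history, target) pairs."""
    return mean(abs(predictor(imputer(history)) - target) for history, target in dataset)


def select_pipeline(imputers, predictors, dataset):
    """Score every (imputer, predictor) combination and return the best pair of names."""
    candidates = product(imputers.items(), predictors.items())
    best = min(candidates, key=lambda c: validation_error(c[0][1], c[1][1], dataset))
    return best[0][0], best[1][0]


# Made-up validation data: (history with gaps, next observed value).
dataset = [
    ([1.0, None, 3.0], 3.5),
    ([1.0, 3.0, None], 3.0),
    ([5.0, None, None], 5.0),
]
imputers = {"forward_fill": forward_fill, "mean_impute": mean_impute}
predictors = {"last_value": last_value, "moving_average": moving_average}
chosen = select_pipeline(imputers, predictors, dataset)
```

Scoring the pair together is what captures the interdependency: the same predictor can rank differently depending on which imputer feeds it. Exhaustive search stops being practical as the number of steps and hyperparameters grows, which is the gap autoML methods are meant to fill.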

Estimating personalized treatment effects

Identifying when to give treatments to patients, and how to select among multiple treatments over time, are important medical problems with few existing solutions. As clinical decision-makers are often faced with the problem of choosing between treatment alternatives for patients, reliably estimating their effects is paramount. An integral aspect of clairvoyance is the use of a novel counterfactual recurrent network (CRN) approach, developed by van der Schaar Lab’s researchers, to estimate future treatment outcomes. This approach leverages recent advances in representation learning and domain adversarial training to overcome the problems of existing methods for causal inference over time. CRN achieves this in a manner that is free from the bias introduced by time-varying confounders. These developments were presented in a session during the 2020 International Conference on Learning Representations (ICLR 2020); further details can be found here.

What is unique here is that we go beyond dynamic forecasting and provide actionable intelligence by estimating individualized treatment effects. clairvoyance is capable of predicting counterfactual trajectories for each patient under different possible treatment strategies, thus enabling us to determine when to give treatments to patients and how to select among multiple treatments over time.

– Ioana Bica (Ph.D. student, van der Schaar Lab)

Interpretable and actionable outputs

As a group that works extensively with clinicians in formulating problems and developing solutions, the van der Schaar Lab places particular emphasis on ensuring that models are interpretable and can be trusted by users without extensive machine learning expertise. This is an essential factor in being able to narrow the divide between the builders and users of machine learning models. Interpretability has therefore been built into clairvoyance through the inclusion of numerous interpretability features including INVASE, an in-house method that uses actor-critic reinforcement learning methods to help turn black-box models into white-box models (more information here).

In addition to interpretability, clairvoyance offers confidence estimates, ensuring that users are given an indication of the degree of certainty accompanying all recommendations or predictions.
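One simple way to attach an indication of certainty to a prediction is to look at how much an ensemble of models disagrees. The sketch below is only an illustration of that general idea, not a description of how clairvoyance computes its confidence estimates; the function `predict_with_uncertainty` and the three toy models are hypothetical.

```python
from statistics import mean, stdev
from typing import Callable, List, Sequence, Tuple


def predict_with_uncertainty(
    models: Sequence[Callable[[List[float]], float]],
    history: List[float],
) -> Tuple[float, float]:
    """Return the ensemble mean and the spread (sample standard deviation).

    Requires at least two models, since the spread of one prediction is undefined.
    """
    predictions = [model(history) for model in models]
    return mean(predictions), stdev(predictions)


# Three toy forecasters that make different assumptions about the trend.
ensemble = [
    lambda h: h[-1],                    # persistence: repeat the last value
    lambda h: mean(h[-3:]),             # moving average of the last three samples
    lambda h: h[-1] + (h[-1] - h[-2]),  # linear extrapolation of the last step
]

estimate, spread = predict_with_uncertainty(ensemble, [1.0, 2.0, 3.0])
```

A small spread means the models agree and the estimate can be reported with more confidence; a large spread is a signal that the prediction should be treated with caution.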

Applications in medicine

clairvoyance is an inherently versatile tool, and can be applied to practically any disease or condition for which high-quality time-series patient datasets are available. It could also be adapted beyond a medical setting with relative ease.

Precursors to clairvoyance have been tested and validated using datasets for breast cancer and cystic fibrosis patients, as well as for ICU admission prediction (initially using past patient data in the U.S. and subsequently using live patient data in the U.K.). Results of testing have consistently demonstrated predictive capabilities exceeding those of existing statistical techniques or current state-of-the-art machine learning models. A more detailed analysis of the performance of the current version of clairvoyance can be found here.

One potential application of clairvoyance in the near future could be the current COVID-19 pandemic. While the van der Schaar Lab is already providing AI-enabled predictive tools to the UK’s National Health Service to help with hospital capacity management, clairvoyance could offer even more comprehensive support for healthcare professionals, including:

– personalized predictions (mortality, discharge, ICU admission [including early warning] and readmission) and recommendations, both at time of admission and throughout the period of hospitalization;
– personalized information acquisition, including estimations of the value of certain information and determination of which tests should be conducted;
– personalized monitoring to determine which tests should be conducted and when to make forecasts, predictions, and treatment recommendations;
– personalized treatment plans to determine whether, how and when to admit patients to ICU, to use specific kinds of mechanical equipment, or to provide medication, as well as the likely impact of doing so; and
– transfer learning, enabling (i) comparison of disease trajectories for patients before and after COVID-19, and (ii) learning which information acquired from the numerous trajectories of non-COVID-19 patients can be effectively transferred to COVID-19 patients, and which information cannot.

Say a new patient walks in for something. As a matter of basic decision support, you’d want to be able to forecast some indicators, assess possible treatments, and decide what more measurements to take, and when, and to do this preferably under the same roof.

– Dan Jarrett (Ph.D. student, van der Schaar Lab)

Looking to the future of clairvoyance

clairvoyance alpha is publicly available, and can be downloaded and tested. Development continues at pace, and the van der Schaar Lab team expects to release a beta version in the near future, followed by a full release. Further iterations will continue to be publicly available at no cost.

It is the research team’s firm belief that, even in alpha, clairvoyance offers versatility and integration that cannot be found elsewhere. While today’s announcement marks the release of the alpha version and a limited amount of supplementary information, further details will be shared during the coming months.

The alpha software can be found here, and a more thorough introduction to clairvoyance is available here.

Appendix: clairvoyance compared with related software

Mihaela van der Schaar

Mihaela van der Schaar is John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge, a Turing Faculty Fellow at The Alan Turing Institute in London, and a Chancellor’s Professor at UCLA.

In 2019, Mihaela was identified by the National Endowment for Science, Technology and the Arts as the female researcher based in the UK with the most publications in the field of AI. She was also elected a 2019 “Star in Computer Networking and Communications”.

Jinsung Yoon

Google Scholar: https://scholar.google.com/citations?user=kiFd6A8AAAAJ

Zhaozhi Qian

Dan Jarrett

Ioana Bica
