The van der Schaar Lab has released an alpha version of clairvoyance, a ground-breaking autoML package that represents the culmination of years of research, development, and real-world testing. As a unified, end-to-end pipeline for time-series data, clairvoyance is unmatched in its versatility and capability. While primarily developed to aid clinical research and decision support, clairvoyance can also be used to great effect in non-medical contexts, thanks to its ability to facilitate complex inference workflows in a transparent, reproducible, and efficient manner.
A crucial breakthrough in the use of time-series data
Time-series data is the bread and butter of evidence-based clinical decision support, as it offers significantly more valuable insight than the “snapshots” presented by static data. With the increasing availability of electronic patient records, there is enormous untapped potential to apply machine learning to time-series data, providing accurate and actionable predictive models for real-world concerns.
At the same time, medical time-series problems in the wild are challenging due to their highly composite nature. Existing applications of machine learning to such problems have treated these component tasks as separate problems, leading to a siloed and stylized development approach that often fails to account for complexities and interdependencies within the real-world machine learning lifecycle. Regrettably, this has resulted in a remarkable gap between the inherent capabilities of machine learning methods and their actual effectiveness in clinical research and decision support.
clairvoyance offers a systematic and automated approach to personalized dynamic predictions, personalized information acquisition, personalized monitoring, and personalized treatment plans, while also offering interpretations.
Under a single, consistent interface, clairvoyance encapsulates all major modeling steps for time-series data, including: (i) loading and (ii) preprocessing patient records, (iii) configuring problem definitions, (iv) handling missing or irregular samples in both static and temporal contexts, (v) conducting feature selection, (vi) fitting prediction models, performing (vii) calibration and (viii) uncertainty estimation of model outputs, (ix) applying global or instance-wise methods for interpreting learned models, and (x) computing evaluation metrics.
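To make the idea of composable steps under one interface concrete, here is a minimal sketch in plain Python. It is not clairvoyance's API: the `Pipeline` class, the `then` and `run` methods, and the two stage functions are hypothetical names invented for illustration, and only two of the ten steps (imputation and preprocessing) are shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical types for illustration only; this is NOT clairvoyance's API.
Series = List[Optional[float]]          # one patient's measurements, None = missing
Stage = Callable[[List[Series]], List[Series]]


@dataclass
class Pipeline:
    """Chains stages that each map a list of series to a list of series."""
    stages: List[Stage] = field(default_factory=list)

    def then(self, stage: Stage) -> "Pipeline":
        # Return a new pipeline so partial pipelines can be reused and shared.
        return Pipeline(self.stages + [stage])

    def run(self, data: List[Series]) -> List[Series]:
        for stage in self.stages:
            data = stage(data)
        return data


def forward_fill(data: List[Series]) -> List[Series]:
    """Imputation stage: replace each missing value with the last observed one."""
    out = []
    for series in data:
        filled, last = [], None
        for x in series:
            last = x if x is not None else last
            filled.append(last)
        out.append(filled)
    return out


def min_max_scale(data: List[Series]) -> List[Series]:
    """Preprocessing stage: rescale all observed values to the range [0, 1]."""
    observed = [x for s in data for x in s if x is not None]
    lo, hi = min(observed), max(observed)
    span = (hi - lo) or 1.0             # avoid division by zero for constant data
    return [[None if x is None else (x - lo) / span for x in s] for s in data]


pipeline = Pipeline().then(forward_fill).then(min_max_scale)
records = [[1.0, None, 3.0], [None, 5.0, None]]
result = pipeline.run(records)
print(result)
```

Because every stage has the same signature, swapping one imputation or preprocessing method for another changes a single `then(...)` call and leaves the rest of the workflow untouched, which is the property the paragraph above describes.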
One solution to 3 problems
Designing real-world project lifecycles poses challenges in terms of engineering (difficult to build), evaluation (difficult to assess), and efficiency (difficult to optimize). clairvoyance was conceived to tackle all of these at once.
The engineering problem is that building inference procedures involves significant investment, yet few resources are available for clinical practitioners and domain experts to easily develop and validate complete workflows. As a software toolkit, clairvoyance enables workflow development within a single unified interface: having modular and composable structures enables rapid experimentation and deployment by clinical practitioners and engineers alike, while simplifying collaboration and code-sharing.
The evaluation problem is that, while the performance of each component in the workflow may depend on the broader context, current empirical practices tend to examine the merits of each component individually; surrounding steps are largely configured per convenience to enforce “all else equal” conditions for assessment. This prevents meaningful assessment of the performance of the workflow as a whole, and does not meaningfully support integrated workflow development. As an empirical standard, clairvoyance serves as a complete experimental benchmarking environment: having a standardized, extensible pipeline paradigm provides realistic and systematic context for evaluating novel component models, ensuring that comparisons are fair, transparent, and reproducible.
The efficiency problem is that sophisticated models are resource-intensive to optimize: state-of-the-art deep learning approaches require many knobs to be tuned, a computational difficulty compounded by the combination of multiple models, as well as by potential temporal distribution shifts in time series. Through automated machine learning, clairvoyance takes care of model configuration and stepwise selection, efficiently considering interdependencies among components, algorithms, and time steps. (Note: “automated machine learning” or “autoML” refers to the process of selecting algorithms and building pipelines using machine learning. AutoML is essential in order to enable machine learning to be applied effectively and at scale: given the huge number of different diseases, different variables, and different needs, it’s not possible to hand-craft a model for each disease. See this recent post for details.)
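The point about interdependencies can be illustrated with a toy example: the best prediction model may depend on which imputation method precedes it, so the two are scored jointly, not one at a time. The sketch below uses plain exhaustive search over a tiny hand-made dataset. It is an assumption-laden illustration, not the optimization procedure clairvoyance uses, and all function names and data are hypothetical.

```python
from itertools import product
from statistics import mean

# Hypothetical candidate components; names and data are for illustration only.

def impute_zero(series):
    """Replace missing values (None) with 0.0."""
    return [0.0 if x is None else x for x in series]

def impute_forward(series):
    """Replace missing values with the last observed value (0.0 if none yet)."""
    out, last = [], 0.0
    for x in series:
        last = last if x is None else x
        out.append(last)
    return out

def predict_last(history):
    """Predict the next value as the most recent one."""
    return history[-1]

def predict_mean(history):
    """Predict the next value as the mean of the history."""
    return mean(history)

def validation_error(imputer, predictor, dataset):
    """Mean absolute error of predicting each target from its imputed series."""
    return mean(abs(predictor(imputer(series)) - target)
                for series, target in dataset)

def select_pipeline(imputers, predictors, dataset):
    """Score every (imputer, predictor) pair jointly and return the best one."""
    scored = [
        ((i_name, p_name), validation_error(i_fn, p_fn, dataset))
        for (i_name, i_fn), (p_name, p_fn)
        in product(imputers.items(), predictors.items())
    ]
    return min(scored, key=lambda item: item[1])

# Toy validation set: (observed series with gaps, true next value).
dataset = [
    ([1.0, None, 3.0], 4.0),
    ([2.0, 2.0, None], 2.0),
    ([5.0, None, None], 5.0),
]
imputers = {"zero_fill": impute_zero, "forward_fill": impute_forward}
predictors = {"last_value": predict_last, "history_mean": predict_mean}

best, score = select_pipeline(imputers, predictors, dataset)
print(best, round(score, 3))
```

Exhaustive search is only feasible here because there are four combinations. With many components, many hyperparameters, and many time steps, the search space grows multiplicatively, which is why a more efficient automated strategy is needed in practice.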
Estimating personalized treatment effects
Identifying when to give treatments to patients, and how to select among multiple treatments over time, are important medical problems with few existing solutions. As clinical decision-makers are often faced with the problem of choosing between treatment alternatives for patients, reliably estimating their effects is paramount. An integral aspect of clairvoyance is the use of a novel counterfactual recurrent network (CRN) approach, developed by van der Schaar Lab’s researchers, to estimate future treatment outcomes. This approach leverages recent advances in representation learning and domain adversarial training to overcome the problems of existing methods for causal inference over time. CRN achieves this in a manner that is free from the bias introduced by time-varying confounders. These developments were presented in a session during the 2020 International Conference on Learning Representations (ICLR 2020); further details can be found here.
Interpretable and actionable outputs
As a group that works extensively with clinicians in formulating problems and developing solutions, the van der Schaar Lab places particular emphasis on ensuring that models are interpretable and can be trusted by users without extensive machine learning expertise. This is an essential factor in being able to narrow the divide between the builders and users of machine learning models. Interpretability has therefore been built into clairvoyance through the inclusion of numerous interpretability features including INVASE, an in-house method that uses actor-critic reinforcement learning methods to help turn black-box models into white-box models (more information here).
In addition to interpretability, clairvoyance offers confidence estimates, ensuring that users are given an indication of the degree of certainty accompanying all recommendations or predictions.
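As one generic illustration of how a point prediction can be accompanied by a degree of certainty, the sketch below computes a split-conformal prediction interval from residuals on a held-out calibration set. This is a standard, model-agnostic technique chosen here for simplicity; it is an assumption for illustration and is not stated by the source to be the method clairvoyance uses. The function name and the numbers are hypothetical.

```python
import math

def conformal_interval(calibration_residuals, prediction, alpha=0.25):
    """Split-conformal interval around a point prediction.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest absolute residual from a
    held-out calibration set as the half-width, targeting coverage of at
    least 1 - alpha under the usual exchangeability assumption.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to give a finite interval at this level.
        return (-math.inf, math.inf)
    half_width = sorted(abs(r) for r in calibration_residuals)[rank - 1]
    return (prediction - half_width, prediction + half_width)

# Hypothetical residuals (true value minus prediction) on calibration data.
residuals = [0.5, -1.0, 0.2, 1.5, -0.3, 0.8, -0.6]
lower, upper = conformal_interval(residuals, prediction=10.0, alpha=0.25)
print(lower, upper)
```

A wide interval signals that the model's past errors on comparable data were large, which gives a user without machine learning expertise a direct sense of how much weight to place on a given prediction.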
Applications in medicine
clairvoyance is an inherently versatile tool, and can be applied to practically any disease or condition for which high-quality time-series patient datasets are available. It could also be adapted beyond a medical setting with relative ease.
Precursors to clairvoyance have been tested and validated using datasets for breast cancer and cystic fibrosis patients, as well as for ICU admission prediction (initially using past patient data in the U.S. and subsequently using live patient data in the U.K.). Results of testing have consistently demonstrated predictive capabilities exceeding those of existing statistical techniques or current state-of-the-art machine learning models. A more detailed analysis of the performance of the current version of clairvoyance can be found here.
One potential application of clairvoyance in the near future could be in responding to the current COVID-19 pandemic. While the van der Schaar Lab is already providing AI-enabled predictive tools to the UK’s National Health Service to help with hospital capacity management, clairvoyance could offer even more comprehensive support for healthcare professionals, including:
– personalized predictions (mortality, discharge, ICU admission [including early warning] and readmission) and recommendations, both at time of admission and throughout the period of hospitalization;
– personalized information acquisition, including estimations of the value of certain information and determination of which tests should be conducted;
– personalized monitoring to determine which tests should be conducted and when to make forecasts, predictions, and treatment recommendations;
– personalized treatment plans to determine whether, how and when to admit patients to ICU, to use specific kinds of mechanical equipment, or to provide medication, as well as the likely impact of doing so; and
– transfer learning, enabling (i) comparison of disease trajectories for patients before and after COVID-19, and (ii) learning which information acquired from the numerous trajectories of non-COVID-19 patients can be effectively transferred to COVID-19 patients, and which information cannot.
Looking to the future of clairvoyance
clairvoyance alpha is publicly available, and can be downloaded and tested. Development continues at pace, and the van der Schaar Lab team expects to release a beta version in the near future, followed by a full release. Further iterations will continue to be publicly available at no cost.
It is the research team’s firm belief that, even in alpha, clairvoyance offers versatility and integration that cannot be found elsewhere. While today’s announcement marks the release of the alpha version and a limited amount of supplementary information, further details will be shared during the coming months.
Appendix: clairvoyance compared with related software