van der Schaar Lab

Revolutionizing Clinical Trials using Machine Learning

Clinical trials today: Expensive, difficult, and ripe for disruption

Since their initial use in the 1940s, randomized controlled trials (RCTs) have become the gold standard supporting the practice of evidence-based medicine [1]. However, the increasing complexity of regulations and protocols means they are both expensive and difficult to run: they cost upwards of $33M and take years to produce results [2,3]. Restrictive inclusion criteria also mean that half of clinical trials exclude more than 75% of the patients they aim to treat [4]. Yet RCTs remain the foundation of modern medicine, and more than 1,800 trials are commenced every year.

Although novel approaches to clinical trial design have emerged [5] (such as decentralized trials, e-consent, and various flavors of adaptive designs), conventional RCTs have remained the dominant approach despite their acknowledged flaws. This situation presents a huge opportunity for innovation. Given the scale at which clinical trials are operated, even small improvements to how they are run could have a tremendous impact on healthcare.

Challenges felt throughout the clinical development journey

The challenges of conducting an RCT may be considered in four stages: (Stage 1) planning of a clinical trial that targets the new treatment, (Stage 2) conduct of the planned trial, (Stage 3) analysis of the results obtained from the trial, and (Stage 4) clinical use of the treatment if the trial has been successful. These four stages are not to be confused with the conventional phases (Phase I–IV) of drug development [6]. All four stages apply to clinical studies undertaken during each phase: planning, conduct, and analysis apply mainly to Phases I–III, while clinical use mostly involves Phase III and beyond.

To better define the problem, we have explored the challenges involved in each stage with clinicians during our Revolutionizing Healthcare sessions (available here for the first session and the second session). These discussions identified unique challenges relevant to each stage, summarized below. Broadly speaking, planning requires synthesizing information from a diverse range of sources such as observational or pre-clinical data; conduct entails making decisions about whom to recruit or at what dosage to administer a drug; analysis deals with complex inference problems related to risks and outcomes; and clinical use relies on intelligent modeling of various interdependent processes, from how diseases progress to how clinicians behave when prescribing treatments. An important, and continually overlooked, challenge is how the result of one trial may inform the planning, conduct, and analysis of the next, for example by suggesting broader applications of a successful treatment, or indicating how recruitment for a group of conditions might be optimized or stratified.

Machine learning can already help, but there is also work to be done

Machine learning forms a strong foundation for one to start tackling these challenges; it provides the fundamental tools and techniques that are necessary to precisely formulate what is needed and reason about potential solutions. As we mention later, there are already methods that can help improve trials, mainly by enabling better ways of harnessing available information to make data-driven decisions that make successful studies more likely. But of course, tools and techniques available now are not always able to fully capture the unique nature of each challenge, hence they need to continue to evolve. This need presents an opportunity for machine learning to grow further as well, together with the next-generation trials it enables.

An example: “Identifying good subpopulations fast with confidence”

Consider again the stage of clinical trial conduct. Adaptive clinical trials aim to make dynamic adjustments to how a trial is conducted—for instance, regarding the targeted population or the dosage of the treatment—according to the data collected during the trial itself [7]. As such, an adaptive trial deals predominantly with making on-the-go decisions based on limited information available at the time.

The reinforcement learning and multi-armed bandit literature lays the foundations of online decision making in uncertain environments, such as the conduct of adaptive trials. While many of the methodologies developed there are applicable to clinical trials, decisions made during an adaptive trial also bring additional constraints that have not previously been considered: for example, clinical trials are often limited both by budget and by requirements for confidence in the inferred results. Although decision making with either a fixed budget or a fixed confidence requirement is very well understood, the case where both constraints are present at the same time, and what is achievable then, remains largely unexplored (see our recent paper on the subject).
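To make the tension between the two constraints concrete, here is a minimal simulation sketch. It is not the method from our paper: it uses textbook successive elimination from the bandit literature, with a hard cap on recruited patients added on top. The function name, the subgroup response rates, and the particular Hoeffding-style confidence radius are all illustrative assumptions.

```python
import math
import random


def identify_best_subgroup(response_rates, budget, delta, seed=0):
    """Successive elimination with a hard patient budget (illustrative only).

    response_rates: hypothetical true response probability of each subgroup;
                    hidden from the decision rule, used only to simulate outcomes.
    budget:         maximum number of patients that may be recruited.
    delta:          tolerated probability of reaching a wrong conclusion.
    Returns (index of the identified subgroup or None, patients recruited).
    """
    rng = random.Random(seed)
    k = len(response_rates)
    active = list(range(k))   # subgroups still under consideration
    successes = [0] * k
    rounds = 0
    used = 0

    while len(active) > 1 and used + len(active) <= budget:
        # Recruit one patient from every subgroup still under consideration.
        for arm in active:
            successes[arm] += rng.random() < response_rates[arm]
        rounds += 1
        used += len(active)

        # Hoeffding confidence radius, with a union bound over subgroups and rounds.
        radius = math.sqrt(math.log(4 * k * rounds ** 2 / delta) / (2 * rounds))
        means = {arm: successes[arm] / rounds for arm in active}
        leader_lower = max(means.values()) - radius

        # Drop subgroups whose upper bound falls below the leader's lower bound.
        active = [arm for arm in active if means[arm] + radius >= leader_lower]

    # With the budget exhausted and several candidates left, no confident answer exists.
    return (active[0] if len(active) == 1 else None), used


if __name__ == "__main__":
    print(identify_best_subgroup([0.2, 0.8], budget=2000, delta=0.05))
    print(identify_best_subgroup([0.2, 0.8], budget=20, delta=0.05))
```

With a generous budget the rule stops early, as soon as one subgroup is confidently ahead. With a budget of only 20 patients, the confidence radius never shrinks enough to separate the two subgroups, so the function returns None: the budget runs out before the confidence requirement can be met. What a trial should do in that regime is exactly the kind of question that the fixed-budget and fixed-confidence formulations leave unanswered when each is taken on its own.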

To move towards potential solutions, we have identified the subfield of machine learning whose concepts are best suited to address each challenge. This categorization forms an initial road map for machine learning to start revolutionizing clinical trials.

Our contributions so far

We have already proposed initial solutions for many of these challenges (highlighted in black as opposed to light gray) as summarized in the figure below. For a more detailed discussion of these works, please refer to our research pillars on adaptive clinical trials and individualized treatment effect inference.

Further reading

W. R. Zame, I. Bica, C. Shen, A. Curth, H.-S. Lee, S. Bailey, J. Weatherall, D. Wright, F. Bretz, M. van der Schaar, “Machine learning for clinical trials in the era of COVID-19,” Stat. Biopharm. Res., vol. 12, no. 4, pp. 506–517, 2020.

I. Bica, A. M. Alaa, C. Lambert, and M. van der Schaar, “From real-world patient data to individualized treatment effect using machine learning: current and future methods to address underlying challenges,” Clin. Pharmacol. Ther., vol. 109, pp. 87–100, 2021.

A. Curth, A. Hüyük, and M. van der Schaar, “Adaptively Identifying Patient Populations With Treatment Benefit in Clinical Trials,” arXiv preprint arXiv:2208.05844, 2022.


[1] A. W. Craft, “The first randomised controlled trial,” Arch. Disease Childhood, vol. 79, no. 5, p. 410, 1998.

[2] K. A. Getz and R. A. Campo, “Trial watch: Trends in clinical trial design complexity,” Nature Rev. Drug Discovery, vol. 16, no. 5, pp. 307–308, 2017.

[3] J. P. A. Ioannidis, “Effect of the statistical significance of results on the time to completion and publication of randomized efficacy trials,” JAMA, vol. 279, no. 4, pp. 281–286, 1998.

[4] J. He, D. R. Morales, and B. Guthrie, “Exclusion rates in randomized controlled trials of treatments for physical conditions: A systematic review,” Trials, vol. 21, p. 228, 2020.

[5] “NEJM — The changing face of clinical trials,” Online: (accessed Aug. 19, 2022).

[6] C. A. Umscheid, D. J. Margolis, and C. E. Grossman, “Key concepts of clinical trials: A narrative review,” Postgrad. Med., vol. 123, no. 5, pp. 194–204, 2011.

[7] L. E. Bothwell, J. Avorn, N. F. Khan et al., “Adaptive design clinical trial: A review of the literature and,” BMJ Open, vol. 8, pp. e018320, 2018.

Alihan Hüyük

Alihan is a PhD student in the Department of Applied Mathematics and Theoretical Physics at the University of Cambridge. He is supervised by Professor Mihaela van der Schaar.

Prior to attending Cambridge, he completed a BSc in Electrical and Electronics Engineering at Bilkent University. Alihan’s current research focuses on developing interpretable machine learning methods with the purpose of understanding the decision-making process of clinicians.

Previously, he worked on multi-armed bandit problems in combinatorial and multi-objective settings.

Zhaozhi Qian

After obtaining an MSc in Machine Learning at UCL, Zhaozhi Qian started his career as a data scientist at the largest mobile gaming company in Europe. Three years later, he found it might be more fulfilling to apply AI to cure cancer than to make gamers hit the purchase button 1% more often.

He thus joined the group in 2019 as a PhD student focusing on robust and interpretable learning for longitudinal data. So far, his work has included inferring latent disease interaction networks from Electronic Health Records, uncovering the causal structure between events that unfold over time, and calibrating the predictive uncertainty under domain shift.

Zhaozhi also worked as a contractor for the NHS during the COVID-19 pandemic, contributing his analytical skills to the national response.

Eoin McKinney

University Lecturer in Renal medicine at the University of Cambridge; Honorary consultant in nephrology and transplantation, Cambridge University Hospitals NHS Foundation Trust

Dr McKinney’s research explores the interface between immune responses to infection and those driving inflammatory pathology, applying machine learning methods to the integration of multi-omics data, building interpretable predictive models for rapid translation into clinical practice while informing underlying disease biology and identifying novel therapeutic strategies.

Mihaela van der Schaar

Mihaela van der Schaar is the John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Fellow at The Alan Turing Institute in London.

Mihaela has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), a National Science Foundation CAREER Award (2004), 3 IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award.

In 2019, she was identified by the National Endowment for Science, Technology and the Arts as the most-cited female AI researcher in the UK. She was also elected as a 2019 “Star in Computer Networking and Communications” by N²Women. Her research expertise spans signal and image processing, communication networks, network science, multimedia, game theory, distributed systems, machine learning and AI.

Mihaela’s research focus is on machine learning, AI and operations research for healthcare and medicine.