van der Schaar Lab

Quantitative epistemology: conceiving a new human-machine partnership

This page is authored and maintained by Mihaela van der Schaar and Nick Maxfield.


Pioneering a new field of research

Quantitative epistemology is a new and transformationally significant research pillar pioneered by the van der Schaar Lab. The purpose of this research is to develop a strand of machine learning aimed at understanding, supporting, and improving human decision-making. We aim to do so by building machine learning models of decision-making—including how humans acquire and learn from new information, establish and update their beliefs, and act on the basis of their cumulative knowledge. Because our approach studies knowledge through observational data, and uses machine learning methods to support and improve knowledge acquisition and its impact on decision-making, we call this “quantitative epistemology.”

Our methods are aimed at studying human decision-making, identifying potential suboptimalities in beliefs and decision processes (such as cognitive biases, selective attention, imperfect retention of past experience, etc.), and understanding risk attitudes and their implications for learning and decision-making. This would allow us to construct decision support systems that provide humans with information pertinent to their intended actions, their possible alternatives and counterfactual outcomes, as well as other evidence to empower better decision-making.

Revisiting the roots of human (meta-)learning

Quantitative epistemology draws inspiration from the field of meta-learning. While meta-learning is arguably best-known today as a subfield of machine learning, in this case we are referring to the original meaning of the term within the domains of social psychology and education—as coined by Donald B. Maudsley in his 1979 book entitled A theory of meta-learning and principles of facilitation: an organismic perspective.

Maudsley defined meta-learning as “the process by which learners become aware of and increasingly in control of habits of perception, inquiry, learning, and growth.” He put forward five requirements learners must observe in order to practice meta-learning successfully:
– Have a theory;
– Work in a supportive environment;
– Discover their rules and assumptions;
– Reconnect with reality-information from the environment; and
– Reorganize themselves by changing their rules/assumptions.

In reality, meta-learning remains extremely difficult for humans, even when the five requirements listed above are met. Our goal for quantitative epistemology, therefore, was to develop a new machine learning field aiming to empower humans to perform meta-learning. Our vision is to use machine learning to serve the purpose Maudsley defined: empowering humans to improve and control their own perception, inquiry, learning, and growth—as well as their decision-making.

This is in keeping with our lab’s overall vision of using machine learning to learn human intelligence with the aim of empowering humans—rather than empowering machine intelligence.

A human-machine partnership based on empowerment, not replacement

As mentioned above, it is important to distinguish quantitative epistemology from existing work in AI and machine learning, such as imitation learning (i.e. replicating expert actions) and apprenticeship learning (i.e. matching expert returns), both of which intend to construct autonomous agents that can mimic and replace human demonstrators. Instead, we are concerned with leveraging machine learning to help humans become better decision-makers.

Quantitative epistemology entails developing machine learning models that capture how humans acquire new information, how they pay attention to such information, how their beliefs may be represented, how their internal models may be structured, how these different levels of knowledge are leveraged in the form of actions, and how such knowledge is learned and updated over time.

Quantitative epistemology envisages a new human-machine partnership in which machines support and empower humans, rather than replacing them.

The figure below depicts the broad strokes of this partnership in terms of long-term cycles in which a theory of meta-learning is built and continually honed, and in which humans are constantly being empowered to control their growth, perception, inquiry, learning, and decision-making.

Starting at the bottom left of the figure and moving clockwise:
1. humans act and perform meta-learning;
2. assumptions, structures, and rules, etc., can be studied using machine learning (quantitative epistemology) and developed into meta-learning models;
3. we can use these behavior models to distil hypotheses about meta-learning;
4. through the scientific process, we can build these hypotheses into a comprehensive and quantitative theory of meta-learning;
5a. we can reconnect this theory with reality-information and improve it cyclically over time;
5b. this process can also provide new advice, empowering humans to grow and further hone their perception, inquiry, learning, and decision-making.

Note: our use of “meta-learning models” here refers to models that examine the individual-specific thought processes and tendencies or biases that influence how humans make decisions when presented with specific information. Such models can examine characteristics including (but not limited to) an individual’s capacity for flexibility or adaptivity, tolerance of risk, or degree of optimism, and can also identify context-specific factors that drive changes in these characteristics. For instance, such models may identify that certain clinicians tend to be less optimistic when diagnosing patients at risk, or they may show how optimism and confirmation bias could lead to similar but differentiable behavior.

We can also use quantitative epistemology to build the “supportive environment” Maudsley defined as a requirement for successful meta-learning.

Starting at the very bottom of the figure and moving clockwise:
1. as in the previous figure, humans act and perform meta-learning;
2. machine learning tools (quantitative epistemology) can understand these decisions by building meta-learning models, identifying potential biases, errors, and inconsistencies, and providing advice;
3. humans are provided with this information;
4. humans inform the machine learning tools whether the suggested adjustments or corrections to their behavior are effective, offer clarifications about their decisions, and rate the advice provided to them;
5. this feedback improves the understanding built by the quantitative epistemology tools, driving a cycle that can further empower humans.

Applications of quantitative epistemology

Broadly, we currently see four potential areas of application for quantitative epistemology, none of which are limited to healthcare:

1. Decision Support
This is arguably the most intuitive and straightforward application of understanding human decision-making. In medicine, for example, we can combine a meaningful understanding of the basis on which decisions are made with normative standards for optimal decision-making in areas such as diagnosis, treatment, and resource allocation.

Furthermore, we can apply quantitative epistemology in single-agent or multi-agent settings, using our understanding of decision-making to optimize decisions across multiple individuals or groups, whether they are co-operating or competing.

2. Analysis of variation
In many fields such as healthcare, there is often remarkable regional, institutional, and subgroup-level variability in practice. This variability renders detection and quantification of biases crucial.

Quantitative epistemology can yield powerful tools to audit clinical decision-making to investigate variation in practice, biases, and sub-optimal decision-making, and understand where improvements can be made.

3. (Re)-Definition of Normative Standards
There are many areas in which normative standards have not been defined, or may need to be continually redefined. Through the application of quantitative epistemology, we can determine whether normative standards are realistic and effective representations of desired outcomes, enabling policy-makers to design better policies going forward.

4. Education and training
Quantitative epistemology aims to produce a data-driven, quantitative—and most importantly interpretable—description of the process by which humans form and adapt their beliefs and understanding of the world. This could yield enormous benefit in education and training: both the content and instructional methods employed in courses could be extensively tailored to specific individuals, taking into account their learning styles, biases, and preferences.

This section showcases the potential utility of quantitative epistemology as an investigative approach for auditing and quantifying individual decisions in the healthcare domain. The method demonstrated here is INTERPOLE, which was introduced in a paper published at ICLR 2021 (abstract and further details are provided below).

INTERPOLE is a model for interpretable policy learning that seeks to model the evolution of an agent’s beliefs and provide a concrete basis for analyzing the corresponding sequence of actions taken. Sequential observations are aggregated through a decision-maker’s belief-update process, and sequential actions are determined by the agent’s probabilistic belief-action mapping.

The example given below uses real-world diagnosis patterns from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database over the course of a sequence of 6-monthly patient visits.

The state space consists of normal functioning (“NL”), mild cognitive impairment (“MCI”), and dementia. For the action space, we consider the decision problem of ordering vs. not ordering an MRI test, which (while often informative of Alzheimer’s) is financially costly.
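To make these two ingredients concrete, the short Python sketch below illustrates (in heavily simplified form, and with entirely made-up transition and observation probabilities rather than anything learned from ADNI) how a belief over the three diagnoses can be updated after an MRI outcome, and how a probabilistic belief-action mapping might turn that belief into a probability of ordering the next MRI. This is an illustrative sketch of the general idea, not the INTERPOLE implementation.

import numpy as np

states = ["NL", "MCI", "Dementia"]

# Illustrative, made-up parameters (not values learned from ADNI).
transition = np.array([[0.90, 0.09, 0.01],   # P(next state | current state)
                       [0.05, 0.80, 0.15],
                       [0.00, 0.05, 0.95]])
p_abnormal = np.array([0.10, 0.55, 0.90])    # P(abnormal MRI finding | state)

def update_belief(belief, mri_abnormal):
    """One step of Bayesian belief updating after an MRI outcome is observed."""
    predicted = belief @ transition                        # propagate through disease dynamics
    likelihood = p_abnormal if mri_abnormal else 1.0 - p_abnormal
    posterior = predicted * likelihood
    return posterior / posterior.sum()

def p_order_mri(belief, weights=np.array([-1.0, 2.0, 1.0]), bias=-0.5):
    """Probabilistic belief-action mapping: logistic in the belief (illustrative weights)."""
    return 1.0 / (1.0 + np.exp(-(belief @ weights + bias)))

belief = np.array([0.6, 0.3, 0.1])                         # initial diagnostic belief
belief = update_belief(belief, mri_abnormal=True)
print(dict(zip(states, belief.round(3))))
print("P(order MRI at next visit):", round(p_order_mri(belief), 3))

INTERPOLE estimates the decision-maker’s (possibly biased) version of such belief dynamics and their (possibly suboptimal) belief-action mapping directly from observed trajectories.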

In the figure above, each vertex of the belief simplex corresponds to one of the three stable diagnoses, and each point in the simplex corresponds to a unique belief (i.e. probability distribution). The closer the point is to a vertex (i.e. state), the higher the probability assigned to that state.

The figure gives examples for real patients, including: (a) a typical normally-functioning patient, where the decision-maker’s beliefs remain mostly on the decision boundary; (b) a typical patient who is believed to be deteriorating towards dementia; (c) a patient who—apparently—could have been diagnosed much earlier than they actually were; and (d) a patient with a (seemingly redundant) MRI test that is actually highly informative.

Explaining trajectories
Patients (a) and (b) are “typical” patients who fit well to the overall learned policy. The former is a normally-functioning patient believed to remain around the decision boundary in all visits except the first; appropriately, they are ordered an MRI during approximately half of their visits. The latter is believed to be deteriorating from MCI towards dementia, hence prescribed an MRI in all visits.

Identifying belated diagnoses
In many diseases, early diagnosis is paramount. Using quantitative epistemology approaches such as INTERPOLE, we can detect patients who appear to have been diagnosed significantly later than they should have been.

Patient (c), for example, was not ordered an MRI in either of their first two visits, despite the fact that the “typical” policy would have strongly recommended one. At a third visit, the MRI that was finally ordered led to near-certainty of cognitive impairment, but this could have been known 12 months earlier! In fact, among all ADNI patients in the database, 6.5% were subject to this apparent pattern of “belatedness”, where a late MRI is immediately followed by a jump to near-certain deterioration.

Quantifying the value of information
Patient (d) highlights how quantitative epistemology can be used to quantify the value of a test in terms of its information gain.

While the patient was ordered an MRI in all of their visits, it may appear (on the surface) that the third and final MRIs were redundant—since they had little apparent effect on beliefs. However, this is only true for the factual belief update that occurred according to the MRI outcome that was actually observed. Having access to an estimated model of how beliefs are updated in the form of decision dynamics, we can also compute counterfactual belief updates—that is, belief updates that could have occurred had the MRI outcome in question been different.

In the particular case of patient (d), the tests were in fact highly informative, since (as it happened) the patient’s CDR-SB scores were suggestive of impairment, and (in the counterfactual) the doctor’s beliefs could have potentially leapt drastically towards MCI.
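Continuing the illustrative snippet above (and reusing its update_belief function and made-up parameters), the value of a test can be sketched by comparing the factual belief update with the counterfactual update that would have followed the opposite MRI outcome. This is a simplified illustration of the idea, not the measure used in the paper.

def information_value(belief, observed_abnormal):
    """Compare the factual belief update with the counterfactual one for a single MRI."""
    factual = update_belief(belief, mri_abnormal=observed_abnormal)
    counterfactual = update_belief(belief, mri_abnormal=not observed_abnormal)
    return {
        "factual_shift": float(np.abs(factual - belief).sum()),
        "counterfactual_shift": float(np.abs(counterfactual - belief).sum()),
    }

# A seemingly redundant MRI: the observed (normal) outcome barely moves beliefs, but the
# counterfactual (abnormal) outcome would have shifted them substantially towards MCI/dementia.
print(information_value(np.array([0.85, 0.12, 0.03]), observed_abnormal=False))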

Clinician evaluation of INTERPOLE
We evaluated INTERPOLE by consulting nine clinicians from four different countries (United States, United Kingdom, the Netherlands, and China) for feedback.

To determine whether decision dynamics are a transparent way of modeling how information is aggregated by decision-makers, we presented the clinicians with the medical history of an example patient represented in three ways, using: i) only the most recent action-observation, ii) the complete action-observation trajectory, and iii) the belief trajectory as recovered by INTERPOLE. All nine clinicians preferred the belief trajectories over action-observation trajectories.

We also sought to establish whether the proposed representation of (possibly suboptimal) decision boundaries is a more transparent way of describing policies, compared with the representation of reward functions. To do this, we showed the clinicians the policies learned from ADNI in the form of decision boundaries (i.e. INTERPOLE) and reward functions. Seven out of the nine clinicians preferred the representation in terms of decision boundaries.

Further details regarding INTERPOLE can be found below. For more information on our work related to Alzheimer’s, click here.

Intersection with other areas of research

Quantitative epistemology will complement and build upon projects across the lab’s other key research areas, including decision support systems, predictive analytics, automated ML, individualized treatment effect inference, interpretability, synthetic data, and more.

These points of intersection (and the immense potential for additional intersection) should be clear from the following descriptions of some of our initial projects related to quantitative epistemology.

Our work so far

Quantitative epistemology has become an area of significant focus for our lab’s researchers in recent years. Some of our first papers are shared below.

Online Decision Mediation
We develop a decision support assistant that serves as an intermediary between (oracle) expert behaviour and (imperfect) human behaviour. At each time, the algorithm observes an action chosen by a fallible agent, and decides whether to accept that agent’s decision, intervene with an alternative, or request the expert’s opinion. Successful mediation requires striking a balance between when to learn from the expert and when to intervene based on what is learned.

Online Decision Mediation

Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar

NeurIPS 2022

Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior: At each time, the algorithm observes an action chosen by a fallible agent, and decides whether to *accept* that agent’s decision, *intervene* with an alternative, or *request* the expert’s opinion. For instance, in clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances, thus real-world decision support is often limited to monitoring and forecasting.

Instead, such an intermediary would strike a prudent balance between the former (purely prescriptive) and latter (purely descriptive) approaches, while providing an efficient interface between human mistakes and expert feedback. In this work, we first formalize the sequential problem of *online decision mediation*—that is, of simultaneously learning and evaluating mediator policies from scratch with *abstentive feedback*: In each round, deferring to the oracle obviates the risk of error, but incurs an upfront penalty, and reveals the otherwise hidden expert action as a new training data point. Second, we motivate and propose a solution that seeks to trade off (immediate) loss terms against (future) improvements in generalization error; in doing so, we identify why conventional bandit algorithms may fail.

Finally, through experiments and sensitivities on a variety of datasets, we illustrate consistent gains over applicable benchmarks on performance measures with respect to the mediator policy, the learned model, and the decision-making system as a whole.
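As a rough illustration of the accept/intervene/request loop described above, the hedged sketch below simulates a mediator sitting between a fallible agent and an oracle expert. The thresholds, models, and simulated agents are placeholder assumptions for illustration; this is not the algorithm proposed in the paper, which additionally trades off immediate loss against future improvements in generalization error.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def expert_action(x):      # oracle expert (hidden from the mediator unless queried)
    return int(x[0] + x[1] > 0)

def agent_action(x):       # fallible human agent: the expert's action, flipped 25% of the time
    a = expert_action(x)
    return a if rng.random() > 0.25 else 1 - a

model = LogisticRegression()
X_seen, y_seen = [], []
n_request = n_intervene = 0

for t in range(200):
    x = rng.normal(size=2)
    a_agent = agent_action(x)
    if len(set(y_seen)) < 2:                       # cold start: request until both classes are seen
        choice = "request"
    else:
        p = model.predict_proba([x])[0]
        confident, agrees = p.max() > 0.8, int(p.argmax()) == a_agent
        if confident and agrees:
            choice = "accept"                      # let the agent's decision stand
        elif confident:
            choice = "intervene"                   # override with the mediator's own prediction
        else:
            choice = "request"                     # uncertain: pay the cost, ask the expert
    if choice == "request":
        n_request += 1
        X_seen.append(x)
        y_seen.append(expert_action(x))            # deferring reveals the expert's action as training data
        if len(set(y_seen)) > 1:
            model.fit(X_seen, y_seen)
    elif choice == "intervene":
        n_intervene += 1

print(f"requested the expert {n_request} times, intervened {n_intervene} times over 200 rounds")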

Inverse Contextual Bandits: Learning How Behavior Evolves over Time
Conventional approaches to policy learning almost invariably assume stationarity in behaviour, but this is hardly true in practice: medical practice is constantly evolving as clinical professionals fine-tune their knowledge over time. To quantify how medical practice has been evolving, we develop a policy learning method that provides interpretable representations of decision-making, in particular capturing an agent’s non-stationary knowledge of the world, as well as operating in an offline manner.

Inverse Contextual Bandits: Learning How Behavior Evolves over Time

Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar

ICML 2022

Understanding a decision-maker’s priorities by observing their behaviour is critical for transparency and accountability in decision processes, such as in healthcare. Though conventional approaches to policy learning almost invariably assume stationarity in behaviour, this is hardly true in practice: Medical practice is constantly evolving as clinical professionals fine-tune their knowledge over time.

For instance, as the medical community’s understanding of organ transplantations has progressed over the years, a pertinent question is: How have actual organ allocation policies been evolving? To give an answer, we desire a policy learning method that provides interpretable representations of decision-making, in particular capturing an agent’s non-stationary knowledge of the world, as well as operating in an offline manner. First, we model the evolving behaviour of decision-makers in terms of contextual bandits, and formalise the problem of Inverse Contextual Bandits (ICB). Second, we propose two concrete algorithms as solutions, learning parametric and nonparametric representations of an agent’s behaviour.

Finally, using both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method, as well as benchmarking and validating its accuracy.
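The sketch below gives a crude, hedged flavour of the inverse problem on simulated data: a decision-maker's behaviour is generated from preference weights that drift over time, and instead of fitting a single stationary policy we recover per-period behaviour and inspect how the apparent weights evolve. The data, the logistic behaviour model, and the three-period split are illustrative assumptions; the ICB algorithms in the paper learn parametric and nonparametric representations of this evolution in a principled way.

import numpy as np

rng = np.random.default_rng(1)

# Simulate a decision-maker whose hidden preference weights drift as they learn: early on
# they ignore the second context feature, later they weight it more heavily.
T, d = 600, 2
contexts = rng.normal(size=(T, d))
true_weights = np.linspace([1.0, 0.0], [1.0, 1.5], T)      # evolving (non-stationary) behaviour
logits = (contexts * true_weights).sum(axis=1)
actions = (rng.random(T) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

def fit_logistic(X, y, steps=2000, lr=0.5):
    """Plain gradient-ascent maximum-likelihood logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w += lr * X.T @ (y - p) / len(y)
    return w

# "Inverse" step (illustrative only): recover the apparent behaviour separately for each period
# and observe how the recovered weights drift over time.
for name, period in [("early", slice(0, 200)), ("middle", slice(200, 400)), ("late", slice(400, 600))]:
    print(name, fit_logistic(contexts[period], actions[period]).round(2))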

Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies
We introduce a novel approach using deep state-space models to retrospectively estimate the factors that govern decision processes and how they change over time. By applying this technique to the analysis of organ donation acceptance decisions, we demonstrate its ability to provide valuable insights into human decision making and the potential for improving decision-making ability.

Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies

Alex J. Chan, Alicia Curth, Mihaela van der Schaar

ICLR 2022

Human decision making is well known to be imperfect and the ability to analyse such processes individually is crucial when attempting to aid or improve a decision-maker’s ability to perform a task, e.g. to alert them to potential biases or oversights on their part. To do so, it is necessary to develop interpretable representations of how agents make decisions and how this process changes over time as the agent learns online in reaction to the accrued experience.

To then understand the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem. By interpreting actions within a potential outcomes framework, we introduce a meaningful mapping based on agents choosing an action they believe to have the greatest treatment effect. We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them, using a novel architecture built upon an expressive family of deep state-space models.

Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
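A toy sketch of the kind of behaviour this work aims to invert is given below: an agent repeatedly chooses the treatment they currently perceive to have the greatest effect, and updates that perception online from each observed outcome. The two-treatment setup, the running-average update, and the numbers are illustrative assumptions; the paper's approach recovers such evolving perceived effects from observed trajectories using deep state-space models, which this sketch does not attempt.

import numpy as np

rng = np.random.default_rng(4)

true_effects = np.array([0.3, 0.7])        # hidden ground-truth success rates of two treatments
perceived = np.array([0.5, 0.5])           # the agent's initial perceived treatment effects
counts = np.zeros(2)

for t in range(200):
    action = int(np.argmax(perceived))                     # act on the current perceived effects
    outcome = float(rng.random() < true_effects[action])   # observe a success or failure
    counts[action] += 1
    # Online (reactionary) update: running average of outcomes for the chosen treatment.
    perceived[action] += (outcome - perceived[action]) / counts[action]

print("final perceived effects:", perceived.round(2))
# The inverse problem: given only the sequence of actions, recover how `perceived` evolved.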

POETREE: Interpretable Policy Learning with Adaptive Decision Trees
Policy Extraction through Decision Trees (POETREE) is a new framework for interpretable policy learning that is compatible with fully-offline and partially-observable clinical decision environments. It uses fully-differentiable tree architectures to learn a representation of patient history and adapt over time, resulting in decision tree policies that can outperform the state-of-the-art transparent models. This approach has the potential to improve future decision support systems and help us better understand, diagnose, and support real-world policies in healthcare.

POETREE: Interpretable Policy Learning with Adaptive Decision Trees

Alizée Pace, Alex J. Chan, Mihaela van der Schaar

ICLR 2022

Building models of human decision-making from observed behaviour is critical to better understand, diagnose and support real-world policies such as clinical care. As established policy learning approaches remain focused on imitation performance, they fall short of explaining the demonstrated decision-making process.

Policy Extraction through decision Trees (POETREE) is a novel framework for interpretable policy learning, compatible with fully-offline and partially-observable clinical decision environments — and builds probabilistic tree policies determining physician actions based on patients’ observations and medical history. Fully-differentiable tree architectures are grown incrementally during optimization to adapt their complexity to the modelling task, and learn a representation of patient history through recurrence, resulting in decision tree policies that adapt over time with patient information.

This policy learning method outperforms the state-of-the-art on real and synthetic medical datasets, both in terms of understanding, quantifying and evaluating observed behaviour as well as in accurately replicating it — with potential to improve future decision support systems.
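To illustrate the core building block (a probabilistic decision tree whose routing and leaf parameters are all differentiable), the sketch below implements a small soft tree policy in Python. It omits the incremental growth and the recurrent representation of patient history that POETREE uses, and its parameters are random placeholders; it is a simplified illustration of the idea, not the POETREE architecture.

import numpy as np

rng = np.random.default_rng(2)

class SoftTreePolicy:
    """A depth-2 soft decision tree: internal nodes route probabilistically via sigmoid gates,
    leaves hold action distributions, and every parameter is differentiable."""

    def __init__(self, n_features, n_actions, depth=2):
        self.depth = depth
        self.n_internal = 2 ** depth - 1
        self.n_leaves = 2 ** depth
        self.gate_w = rng.normal(scale=0.1, size=(self.n_internal, n_features))
        self.gate_b = np.zeros(self.n_internal)
        self.leaf_logits = rng.normal(scale=0.1, size=(self.n_leaves, n_actions))

    def action_probs(self, x):
        gates = 1.0 / (1.0 + np.exp(-(self.gate_w @ x + self.gate_b)))   # P(go right) at each node
        # Probability of reaching each leaf = product of gate probabilities along its root-to-leaf path.
        leaf_prob = np.ones(self.n_leaves)
        for leaf in range(self.n_leaves):
            node = 0
            for level in range(self.depth):
                go_right = (leaf >> (self.depth - 1 - level)) & 1
                leaf_prob[leaf] *= gates[node] if go_right else 1.0 - gates[node]
                node = 2 * node + 1 + go_right
        # Final policy = mixture of leaf action distributions, weighted by path probabilities.
        leaf_dists = np.exp(self.leaf_logits)
        leaf_dists /= leaf_dists.sum(axis=1, keepdims=True)
        return leaf_prob @ leaf_dists

policy = SoftTreePolicy(n_features=4, n_actions=2)
print(policy.action_probs(rng.normal(size=4)))     # e.g. [P(do not order MRI), P(order MRI)]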

Inferring Lexicographically-Ordered Rewards from Preferences
In modelling the preferences of agents over a set of alternatives, the dominant approach has been to find a single reward/utility function with the property that alternatives yielding higher rewards are preferred over alternatives yielding lower rewards. However, in many settings, preferences are based on multiple—often competing—objectives; a single reward function is not adequate to represent such preferences. In this paper, we propose a method for inferring multi-objective reward-based representations of an agent’s observed preferences.

Inferring Lexicographically-Ordered Rewards from Preferences

Alihan Hüyük, William R. Zame, Mihaela van der Schaar

AAAI 2022

Modeling the preferences of agents over a set of alternatives is a principal concern in many areas. The dominant approach has been to find a single reward/utility function with the property that alternatives yielding higher rewards are preferred over alternatives yielding lower rewards. However, in many settings, preferences are based on multiple—often competing—objectives; a single reward function is not adequate to represent such preferences.

This paper proposes a method for inferring multi-objective reward-based representations of an agent’s observed preferences. We model the agent’s priorities over different objectives as entering lexicographically, so that objectives with lower priorities matter only when the agent is indifferent with respect to objectives with higher priorities.

We offer two example applications in healthcare—one inspired by cancer treatment, the other inspired by organ transplantation—to illustrate how the lexicographically-ordered rewards we learn can provide a better understanding of a decision-maker’s preferences and help improve policies when used in reinforcement learning.
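The lexicographic structure itself is simple to state in code. The snippet below is a minimal sketch of the preference relation, with made-up objectives and numbers for two hypothetical treatment plans; the contribution of the paper is inferring the rewards and their priority ordering from observed preferences, which this sketch does not do.

def lexicographic_prefers(rewards_a, rewards_b, tol=1e-6):
    """True if A is preferred to B under lexicographically-ordered rewards: objectives are
    compared in priority order, and a lower-priority objective only matters when all
    higher-priority objectives are (approximately) tied."""
    for r_a, r_b in zip(rewards_a, rewards_b):     # listed from highest priority downwards
        if abs(r_a - r_b) > tol:
            return r_a > r_b
    return False                                   # indifferent on every objective

# Illustrative, made-up objectives for two treatment plans: [5-year survival, quality of life].
plan_a = [0.81, 0.40]
plan_b = [0.81, 0.70]
print(lexicographic_prefers(plan_b, plan_a))       # True: tied on survival, so quality of life decides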

The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation
The Medkit-Learn(ing) Environment is a new tool that uses synthetic medical data to test and improve machine learning algorithms for imitation and inverse reinforcement learning in the healthcare field. With the Medkit, researchers can evaluate and compare their algorithms in a realistic medical setting, while inspecting their algorithms to validate that they learn appropriate features.

The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation

Alex J. Chan, Ioana Bica, Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar

NeurIPS 2021

Understanding decision-making in clinical environments is of paramount importance if we are to bring the strengths of machine learning to ultimately improve patient outcomes. Several factors, including the availability of public data, the intrinsically offline nature of the problem, and the complexity of human decision making, have meant that the mainstream development of algorithms is often geared towards optimal performance in tasks that do not necessarily translate well into the medical regime; often overlooking more niche issues commonly associated with the area.

We therefore present a new benchmarking suite designed specifically for medical sequential decision making: the Medkit-Learn(ing) Environment, a publicly available Python package providing simple and easy access to high-fidelity synthetic medical data. While providing a standardised way to compare algorithms in a realistic medical setting, we employ a generating process that disentangles the policy and environment dynamics to allow for a range of customisations, thus enabling systematic evaluation of algorithms’ robustness against specific challenges prevalent in healthcare.

Closing the loop in medical decision support by understanding clinical decision-making: A case study on organ transplantation
Using organ transplantation as a case study, we formalize the desiderata of methods for understanding clinical decision-making. We show that most existing machine learning methods are insufficient to meet these requirements and propose iTransplant, a novel data-driven framework to learn the factors affecting decisions on organ offers in an instance-wise fashion directly from clinical data, as a possible solution.

Closing the loop in medical decision support by understanding clinical decision-making: A case study on organ transplantation

Yuchao Qin, Fergus Imrie, Alihan Hüyük, Daniel Jarrett, Alexander Gimson, Mihaela van der Schaar

NeurIPS 2021

Significant effort has been placed on developing decision support tools to improve patient care. However, drivers of real-world clinical decisions in complex medical scenarios are not yet well-understood, resulting in substantial gaps between these tools and practical applications. In light of this, we highlight that more attention on understanding clinical decision-making is required both to elucidate current clinical practices and to enable effective human-machine interactions. This is imperative in high-stakes scenarios with scarce available resources.

Using organ transplantation as a case study, we formalize the desiderata of methods for understanding clinical decision-making. We show that most existing machine learning methods are insufficient to meet these requirements and propose iTransplant, a novel data-driven framework to learn the factors affecting decisions on organ offers in an instance-wise fashion directly from clinical data, as a possible solution. Through experiments on real-world liver transplantation data from OPTN, we demonstrate the use of iTransplant to: (1) discover which criteria are most important to clinicians for organ offer acceptance; (2) identify patient-specific organ preferences of clinicians allowing automatic patient stratification; and (3) explore variations in transplantation practices between different transplant centers.

Finally, we emphasize that the insights gained by iTransplant can be used to inform the development of future decision support tools.

Inverse decision modeling (IDM)
In a paper accepted for publication at ICML 2021, we developed an expressive, unifying perspective on inverse decision modeling (IDM): a framework for learning parameterized representations of sequential decision behavior.

IDM enables us to quantify intuitive notions of bounded rationality—such as the apparent flexibility of decisions, tolerance for surprise, or optimism in beliefs—while also making such representations interpretable. In presenting IDM, we highlight its potential utility in real-world settings as an investigative device for auditing and understanding human decision-making.

Inverse Decision Modeling: Learning Interpretable Representations of Behavior

Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar

ICML 2021

Decision analysis deals with modeling and enhancing decision processes. A principal challenge in improving behavior is in obtaining a transparent description of existing behavior in the first place.

In this paper, we develop an expressive, unifying perspective on inverse decision modeling: a framework for learning parameterized representations of sequential decision behavior.

First, we formalize the forward problem (as a normative standard), subsuming common classes of control behavior.

Second, we use this to formalize the inverse problem (as a descriptive model), generalizing existing work on imitation/reward learning—while opening up a much broader class of research problems in behavior representation.

Finally, we instantiate this approach with an example (inverse bounded rational control), illustrating how this structure enables learning (interpretable) representations of (bounded) rationality—while naturally capturing intuitive notions of suboptimal actions, biased beliefs, and imperfect knowledge of environments.

Approximate Variational Reward Imitation Learning (AVRIL)
AVRIL, presented in a paper published at ICLR 2021, offers another potential approach to studying decision-making in settings where there is no knowledge of the environment dynamics or intrinsic reward, nor even the ability to interact and test policies. As explained directly below, AVRIL offers reward inference in environments beyond the scope of current methods, as well as task performance competitive with focused offline imitation learning algorithms.

Scalable Bayesian Inverse Reinforcement Learning

Alex Chan, Mihaela van der Schaar

ICLR 2021

Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the inverse reinforcement learning problem. Unfortunately current methods generally do not scale well beyond the small tabular setting due to the need for an inner-loop MDP solver, and even non-Bayesian methods that do themselves scale often require extensive interaction with the environment to perform well, being inappropriate for high stakes or costly applications such as healthcare.

In this paper we introduce our method, Approximate Variational Reward Imitation Learning (AVRIL), that addresses both of these issues by jointly learning an approximate posterior distribution over the reward that scales to arbitrarily complicated state spaces alongside an appropriate policy in a completely offline manner through a variational approach to said latent reward.

Applying our method to real medical data alongside classic control simulations, we demonstrate Bayesian reward inference in environments beyond the scope of current methods, as well as task performance competitive with focused offline imitation learning algorithms.

Counterfactual inverse reinforcement learning (CIRL)
In a paper published at ICLR 2021, we proposed learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to counterfactual “what if” outcomes. In healthcare, for example, treatments often affect several patient covariates, producing both benefits and side-effects; decision-makers often make choices based on their preferences over these outcomes. By presenting decision-makers with counterfactuals, we can show them the potential outcomes of a particular action and model their preferences and reward functions. In the context of healthcare, doing this could enable us to quantify and inspect policies in different institutions and uncover the trade-offs and preferences associated with expert actions, as well as revealing the tendencies of individual practitioners to treat various diseases more or less aggressively.

Learning “What-if” Explanations for Sequential Decision-Making

Ioana Bica, Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar

ICLR 2021

Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior (i.e. trajectories of observations and actions made by an expert maximizing some unknown reward function) is essential for introspecting and auditing policies in different institutions.

In this paper, we propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to “what if” outcomes: Given the current history of observations, what would happen if we took a particular action? To learn these cost-benefit tradeoffs associated with the expert’s actions, we integrate counterfactual reasoning into batch inverse reinforcement learning. This offers a principled way of defining reward functions and explaining expert behavior, and also satisfies the constraints of real-world decision-making—where active experimentation is often impossible (e.g. in healthcare). Additionally, by estimating the effects of different actions, counterfactuals readily tackle the off-policy nature of policy evaluation in the batch setting, and can naturally accommodate settings where the expert policies depend on histories of observations rather than just current states.

Through illustrative experiments in both real and simulated medical environments, we highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.

Interpretable policy learning (INTERPOLE)
The motivation behind INTERPOLE, introduced in a paper published at ICLR 2021, was to create a transparent description of behavior capable of locating the factors that contribute to individual decisions, in a language that can be readily understood by domain experts. Classical imitation learning approaches incorporate black-box hidden states that are rarely amenable to meaningful interpretation, while apprenticeship learning algorithms only offer high-level reward mappings that are not informative as to individual actions observed in the data. Additionally, INTERPOLE aims to accommodate partial observability and operate completely offline.

During our work on INTERPOLE, we conducted experiments on both simulated and real-world data for the problem of Alzheimer’s disease diagnosis. We then sought feedback on our approach through a survey of 9 clinicians, who expressed an overwhelming preference for INTERPOLE by comparison with other potential approaches. Further details are provided earlier on this page.

Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning

Alihan Hüyük, Daniel Jarrett, Cem Tekin, Mihaela van der Schaar

ICLR 2021

Understanding human behavior from observed data is critical for transparency and accountability in decision-making. Consider real-world settings such as healthcare, in which modeling a decision-maker’s policy is challenging—with no access to underlying states, no knowledge of environment dynamics, and no allowance for live experimentation. We desire learning a data-driven representation of decision-making behavior that (1) inheres transparency by design, (2) accommodates partial observability, and (3) operates completely offline.

To satisfy these key criteria, we propose a novel model-based Bayesian method for interpretable policy learning (“Interpole”) that jointly estimates an agent’s (possibly biased) belief-update process together with their (possibly suboptimal) belief-action mapping.

Through experiments on both simulated and real-world data for the problem of Alzheimer’s disease diagnosis, we illustrate the potential of our approach as an investigative device for auditing, quantifying, and understanding human decision-making behavior.

Inverse active sensing
The first paper resulting from this push into new territory was titled “Inverse Active Sensing: Modeling and Understanding Timely Decision-Making,” and was published at ICML 2020. The paper takes the familiar concept of active sensing (the goal-oriented problem of efficiently selecting which information to acquire, and when and what decision to settle on) and inverts it, seeking to uncover an agent’s preferences and strategy for acquiring information given their observable decision-making behavior.

Inverse active sensing has a diverse range of potential applications both in and beyond healthcare. A particularly salient application might be understanding decision-making around diagnosis of patients. For instance, we expect doctors to care much more about correctly diagnosing a lethal disease than another condition that presents with similar symptoms, but do they actually? By how much? Inverse active sensing can help us answer questions like these by uncovering preferences that effectively underlie observed decision behavior.

Inverse Active Sensing: Modeling and Understanding Timely Decision-Making

Daniel Jarrett, Mihaela van der Schaar

ICML 2020

Evidence-based decision-making entails collecting (costly) observations about an underlying phenomenon of interest, and subsequently committing to an (informed) decision on the basis of accumulated evidence. In this setting, active sensing is the goal-oriented problem of efficiently selecting which acquisitions to make, and when and what decision to settle on. As its complement, inverse active sensing seeks to uncover an agent’s preferences and strategy given their observable decision-making behavior.

In this paper, we develop an expressive, unified framework for the general setting of evidence-based decision-making under endogenous, context-dependent time pressure—which requires negotiating (subjective) tradeoffs between accuracy, speediness, and cost of information. Using this language, we demonstrate how it enables modeling intuitive notions of surprise, suspense, and optimality in decision strategies (the forward problem).

Finally, we illustrate how this formulation enables understanding decision-making behavior by quantifying preferences implicit in observed decision strategies (the inverse problem).
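For intuition, the sketch below implements a toy version of the forward problem: an agent keeps ordering a noisy diagnostic test while the expected penalty for committing to a diagnosis now still exceeds the cost of one more test, using a crude myopic stopping rule. The test accuracy, costs, and stopping rule are illustrative assumptions, not the paper's framework; the inverse problem would ask what cost and penalty a decision-maker was implicitly using, given many observed stop/continue decisions.

import numpy as np

rng = np.random.default_rng(3)

def diagnose(has_disease, cost_per_test=0.02, error_penalty=1.0, max_tests=20):
    """Toy forward active sensing: gather evidence until committing now is cheaper than testing again."""
    belief = 0.5                                            # P(disease present)
    tests_ordered = 0
    while tests_ordered < max_tests:
        expected_error = min(belief, 1.0 - belief)          # error probability if we committed now
        if error_penalty * expected_error < cost_per_test:
            break                                           # stop acquiring, settle on a diagnosis
        result = rng.random() < (0.8 if has_disease else 0.2)   # noisy test, 80% sensitivity/specificity
        p_result_if_disease = 0.8 if result else 0.2
        p_result_if_healthy = 0.2 if result else 0.8
        belief = (belief * p_result_if_disease /
                  (belief * p_result_if_disease + (1.0 - belief) * p_result_if_healthy))
        tests_ordered += 1
    return ("disease" if belief > 0.5 else "healthy", tests_ordered, round(belief, 3))

print(diagnose(has_disease=True))   # (committed diagnosis, number of tests ordered, final belief)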

The path ahead

The work above showcases our first few tentative steps into quantitative epistemology. We have committed to substantial further investment of our lab’s time and resources on a long-term basis.

Quantitative epistemology can yield fascinating new insights into how humans learn and make decisions, and can bring about a new type of human-machine partnership based on empowerment, not replacement. While existing approaches (shown in blue above) can be incorporated into our research, and recent work by our own lab (shown in purple) has helped us lay a partial foundation for this new area of research, we are truly entering uncharted territory. There are many complex questions (shown in green) to explore, and practically unlimited new discoveries to make. Our sincere hope is that our readers will share our vision for quantitative epistemology, and consider developing new machine learning methods within the quantitative epistemology agenda.

Going forward, our priorities will be:
– to hone our vision for what quantitative epistemology can become, how we can create a new human-machine partnership, and how in practice this can deliver social benefit (in and beyond healthcare);
– to construct a comprehensive theoretical foundation that will serve as the basis for development of models and methods (our ICML 2021 paper on inverse decision modeling is just one initial example of this); and
– to solve specific real-world problems in partnership with our network of clinical collaborators (such as the Alzheimer’s diagnosis example earlier on this page), while also using newly developed approaches to support clinical auditing, address variation in practice, and encourage the introduction of more quantitative and principled clinical guidelines in complex areas such as cancer and transplantation.

As we continue to expand the boundaries of quantitative epistemology ever further, this page will serve as a living map documenting our latest discoveries and reflecting our evolving understanding of this brand new area of research. Please continue to check back here for the latest updates.

You can find our related publications here.

Videos: NeurIPS 2021, ICML 2021, and Inspiration Exchange engagement session

This invited talk, entitled “Quantitative epistemology – empowering human meta-learning using machine learning,” was given by Mihaela van der Schaar on December 13, 2021, as part of the Workshop on Meta-Learning (MetaLearn) running alongside NeurIPS 2021.

On July 23, 2021, Mihaela van der Schaar gave a keynote talk entitled “Quantitative epistemology – conceiving a new human-machine partnership” as part of the ICML 2021 Interpretable Machine Learning in Healthcare (IMLH) Workshop.

The full talk can be found below, and is highly recommended viewing for anyone who would like to know more or get involved in the quantitative epistemology research agenda.

Our primary means of building a shared vision for machine learning for healthcare is through two groups of online engagement sessions: Inspiration Exchange (for machine learning students) and Revolutionizing Healthcare (for the healthcare community). If you would like to get involved, please visit the page below.

On July 12, 2021, our lab held an Inspiration Exchange session dedicated to introducing quantitative epistemology—including theory, approaches, and future directions for this brand new research area. The recording is available directly below.