This page showcases the latest research in, and theoretical underpinnings of, the area of quantitative epistemology. It is a living document, the content of which will evolve as we continue to develop approaches and build a vision for this new research area.
- Pioneering a new field of research
- Applications of quantitative epistemology
- Intersection with other areas of research
- Our work so far
- Inverse Active Sensing: Modeling and Understanding Timely Decision-Making
- Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning
- Learning “What-if” Explanations for Sequential Decision-Making
- Scalable Bayesian Inverse Reinforcement Learning
- Inverse Decision Modeling: Learning Interpretable Representations of Behavior
- The path ahead
Pioneering a new field of research
Quantitative epistemology is a new and transformationally significant research pillar pioneered by the van der Schaar Lab. The purpose of this research is to develop a strand of machine learning aimed at understanding, supporting, and improving human decision-making. We aim to do so by building machine learning models of decision-making, including how humans acquire and learn from new information, establish and update their beliefs, and act on the basis of their cumulative knowledge. Because our approach uses observational data to study knowledge, and uses machine learning to support and improve both the acquisition of knowledge and its impact on decision-making, we call this “quantitative epistemology.”
It is important to distinguish this pursuit from existing work in imitation learning (i.e. replicating expert actions) and apprenticeship learning (i.e. matching expert returns), both of which intend to construct autonomous agents that can mimic and replace human demonstrators. Instead, we are concerned with leveraging machine learning to help humans become better decision-makers.
We develop machine learning models that capture how humans acquire new information, how they pay attention to such information, how their beliefs may be represented, how their internal models may be structured, how these different levels of knowledge are leveraged in the form of actions, and how such knowledge is learned and updated over time. Our methods are aimed at studying human decision-making, identifying potential suboptimalities in beliefs and decision processes (such as cognitive biases, selective attention, imperfect retention of past experience, etc.), and understanding risk attitudes and their implications for learning and decision-making. This would allow us to construct decision support systems that provide humans with information pertinent to their intended actions, their possible alternatives and counterfactual outcomes, as well as other evidence to empower better decision-making.
Applications of quantitative epistemology
Broadly, we currently see four potential areas of application for quantitative epistemology, none of which are limited to healthcare:
1. Decision support
This is arguably the most intuitive and straightforward application of understanding human decision-making. In medicine, for example, we can combine a meaningful understanding of the basis on which decisions are made with normative standards for optimal decision-making in areas such as diagnosis, treatment, and resource allocation.
Furthermore, quantitative epistemology can be applied in both single-agent and multi-agent settings. In the latter, our understanding of decision-making can be used to improve decisions made across multiple individuals or groups, whether they are cooperating or competing.
2. Analysis of variation
In many fields such as healthcare, there is often remarkable regional, institutional, and subgroup-level variability in practice. This variability renders detection and quantification of biases crucial.
Quantitative epistemology can yield powerful tools to audit clinical decision-making to investigate variation in practice, biases, and sub-optimal decision-making, and understand where improvements can be made.
3. (Re)definition of normative standards
There are many areas in which normative standards have not been defined, or may need to be continually redefined. Through the application of quantitative epistemology, we can determine whether normative standards are realistic and effective representations of desired outcomes, enabling policy-makers to design better policies going forward.
4. Education and training
Quantitative epistemology aims to produce a data-driven, quantitative—and most importantly interpretable—description of the process by which humans form and adapt their beliefs and understanding of the world. This could yield enormous benefit in education and training: both the content and instructional methods employed in courses could be extensively tailored to specific individuals, taking into account their learning styles, biases, and preferences.
Quantitative epistemology in action: decision trajectories for Alzheimer’s patients
This section showcases the potential utility of quantitative epistemology as an investigative approach for auditing and quantifying individual decisions in the healthcare domain. The method demonstrated here is INTERPOLE, which was introduced in a paper published at ICLR 2021 (abstract and further details are provided below).
INTERPOLE is a model for interpretable policy learning that seeks to model the evolution of an agent’s beliefs and provide a concrete basis for analyzing the corresponding sequence of actions taken. Sequential observations are aggregated through a decision-maker’s belief-update process, and sequential actions are determined by the agent’s probabilistic belief-action mapping.
The example given below uses real-world diagnosis patterns from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database over the course of a sequence of 6-monthly patient visits.
The state space consists of normal functioning (“NL”), mild cognitive impairment (“MCI”), and dementia. For the action space, we consider the decision problem of ordering vs. not ordering an MRI test, which (while often informative of Alzheimer’s) is financially costly.
In the figure above, each vertex of the belief simplex corresponds to one of the three stable diagnoses, and each point in the simplex corresponds to a unique belief (i.e. probability distribution). The closer the point is to a vertex (i.e. state), the higher the probability assigned to that state.
The figure gives examples for real patients, including: (a) a typical normally-functioning patient, where the decision-maker’s beliefs remain mostly on the decision boundary; (b) a typical patient who is believed to be deteriorating towards dementia; (c) a patient who—apparently—could have been diagnosed much earlier than they actually were; and (d) a patient with a (seemingly redundant) MRI test that is actually highly informative.
Patients (a) and (b) are “typical” patients who fit the overall learned policy well. The former is a normally-functioning patient believed to remain around the decision boundary in all visits except the first; appropriately, they are ordered an MRI in approximately half of their visits. The latter is believed to be deteriorating from MCI towards dementia, and is hence ordered an MRI at every visit.
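The two components of INTERPOLE described above, a belief-update process and a probabilistic belief-action mapping, can be illustrated on the same three states and the MRI/no-MRI decision. The sketch below is a minimal, exactly Bayesian version under stated assumptions: the transition matrix, the observation likelihoods, the per-action “prototype” beliefs, and the softmax-over-distances form of the mapping are all hypothetical choices for illustration, not parameters or code from the INTERPOLE paper (which estimates such quantities from data and allows the update itself to be biased).

```python
import math

# Diagnostic states, in the order used by every vector below.
STATES = ("NL", "MCI", "Dementia")

def belief_update(belief, transition, likelihood):
    """One Bayesian filtering step: predict with transition[s][s_next],
    then reweight by likelihood[s_next] = P(observation | s_next) and renormalize."""
    n = len(belief)
    predicted = [sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)]
    unnormalized = [predicted[s2] * likelihood[s2] for s2 in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def action_probabilities(belief, prototypes, eta):
    """Hypothetical belief-action mapping: softmax over negative squared distances
    between the current belief and one prototype belief per action."""
    scores = {
        action: -eta * sum((b - m) ** 2 for b, m in zip(belief, proto))
        for action, proto in prototypes.items()
    }
    top = max(scores.values())
    weights = {action: math.exp(s - top) for action, s in scores.items()}
    norm = sum(weights.values())
    return {action: w / norm for action, w in weights.items()}

# Made-up model: the disease only progresses forward.
transition = [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]]
prototypes = {"no MRI": [1.0, 0.0, 0.0], "MRI": [0.0, 0.5, 0.5]}

belief = [0.6, 0.3, 0.1]
# Made-up likelihood of an observation that points towards impairment.
belief = belief_update(belief, transition, likelihood=[0.1, 0.6, 0.8])
print([round(b, 3) for b in belief])
print(action_probabilities(belief, prototypes, eta=2.0))
```

In this parameterization, the decision boundary between two actions is the set of beliefs equidistant from their prototypes, which is one simple way to make a learned policy readable as regions of the belief simplex.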
Identifying belated diagnoses
In many diseases, early diagnosis is paramount. Using quantitative epistemology approaches such as INTERPOLE, we can detect patients who appear to have been diagnosed significantly later than they should have.
Patient (c), for example, was ordered an MRI in neither of their first two visits, despite the fact that the “typical” policy would have strongly recommended one. At their third visit, the MRI that was finally ordered led to near-certainty of cognitive impairment, but this could have been known 12 months earlier! In fact, among all ADNI patients in the database, 6.5% were subject to this apparent pattern of “belatedness”, where a late MRI is immediately followed by a jump to near-certain deterioration.
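The “belatedness” pattern can be phrased as a simple rule over a patient's visit history. The sketch below is a hypothetical formalization only: the per-visit fields, the rule that the typical policy “recommended” an MRI when it assigned it probability above `policy_threshold`, and the `jump_threshold` for near-certain impairment are assumptions chosen for illustration, not the criteria behind the 6.5% figure.

```python
def is_belated(visits, policy_threshold=0.5, jump_threshold=0.95):
    """Flag a trajectory whose first MRI comes after at least one visit where the
    typical policy recommended an MRI but none was ordered, and where that first
    MRI is immediately followed by near-certain impairment.

    Each visit is a dict with hypothetical keys:
      "mri": bool, whether an MRI was ordered at this visit
      "p_mri": probability the learned typical policy assigned to ordering one
      "impairment": belief mass on MCI or dementia after this visit's update
    """
    for t, visit in enumerate(visits):
        if visit["mri"]:
            missed = any(
                not earlier["mri"] and earlier["p_mri"] > policy_threshold
                for earlier in visits[:t]
            )
            return missed and visit["impairment"] >= jump_threshold
    return False  # no MRI ordered at all

# Made-up numbers shaped like patient (c): two skipped MRIs, then a late, decisive one.
patient_c_like = [
    {"mri": False, "p_mri": 0.8, "impairment": 0.55},
    {"mri": False, "p_mri": 0.9, "impairment": 0.60},
    {"mri": True, "p_mri": 0.9, "impairment": 0.98},
]
print(is_belated(patient_c_like))  # True
```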
Quantifying the value of information
Patient (d) highlights how quantitative epistemology can be used to quantify the value of a test in terms of its information gain.
While the patient was ordered an MRI in all of their visits, it may appear (on the surface) that the third and final MRIs were redundant, since they had little apparent effect on beliefs. However, this is only true for the factual belief update that occurred according to the MRI outcome that was actually observed. Having access to an estimated model of how beliefs are updated in the form of decision dynamics, we can also compute counterfactual belief updates, that is, belief updates that could have occurred if the MRI outcome in question had been different.
In the particular case of patient (d), the tests were in fact highly informative, since (as it happened) the patient’s CDR-SB scores were suggestive of impairment, and (in the counterfactual) the doctor’s beliefs could have potentially leapt drastically towards MCI.
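Counterfactual belief updates, and the value of a test measured as expected information gain, can be computed once a model of the test's outcome likelihoods is available. The sketch below assumes a made-up likelihood model P(outcome | state) for a binary MRI result and measures information gain as the expected reduction in the Shannon entropy of the belief. For simplicity it omits the state-transition step and treats the update as exactly Bayesian; both are simplifying assumptions rather than INTERPOLE's estimated decision dynamics.

```python
import math

def entropy_bits(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def counterfactual_updates(prior, outcome_model):
    """For every possible test outcome, return (posterior belief, marginal probability)."""
    results = {}
    for outcome, likelihood in outcome_model.items():
        joint = [p * l for p, l in zip(prior, likelihood)]
        marginal = sum(joint)
        if marginal == 0:
            continue  # outcome impossible under this prior
        results[outcome] = ([j / marginal for j in joint], marginal)
    return results

def expected_information_gain(prior, outcome_model):
    """Prior entropy minus the expected posterior entropy over outcomes."""
    updates = counterfactual_updates(prior, outcome_model)
    expected_posterior = sum(m * entropy_bits(post) for post, m in updates.values())
    return entropy_bits(prior) - expected_posterior

# States: NL, MCI, Dementia. Made-up P(MRI outcome | state); for each state the outcomes sum to 1.
outcome_model = {"normal": [0.9, 0.3, 0.1], "abnormal": [0.1, 0.7, 0.9]}
prior = [0.9, 0.08, 0.02]

for outcome, (post, m) in counterfactual_updates(prior, outcome_model).items():
    print(outcome, [round(x, 3) for x in post], round(m, 3))
print("expected information gain (bits):", round(expected_information_gain(prior, outcome_model), 3))
```

With these numbers, a “normal” result barely moves the belief, while the counterfactual “abnormal” result would have shifted substantial mass towards MCI. This is how a test can be informative even when its factual update looks small.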
Clinician evaluation of INTERPOLE
We evaluated INTERPOLE by consulting nine clinicians from four different countries (United States, United Kingdom, the Netherlands, and China) for feedback.
To determine whether decision dynamics are a transparent way of modeling how information is aggregated by decision-makers, we presented the clinicians with the medical history of an example patient represented in three ways, using: i) only the most recent action-observation, ii) the complete action-observation trajectory, and iii) the belief trajectory as recovered by INTERPOLE. All nine clinicians preferred the belief trajectories over action-observation trajectories.
We also sought to establish whether the proposed representation of (possibly suboptimal) decision boundaries is a more transparent way of describing policies, compared with the representation of reward functions. To do this, we showed the clinicians the policies learned from ADNI in the form of decision boundaries (i.e. INTERPOLE) and reward functions. Seven out of the nine clinicians preferred the representation in terms of decision boundaries.
Further details regarding INTERPOLE can be found below. For more information on our work related to Alzheimer’s, click here.
Intersection with other areas of research
Quantitative epistemology will complement and build upon projects across the lab’s other key research areas, including decision support systems, predictive analytics, automated ML, individualized treatment effect inference, interpretability, synthetic data, and more.
These points of intersection (and the immense potential for additional intersection) should be clear from the following descriptions of some of our initial projects related to quantitative epistemology.
Our work so far
Quantitative epistemology has become an area of significant focus for our lab’s researchers in recent years. Some of our first papers are shared below.
Inverse active sensing
The first paper resulting from this push into new territory was titled “Inverse Active Sensing: Modeling and Understanding Timely Decision-Making,” and was published at ICML 2020. The paper takes the familiar concept of active sensing (the goal-oriented problem of efficiently selecting which information to acquire, and when and what decision to settle on) and inverts it, seeking to uncover an agent’s preferences and strategy for acquiring information given their observable decision-making behavior.
Inverse active sensing has a diverse range of potential applications both in and beyond healthcare. A particularly salient application might be understanding decision-making around diagnosis of patients. For instance, we expect doctors to care much more about correctly diagnosing a lethal disease than another condition that presents with similar symptoms, but do they actually? By how much? Inverse active sensing can help us answer questions like these by uncovering preferences that effectively underlie observed decision behavior.
Inverse Active Sensing: Modeling and Understanding Timely Decision-Making
Daniel Jarrett, Mihaela van der Schaar
Evidence-based decision-making entails collecting (costly) observations about an underlying phenomenon of interest, and subsequently committing to an (informed) decision on the basis of accumulated evidence. In this setting, active sensing is the goal-oriented problem of efficiently selecting which acquisitions to make, and when and what decision to settle on. As its complement, inverse active sensing seeks to uncover an agent’s preferences and strategy given their observable decision-making behavior.
In this paper, we develop an expressive, unified framework for the general setting of evidence-based decision-making under endogenous, context-dependent time pressure—which requires negotiating (subjective) tradeoffs between accuracy, speediness, and cost of information. Using this language, we demonstrate how it enables modeling intuitive notions of surprise, suspense, and optimality in decision strategies (the forward problem).
Finally, we illustrate how this formulation enables understanding decision-making behavior by quantifying preferences implicit in observed decision strategies (the inverse problem).
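The tradeoff between accuracy, speediness, and cost of information can be illustrated with a one-step (myopic) rule: acquire one more observation only if the expected reduction in decision loss exceeds the acquisition cost plus a penalty for delay. This is a deliberately simplified stand-in, not the framework of the paper; the asymmetric loss matrix (missing a lethal condition costs more than a false alarm), the test likelihoods, and the cost and delay values are all hypothetical.

```python
def expected_loss_now(belief, loss):
    """Expected loss of committing immediately to the best decision.
    loss[d][s] is the loss of decision d when the true state is s."""
    return min(sum(b * l for b, l in zip(belief, row)) for row in loss)

def expected_loss_after(belief, outcome_model, loss):
    """Expected loss if one more observation is taken before deciding.
    For each outcome z, P(z) * min_d E[loss | z] equals
    min_d sum_s b(s) P(z | s) loss[d][s], so no normalization is needed."""
    total = 0.0
    for likelihood in outcome_model.values():
        joint = [b * l for b, l in zip(belief, likelihood)]
        total += expected_loss_now(joint, loss)
    return total

def should_acquire(belief, outcome_model, loss, cost, delay_penalty):
    """Myopic rule: acquire if the value of information beats cost plus delay."""
    value_of_information = (
        expected_loss_now(belief, loss) - expected_loss_after(belief, outcome_model, loss)
    )
    return value_of_information > cost + delay_penalty

# States: (benign, lethal). Decisions: (treat as benign, treat as lethal).
loss = [[0.0, 5.0],   # calling it benign when it is lethal is costly
        [1.0, 0.0]]   # a false alarm costs less
outcome_model = {"negative": [0.8, 0.2], "positive": [0.2, 0.8]}
belief = [0.7, 0.3]

print(should_acquire(belief, outcome_model, loss, cost=0.1, delay_penalty=0.05))  # True
print(should_acquire(belief, outcome_model, loss, cost=0.2, delay_penalty=0.1))   # False
```

Read in reverse, observed choices to acquire or to stop constrain which loss, cost, and delay values the decision-maker could plausibly hold; that inversion is the spirit of inverse active sensing, although the paper's formulation is considerably richer than this one-step rule.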
Interpretable policy learning (INTERPOLE)
The motivation behind INTERPOLE, introduced in a paper published at ICLR 2021, was to create a transparent description of behavior capable of locating the factors that contribute to individual decisions, in a language that can be readily understood by domain experts. Classical imitation learning approaches incorporate black-box hidden states that are rarely amenable to meaningful interpretation, while apprenticeship learning algorithms only offer high-level reward mappings that are not informative as to individual actions observed in the data. Additionally, INTERPOLE aims to accommodate partial observability and to operate completely offline.
During our work on INTERPOLE, we conducted experiments on both simulated and real-world data for the problem of Alzheimer’s disease diagnosis. We then sought feedback on our approach through a survey of nine clinicians, who expressed an overwhelming preference for INTERPOLE by comparison with other potential approaches. Further details are provided earlier on this page.
Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning
Alihan Hüyük, Daniel Jarrett, Cem Tekin, Mihaela van der Schaar
Understanding human behavior from observed data is critical for transparency and accountability in decision-making. Consider real-world settings such as healthcare, in which modeling a decision-maker’s policy is challenging, with no access to underlying states, no knowledge of environment dynamics, and no allowance for live experimentation. We desire learning a data-driven representation of decision-making behavior that (1) inheres transparency by design, (2) accommodates partial observability, and (3) operates completely offline.
To satisfy these key criteria, we propose a novel model-based Bayesian method for interpretable policy learning (“Interpole”) that jointly estimates an agent’s (possibly biased) belief-update process together with their (possibly suboptimal) belief-action mapping.
Through experiments on both simulated and real-world data for the problem of Alzheimer’s disease diagnosis, we illustrate the potential of our approach as an investigative device for auditing, quantifying, and understanding human decision-making behavior.
Counterfactual inverse reinforcement learning (CIRL)
In a paper published at ICLR 2021, we proposed learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to counterfactual “what if” outcomes. In healthcare, for example, treatments often affect several patient covariates, having both benefits and side effects, and decision-makers often make choices based on their preferences over these outcomes. By estimating the potential outcomes of each available action and modeling the expert’s reward as a function of those outcomes, we can express their preferences and reward function in interpretable terms. In the context of healthcare, this could enable us to quantify and inspect policies in different institutions and uncover the trade-offs and preferences associated with expert actions, as well as reveal the tendencies of individual practitioners to treat various diseases more or less aggressively.
Learning “What-if” Explanations for Sequential Decision-Making
Ioana Bica, Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior (i.e. trajectories of observations and actions made by an expert maximizing some unknown reward function) is essential for introspecting and auditing policies in different institutions.
In this paper, we propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to “what if” outcomes: Given the current history of observations, what would happen if we took a particular action? To learn these cost-benefit tradeoffs associated with the expert’s actions, we integrate counterfactual reasoning into batch inverse reinforcement learning. This offers a principled way of defining reward functions and explaining expert behavior, and also satisfies the constraints of real-world decision-making, where active experimentation is often impossible (e.g. in healthcare). Additionally, by estimating the effects of different actions, counterfactuals readily tackle the off-policy nature of policy evaluation in the batch setting, and can naturally accommodate settings where the expert policies depend on histories of observations rather than just current states.
Through illustrative experiments in both real and simulated medical environments, we highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
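A toy version of the core idea: if the reward of an action is a weighted sum of its predicted counterfactual outcomes, the weights express the expert's preferences and can be fitted to logged choices. The sketch below makes strong simplifying assumptions that the paper does not: each decision is treated as a one-step choice under a softmax (Boltzmann) policy over rewards, the counterfactual outcome estimates are given rather than learned, and the weights are fitted by plain gradient ascent on the log-likelihood. The (benefit, toxicity) outcome vectors are made up.

```python
import math

def choice_probabilities(weights, outcomes):
    """Softmax policy over actions whose reward is weights . counterfactual outcome."""
    scores = [sum(w * y for w, y in zip(weights, outcome)) for outcome in outcomes]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def fit_preference_weights(data, dim, learning_rate=0.5, steps=500):
    """Maximize the average log-likelihood of the expert's choices by gradient ascent.
    data: list of (outcomes, chosen) where outcomes[a] is the predicted outcome vector
    of action a given the patient's history, and chosen is the expert's action index."""
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for outcomes, chosen in data:
            probs = choice_probabilities(weights, outcomes)
            for k in range(dim):
                expected = sum(p * outcome[k] for p, outcome in zip(probs, outcomes))
                grad[k] += outcomes[chosen][k] - expected
        weights = [w + learning_rate * g / len(data) for w, g in zip(weights, grad)]
    return weights

# Each record: predicted (benefit, toxicity) for [treat, wait], then the expert's choice.
NO_EFFECT = [0.0, 0.0]
data = [
    ([[1.0, 0.2], NO_EFFECT], 0),  # high benefit, low toxicity: treated
    ([[0.3, 0.8], NO_EFFECT], 1),  # low benefit, high toxicity: waited
    ([[0.8, 0.5], NO_EFFECT], 0),
    ([[0.5, 0.6], NO_EFFECT], 1),
]
weights = fit_preference_weights(data, dim=2)
print("benefit weight:", round(weights[0], 2), "toxicity weight:", round(weights[1], 2))
```

A positive benefit weight and a negative toxicity weight would read as “this expert trades off benefit against side effects”, and the ratio between them gives a rough indication of how aggressively the expert treats.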
Approximate Variational Reward Imitation Learning (AVRIL)
AVRIL, presented in a paper published at ICLR 2021, offers yet another approach to studying decision-making in settings where there is no knowledge of the environment dynamics or the intrinsic reward, and not even the ability to interact with the environment to test policies. As explained directly below, AVRIL offers reward inference in environments beyond the scope of current methods, as well as task performance competitive with focused offline imitation learning algorithms.
Scalable Bayesian Inverse Reinforcement Learning
Alex Chan, Mihaela van der Schaar
Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the inverse reinforcement learning problem. Unfortunately, current methods generally do not scale well beyond the small tabular setting due to the need for an inner-loop MDP solver, and even non-Bayesian methods that do themselves scale often require extensive interaction with the environment to perform well, being inappropriate for high stakes or costly applications such as healthcare.
In this paper we introduce our method, Approximate Variational Reward Imitation Learning (AVRIL), that addresses both of these issues by jointly learning an approximate posterior distribution over the reward that scales to arbitrarily complicated state spaces alongside an appropriate policy in a completely offline manner through a variational approach to said latent reward.
Applying our method to real medical data alongside classic control simulations, we demonstrate Bayesian reward inference in environments beyond the scope of current methods, as well as task performance competitive with focused offline imitation learning algorithms.
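To give a feel for the ingredients, here is a tabular sketch of an AVRIL-style objective under one reading of the method: a Gaussian variational posterior over each state-action reward, an imitation term scoring the expert's action under a softmax of the Q-values, a KL term pulling the reward posterior towards a standard normal prior, and a consistency term asking that the reward implied by successive Q-values, Q(s, a) − γQ(s', a'), be probable under that posterior. The tabular parameterization, the prior, the weights `gamma` and `lam`, and the example numbers are assumptions for illustration; the method itself uses learned function approximators, and its exact objective may differ in detail.

```python
import math

def log_softmax(values, index):
    """log of softmax(values)[index], computed stably."""
    top = max(values)
    return values[index] - top - math.log(sum(math.exp(v - top) for v in values))

def gaussian_log_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def kl_to_standard_normal(mean, var):
    """KL( N(mean, var) || N(0, 1) )."""
    return 0.5 * (var + mean ** 2 - 1.0 - math.log(var))

def avril_style_objective(transitions, q, reward_mean, reward_var, gamma=0.9, lam=1.0):
    """Average objective over logged expert transitions (s, a, s_next, a_next).
    q[s][a]: Q-value table; reward_mean/reward_var[s][a]: Gaussian posterior over R(s, a)."""
    total = 0.0
    for s, a, s_next, a_next in transitions:
        imitation = log_softmax(q[s], a)                       # expert action under softmax policy
        implied_reward = q[s][a] - gamma * q[s_next][a_next]   # reward implied by successive Q-values
        consistency = gaussian_log_pdf(implied_reward, reward_mean[s][a], reward_var[s][a])
        kl = kl_to_standard_normal(reward_mean[s][a], reward_var[s][a])
        total += imitation - kl + lam * consistency
    return total / len(transitions)

# Made-up tabular example: 2 states x 2 actions.
q = [[1.0, 0.0], [0.5, 0.2]]
transitions = [(0, 0, 1, 0), (1, 1, 0, 0)]
var = [[1.0, 1.0], [1.0, 1.0]]
consistent_mean = [[1.0 - 0.9 * 0.5, 0.0], [0.0, 0.2 - 0.9 * 1.0]]
inconsistent_mean = [[-2.0, 0.0], [0.0, 2.0]]
print(avril_style_objective(transitions, q, consistent_mean, var))
print(avril_style_objective(transitions, q, inconsistent_mean, var))
```

In practice both the Q-values and the reward posterior would be optimized jointly by gradient ascent on such an objective; the point of the example is only that a reward posterior agreeing with the Q-values scores higher than one that does not.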
Inverse decision modeling (IDM)
In a paper accepted for publication at ICML 2021, we developed an expressive, unifying perspective on inverse decision modeling (IDM): a framework for learning parameterized representations of sequential decision behavior.
IDM enables us to quantify intuitive notions of bounded rationality—such as the apparent flexibility of decisions, tolerance for surprise, or optimism in beliefs—while also making such representations interpretable. In presenting IDM, we highlight its potential utility in real-world settings as an investigative device for auditing and understanding human decision-making.
Inverse Decision Modeling: Learning Interpretable Representations of Behavior
Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar
Decision analysis deals with modeling and enhancing decision processes. A principal challenge in improving behavior is in obtaining a transparent description of existing behavior in the first place.
In this paper, we develop an expressive, unifying perspective on inverse decision modeling: a framework for learning parameterized representations of sequential decision behavior.
First, we formalize the forward problem (as a normative standard), subsuming common classes of control behavior.
Second, we use this to formalize the inverse problem (as a descriptive model), generalizing existing work on imitation/reward learning—while opening up a much broader class of research problems in behavior representation.
Finally, we instantiate this approach with an example (inverse bounded rational control), illustrating how this structure enables learning (interpretable) representations of (bounded) rationality—while naturally capturing intuitive notions of suboptimal actions, biased beliefs, and imperfect knowledge of environments.
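One of the simplest examples of learning a parameter of bounded rationality from behavior is fitting the inverse temperature β of a softmax (Boltzmann) policy, π(a) ∝ exp(β Q(a)), to observed choices: a small β describes noisy, near-random decisions and a large β near-deterministic ones. The sketch below assumes the action values Q are known and fits β by maximum likelihood. It is a one-parameter illustration of the inverse problem, not the inverse bounded rational control model, which also captures biased beliefs and imperfect knowledge of the environment.

```python
import math

def softmax(values, beta):
    scaled = [beta * v for v in values]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]

def loglik_gradient(beta, data):
    """d/d(beta) of the log-likelihood: sum over choices of Q(chosen) - E_pi[Q].
    Its own derivative is minus a variance, so it is non-increasing in beta
    and its root is the maximum-likelihood estimate."""
    total = 0.0
    for q_values, chosen in data:
        probs = softmax(q_values, beta)
        total += q_values[chosen] - sum(p * q for p, q in zip(probs, q_values))
    return total

def fit_inverse_temperature(data, low=0.0, high=50.0, iterations=60):
    """Maximum-likelihood beta on [low, high] by bisection on the gradient."""
    if loglik_gradient(low, data) <= 0:
        return low
    if loglik_gradient(high, data) >= 0:
        return high
    for _ in range(iterations):
        mid = (low + high) / 2
        if loglik_gradient(mid, data) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2

# Made-up data: the higher-valued action is chosen in 3 of 4 identical situations.
data = [([1.0, 0.0], 0)] * 3 + [([1.0, 0.0], 1)]
beta = fit_inverse_temperature(data)
print(round(beta, 4), round(math.log(3), 4))  # both 1.0986
```

Here the estimate matches the closed form: choosing the better action 75% of the time implies σ(β) = 0.75, i.e. β = ln 3.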
The path ahead
The work above showcases our first few tentative steps into quantitative epistemology. We have committed to substantial further investment of our lab’s time and resources on a long-term basis.
Going forward, our priorities will be:
– to hone our vision for what quantitative epistemology can become, how we can create a new human-machine partnership, and how in practice this can deliver social benefit (in and beyond healthcare);
– to construct a comprehensive theoretical foundation that will serve as the basis for development of models and methods (our ICML 2021 paper on inverse decision modeling is just one initial example of this); and
– to solve specific real-world problems in partnership with our network of clinical collaborators (such as the Alzheimer’s diagnosis example earlier on this page), while also using newly developed approaches to support clinical auditing, address variation in practice, and encourage the introduction of more quantitative and principled clinical guidelines in complex areas such as cancer and transplantation.
As we continue to expand the boundaries of quantitative epistemology ever further, this page will serve as a living map documenting our latest discoveries and reflecting our evolving understanding of this brand new area of research. Please continue to check back here for the latest updates.
You can find our related publications here.
Our primary means of building a shared vision for machine learning for healthcare is through two groups of online engagement sessions: Inspiration Exchange (for machine learning students) and Revolutionizing Healthcare (for the healthcare community). If you would like to get involved, please visit the page below.
We plan to hold an Inspiration Exchange session dedicated to introducing and discussing quantitative epistemology in June/July 2021; please join us!