van der Schaar Lab

Quantitative epistemology: conceiving a new human-machine partnership

This page is authored and maintained by Mihaela van der Schaar and Nick Maxfield.

Pioneering a new field of research

Quantitative epistemology is a new and transformationally significant research pillar pioneered by the van der Schaar Lab. The purpose of this research is to develop a strand of machine learning aimed at understanding, supporting, and improving human decision-making. We aim to do so by building machine learning models of decision-making, including how humans acquire and learn from new information, establish and update their beliefs, and act on the basis of their cumulative knowledge. Because our approach uses observational data to study knowledge, and machine learning methods to support and improve knowledge acquisition and its impact on decision-making, we call this “quantitative epistemology.”

Our methods are aimed at studying human decision-making, identifying potential suboptimalities in beliefs and decision processes (such as cognitive biases, selective attention, imperfect retention of past experience, etc.), and understanding risk attitudes and their implications for learning and decision-making. This would allow us to construct decision support systems that provide humans with information pertinent to their intended actions, their possible alternatives and counterfactual outcomes, as well as other evidence to empower better decision-making.

Revisiting the roots of human (meta-)learning

Quantitative epistemology draws inspiration from the field of meta-learning. While meta-learning is arguably best known today as a subfield of machine learning, in this case we are referring to the original meaning of the term within the domains of social psychology and education, as coined by Donald B. Maudsley in his 1979 book entitled A theory of meta-learning and principles of facilitation: an organismic perspective.

Maudsley defined meta-learning as “the process by which learners become aware of and increasingly in control of habits of perception, inquiry, learning, and growth.” He put forward five requirements learners must observe in order to practice meta-learning successfully:
– Have a theory;
– Work in a supportive environment;
– Discover their rules and assumptions;
– Reconnect with reality-information from the environment; and
– Reorganize themselves by changing their rules/assumptions.

In reality, meta-learning remains extremely difficult for humans, even when the five requirements listed above are met. Our goal for quantitative epistemology, therefore, is to develop a new machine learning field aiming to empower humans to perform meta-learning. Our vision is to use machine learning to serve the purpose defined by Maudsley by empowering humans to improve and control their own perception, inquiry, learning, and growth, as well as their decision-making.

This is in keeping with our lab’s overall vision of using machine learning to learn human intelligence with the aim of empowering humans—rather than empowering machine intelligence.

A human-machine partnership based on empowerment, not replacement

As mentioned above, it is important to distinguish quantitative epistemology from existing work in AI and machine learning, such as imitation learning (i.e. replicating expert actions) and apprenticeship learning (i.e. matching expert returns), both of which intend to construct autonomous agents that can mimic and replace human demonstrators. Instead, we are concerned with leveraging machine learning to help humans become better decision-makers.

Quantitative epistemology entails developing machine learning models that capture how humans acquire new information, how they pay attention to such information, how their beliefs may be represented, how their internal models may be structured, how these different levels of knowledge are leveraged in the form of actions, and how such knowledge is learned and updated over time.

Quantitative epistemology envisages a new human-machine partnership in which machines support and empower humans, rather than replacing them.

The figure below depicts the broad strokes of this partnership in terms of long-term cycles in which a theory of meta-learning is built and continually honed, and in which humans are constantly being empowered to control their growth, perception, inquiry, learning, and decision-making.

Starting at the bottom left of the figure and moving clockwise:
1. humans act and perform meta-learning;
2. assumptions, structures, and rules, etc., can be studied using machine learning (quantitative epistemology) and developed into meta-learning models;
3. we can use these behavior models to distil hypotheses about meta-learning;
4. through the scientific process, we can build these hypotheses into a comprehensive and quantitative theory of meta-learning;
5a. we can reconnect this theory with reality-information and improve it cyclically over time;
5b. this process can also provide new advice, empowering humans to grow and further hone their perception, inquiry, learning, and decision-making.

Note: our use of “meta-learning models” here refers to models that examine the individual-specific thought processes and tendencies or biases that influence how humans make decisions when presented with specific information. Such models can examine characteristics including (but not limited to) an individual’s capacity for flexibility or adaptivity, tolerance of risk, or degree of optimism, and can also identify context-specific factors that drive changes in these characteristics. For instance, such models may identify that certain clinicians tend to be less optimistic when diagnosing patients at risk, or they may show how optimism and confirmation bias could lead to similar but differentiable behavior.

We can also use quantitative epistemology to build the “supportive environment” Maudsley defined as a requirement for successful meta-learning.

Starting at the very bottom of the figure and moving clockwise:
1. as in the previous figure, humans act and perform meta-learning;
2. machine learning tools (quantitative epistemology) can understand these decisions by building meta-learning models, identifying potential biases, errors, and inconsistencies, and providing advice;
3. humans are provided with this information;
4. humans inform the machine learning tools whether the adjustments or corrections provided about their behavior are effective or not, and offer clarifications about their decisions as well as rating the advice provided to them;
5. this serves to improve the understanding of the quantitative epistemology machine learning tools, driving a cycle that can further empower humans.

Applications of quantitative epistemology

Broadly, we currently see four potential areas of application for quantitative epistemology, none of which are limited to healthcare:

1. Decision Support
This is arguably the most intuitive and straightforward application of understanding human decision-making. In medicine, for example, we can combine a meaningful understanding of the basis on which decisions are made with normative standards for optimal decision-making in areas such as diagnosis, treatment, and resource allocation.

Furthermore, we can apply quantitative epistemology in a single-agent or multi-agent setting, using our understanding of decision-making to optimize decision-making across multiple individuals or groups, whether in a co-operative or a competitive setting.

2. Analysis of variation
In many fields such as healthcare, there is often remarkable regional, institutional, and subgroup-level variability in practice. This variability renders detection and quantification of biases crucial.

Quantitative epistemology can yield powerful tools to audit clinical decision-making to investigate variation in practice, biases, and sub-optimal decision-making, and understand where improvements can be made.

3. (Re)-Definition of Normative Standards
There are many areas in which normative standards have not been defined, or may need to be continually redefined. Through the application of quantitative epistemology, we can determine whether normative standards are realistic and effective representations of desired outcomes, enabling policy-makers to design better policies going forward.

4. Education and training
Quantitative epistemology aims to produce a data-driven, quantitative—and most importantly interpretable—description of the process by which humans form and adapt their beliefs and understanding of the world. This could yield enormous benefit in education and training: both the content and instructional methods employed in courses could be extensively tailored to specific individuals, taking into account their learning styles, biases, and preferences.

This section showcases the potential utility of quantitative epistemology as an investigative approach for auditing and quantifying individual decisions in the healthcare domain. The method demonstrated here is INTERPOLE, which was introduced in a paper published at ICLR 2021 (abstract and further details are provided below).

INTERPOLE is a model for interpretable policy learning that seeks to model the evolution of an agent’s beliefs and provide a concrete basis for analyzing the corresponding sequence of actions taken. Sequential observations are aggregated through a decision-maker’s belief-update process, and sequential actions are determined by the agent’s probabilistic belief-action mapping.
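To make these two components concrete, the sketch below pairs a standard Bayesian belief update for a partially observable process with a softmax belief-action mapping over per-action prototype beliefs. It is an illustrative toy, not the INTERPOLE implementation: the function names, the transition and observation tables, the prototype beliefs, and the sharpness parameter `eta` are all assumptions chosen for readability.

```python
import math

STATES = ["NL", "MCI", "Dementia"]
NO_MRI, MRI = 0, 1          # action encoding (assumed)
NORMAL, ABNORMAL = 0, 1     # test outcome encoding (assumed)

# Assumed dynamics T[a][s][s2] = P(s2 | s, a); identical for both actions here.
_T = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
T = [_T, _T]
# Assumed observation model O[a][s2][z] = P(z | s2, a); skipping the MRI is uninformative.
O = [[[0.5, 0.5]] * 3, [[0.9, 0.1], [0.3, 0.7], [0.1, 0.9]]]

# Assumed prototype beliefs: one "typical" belief per action.
PROTOTYPES = [[0.8, 0.15, 0.05], [0.2, 0.5, 0.3]]


def belief_update(belief, action, observation):
    """Bayes filter: b'(s2) is proportional to O[a][s2][z] * sum_s T[a][s][s2] * b(s)."""
    n = len(belief)
    predicted = [sum(belief[s] * T[action][s][s2] for s in range(n)) for s2 in range(n)]
    unnormalized = [O[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def action_probabilities(belief, eta=10.0):
    """Softmax over negative squared distance from the belief to each action's prototype."""
    scores = [-eta * sum((b - m) ** 2 for b, m in zip(belief, mu)) for mu in PROTOTYPES]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


b0 = [0.6, 0.3, 0.1]
b1 = belief_update(b0, MRI, ABNORMAL)   # belief after an abnormal MRI
probs = action_probabilities(b1)        # [P(no MRI), P(MRI)] at the next visit
```

With these made-up numbers, an abnormal MRI moves probability mass away from "NL", and the updated belief sits closer to the MRI prototype, so ordering another MRI becomes the more likely action.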

The example given below uses real-world diagnosis patterns from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database over the course of a sequence of 6-monthly patient visits.

The state space consists of normal functioning (“NL”), mild cognitive impairment (“MCI”), and dementia. For the action space, we consider the decision problem of ordering vs. not ordering an MRI test, which (while often informative of Alzheimer’s) is financially costly.

In the figure above, each vertex of the belief simplex corresponds to one of the three stable diagnoses, and each point in the simplex corresponds to a unique belief (i.e. probability distribution). The closer the point is to a vertex (i.e. state), the higher the probability assigned to that state.
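As a rough illustration of how a belief maps to a point in such a simplex, the snippet below places the three states at the corners of an equilateral triangle and computes the belief-weighted average of the corner positions (barycentric coordinates). The corner layout and the helper names are assumptions made for this sketch; they are not taken from the original figure.

```python
import math

# Assumed 2D positions for the three vertices of the belief simplex.
VERTICES = {
    "NL": (0.0, 0.0),
    "MCI": (1.0, 0.0),
    "Dementia": (0.5, math.sqrt(3) / 2),
}


def simplex_point(belief):
    """Map a belief (dict of state -> probability) to a 2D point inside the triangle."""
    x = sum(belief[s] * VERTICES[s][0] for s in VERTICES)
    y = sum(belief[s] * VERTICES[s][1] for s in VERTICES)
    return (x, y)


def nearest_vertex(belief):
    """Return the state whose vertex is closest to the belief's point."""
    point = simplex_point(belief)
    return min(VERTICES, key=lambda s: math.dist(point, VERTICES[s]))


certain_nl = simplex_point({"NL": 1.0, "MCI": 0.0, "Dementia": 0.0})
uniform = simplex_point({"NL": 1 / 3, "MCI": 1 / 3, "Dementia": 1 / 3})
mostly_mci = {"NL": 0.1, "MCI": 0.8, "Dementia": 0.1}
closest = nearest_vertex(mostly_mci)
```

A belief that is certain of one state lands exactly on that vertex, the uniform belief lands at the centre of the triangle, and a belief with most of its mass on "MCI" lands nearest the "MCI" vertex.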

The figure gives examples for real patients, including: (a) a typical normally-functioning patient, where the decision-maker’s beliefs remain mostly on the decision boundary; (b) a typical patient who is believed to be deteriorating towards dementia; (c) a patient who—apparently—could have been diagnosed much earlier than they actually were; and (d) a patient with a (seemingly redundant) MRI test that is actually highly informative.

Explaining trajectories
Patients (a) and (b) are “typical” patients who fit well to the overall learned policy. The former is a normally-functioning patient believed to remain around the decision boundary in all visits except the first; appropriately, they are ordered an MRI during approximately half of their visits. The latter is believed to be deteriorating from MCI towards dementia, hence prescribed an MRI in all visits.

Identifying belated diagnoses
In many diseases, early diagnosis is paramount. Using quantitative epistemology approaches such as INTERPOLE, we can detect patients who appear to have been diagnosed significantly later than they should have.

Patient (c), for example, was ordered an MRI in neither of their first two visits despite the fact that the “typical” policy would have strongly recommended one. At a third visit, the MRI that was finally ordered led to near-certainty of cognitive impairment, but this could have been known 12 months earlier! In fact, among all ADNI patients in the database, 6.5% were subject to this apparent pattern of “belatedness”, where a late MRI is immediately followed by a jump to near-certain deterioration.
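One possible way to operationalize this pattern is a simple rule over per-visit quantities recovered by a learned model. The sketch below is hypothetical: the `Visit` fields, the function name, and all three thresholds are assumptions made for illustration, and it is not the criterion used to obtain the 6.5% figure.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    mri_ordered: bool        # was an MRI actually ordered at this visit?
    policy_prob_mri: float   # probability the learned "typical" policy assigns to ordering one
    belief_impaired: float   # inferred belief in impairment after this visit


def first_belated_visit(visits, recommend=0.8, near_certain=0.9, jump=0.4):
    """Return the index of the first visit where a late MRI is followed by a jump
    to near-certain impairment, or None if the pattern never occurs."""
    missed_recommendation = False
    for i, current in enumerate(visits):
        if i > 0:
            rise = current.belief_impaired - visits[i - 1].belief_impaired
            if (missed_recommendation and current.mri_ordered
                    and current.belief_impaired >= near_certain and rise >= jump):
                return i
        # Record recommendations that were not followed, for use at later visits.
        if not current.mri_ordered and current.policy_prob_mri >= recommend:
            missed_recommendation = True
    return None


# Made-up trajectories: one resembling patient (c), one with an MRI at every visit.
late_mri = [Visit(False, 0.90, 0.35), Visit(False, 0.92, 0.40), Visit(True, 0.95, 0.97)]
routine = [Visit(True, 0.90, 0.40), Visit(True, 0.90, 0.60), Visit(True, 0.95, 0.95)]

late_index = first_belated_visit(late_mri)
routine_index = first_belated_visit(routine)
```

Only the first trajectory is flagged, at its third visit, because the recommended MRI was skipped twice before the belief jumped.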

Quantifying the value of information
Patient (d) highlights how quantitative epistemology can be used to quantify the value of a test in terms of its information gain.

While the patient was ordered an MRI in all of their visits, it may appear (on the surface) that the third and final MRIs were redundant, since they had little apparent effect on beliefs. However, this is only true for the factual belief update that occurred according to the MRI outcome that was actually observed. Having access to an estimated model of how beliefs are updated in the form of decision dynamics, we can also compute counterfactual belief updates: that is, belief updates that could have occurred had the MRI outcome in question been different.

In the particular case of patient (d), the tests were in fact highly informative, since (as it happened) the patient’s CDR-SB scores were suggestive of impairment, and (in the counterfactual) the doctor’s beliefs could have potentially leapt drastically towards MCI.
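The distinction between factual and counterfactual belief updates can be sketched with a single Bayes step per possible test outcome. The example below is a simplified toy under stated assumptions: it ignores state transitions between visits, the likelihood table and prior are invented, total variation distance is our choice for measuring how far a belief moves, and expected entropy reduction is one common definition of information gain rather than necessarily the one used in the paper.

```python
import math


def bayes_update(prior, likelihood):
    """Posterior over states given P(outcome | state) for one observed outcome."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def total_variation(p, q):
    """Half the L1 distance between two distributions (0 = identical, 1 = disjoint)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def entropy(p):
    """Shannon entropy in nats, skipping zero-probability states."""
    return -sum(x * math.log(x) for x in p if x > 0)


# Assumed P(MRI outcome | state) for states ordered (NL, MCI, Dementia).
MRI_LIKELIHOOD = {"normal": [0.9, 0.3, 0.1], "abnormal": [0.1, 0.7, 0.9]}

prior = [0.85, 0.12, 0.03]  # made-up belief before the test

# One belief update per possible outcome: the observed one is factual,
# the others are counterfactual.
posteriors = {z: bayes_update(prior, lik) for z, lik in MRI_LIKELIHOOD.items()}
shifts = {z: total_variation(prior, post) for z, post in posteriors.items()}

# Expected information gain: prior entropy minus expected posterior entropy.
outcome_prob = {z: sum(p * l for p, l in zip(prior, lik)) for z, lik in MRI_LIKELIHOOD.items()}
expected_gain = entropy(prior) - sum(outcome_prob[z] * entropy(posteriors[z]) for z in posteriors)
```

In this toy setting a "normal" result barely moves the belief, while an "abnormal" result would have moved it substantially, so the test carries positive expected information gain even though the factual update looks small.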

Clinician evaluation of INTERPOLE
We evaluated INTERPOLE by consulting nine clinicians from four different countries (United States, United Kingdom, the Netherlands, and China) for feedback.

To determine whether decision dynamics are a transparent way of modeling how information is aggregated by decision-makers, we presented the clinicians with the medical history of an example patient represented in three ways, using: i) only the most recent action-observation, ii) the complete action-observation trajectory, and iii) the belief trajectory as recovered by INTERPOLE. All nine clinicians preferred the belief trajectories over action-observation trajectories.

We also sought to establish whether the proposed representation of (possibly suboptimal) decision boundaries is a more transparent way of describing policies, compared with the representation of reward functions. To do this, we showed the clinicians the policies learned from ADNI in the form of decision boundaries (i.e. INTERPOLE) and reward functions. Seven out of the nine clinicians preferred the representation in terms of decision boundaries.

Further details regarding INTERPOLE can be found below. For more information on our work related to Alzheimer’s, click here.

Intersection with other areas of research

Quantitative epistemology will complement and build upon projects across the lab’s other key research areas, including decision support systems, predictive analytics, automated ML, individualized treatment effect inference, interpretability, synthetic data, and more.

These points of intersection (and the immense potential for additional intersection) should be clear from the following descriptions of some of our initial projects related to quantitative epistemology.

Our work so far

Quantitative epistemology has become an area of significant focus for our lab’s researchers in recent years. Some of our first papers are shared below.

Inverse active sensing
The first paper resulting from this push into new territory was titled “Inverse Active Sensing: Modeling and Understanding Timely Decision-Making,” and was published at ICML 2020. The paper takes the familiar concept of active sensing (the goal-oriented problem of efficiently selecting which information to acquire, and when and what decision to settle on) and inverts it, seeking to uncover an agent’s preferences and strategy for acquiring information given their observable decision-making behavior.

Inverse active sensing has a diverse range of potential applications both in and beyond healthcare. A particularly salient application might be understanding decision-making around diagnosis of patients. For instance, we expect doctors to care much more about correctly diagnosing a lethal disease than another condition that presents with similar symptoms, but do they actually? By how much? Inverse active sensing can help us answer questions like these by uncovering preferences that effectively underlie observed decision behavior.

Inverse Active Sensing: Modeling and Understanding Timely Decision-Making

Daniel Jarrett, Mihaela van der Schaar

ICML 2020

Evidence-based decision-making entails collecting (costly) observations about an underlying phenomenon of interest, and subsequently committing to an (informed) decision on the basis of accumulated evidence. In this setting, active sensing is the goal-oriented problem of efficiently selecting which acquisitions to make, and when and what decision to settle on. As its complement, inverse active sensing seeks to uncover an agent’s preferences and strategy given their observable decision-making behavior.

In this paper, we develop an expressive, unified framework for the general setting of evidence-based decision-making under endogenous, context-dependent time pressure—which requires negotiating (subjective) tradeoffs between accuracy, speediness, and cost of information. Using this language, we demonstrate how it enables modeling intuitive notions of surprise, suspense, and optimality in decision strategies (the forward problem).

Finally, we illustrate how this formulation enables understanding decision-making behavior by quantifying preferences implicit in observed decision strategies (the inverse problem).

Interpretable policy learning (INTERPOLE)
The motivation behind INTERPOLE, introduced in a paper published at ICLR 2021, was to create a transparent description of behavior capable of locating the factors that contribute to individual decisions, in a language that can be readily understood by domain experts. Classical imitation learning approaches incorporate black-box hidden states that are rarely amenable to meaningful interpretation, while apprenticeship learning algorithms only offer high-level reward mappings that are not informative as to individual actions observed in the data. Additionally, INTERPOLE aims to accommodate partial observability, and operate completely offline.

During our work on INTERPOLE, we conducted experiments on both simulated and real-world data for the problem of Alzheimer’s disease diagnosis. We then sought feedback on our approach through a survey of 9 clinicians, who expressed an overwhelming preference for INTERPOLE by comparison with other potential approaches. Further details are provided earlier on this page.

Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning

Alihan Hüyük, Daniel Jarrett, Cem Tekin, Mihaela van der Schaar

ICLR 2021

Understanding human behavior from observed data is critical for transparency and accountability in decision-making. Consider real-world settings such as healthcare, in which modeling a decision-maker’s policy is challenging, with no access to underlying states, no knowledge of environment dynamics, and no allowance for live experimentation. We desire learning a data-driven representation of decision-making behavior that (1) inheres transparency by design, (2) accommodates partial observability, and (3) operates completely offline.

To satisfy these key criteria, we propose a novel model-based Bayesian method for interpretable policy learning (“Interpole”) that jointly estimates an agent’s (possibly biased) belief-update process together with their (possibly suboptimal) belief-action mapping.

Through experiments on both simulated and real-world data for the problem of Alzheimer’s disease diagnosis, we illustrate the potential of our approach as an investigative device for auditing, quantifying, and understanding human decision-making behavior.

Counterfactual inverse reinforcement learning (CIRL)
In a paper published at ICLR 2021, we proposed learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to counterfactual “what if” outcomes. In healthcare, for example, treatments often affect several patient covariates, by having both benefits and side-effects; decision-makers often make choices based on their preferences over these outcomes. By presenting decision-makers with counterfactuals, we can present them with potential outcomes of a particular action and model their preferences and reward functions. In the context of healthcare, doing this could enable us to quantify and inspect policies in different institutions and uncover the trade-offs and preferences associated with expert actions, as well as revealing the tendencies of individual practitioners to treat various diseases more or less aggressively.
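To illustrate what it means for a reward to express preferences over counterfactual outcomes, the sketch below scores each action by a weighted sum of its estimated "what if" outcomes. Everything here is hypothetical: the outcome names, the numbers, the linear form, and the hand-set weights are assumptions for illustration. In the approach described above the preference weights would be learned from observed expert behavior, which this sketch does not attempt.

```python
# Hypothetical counterfactual outcome estimates for one patient history:
# what each outcome is expected to be if the given action were taken.
COUNTERFACTUALS = {
    "no_treatment": {"tumour_volume_change": +0.20, "side_effect_severity": 0.0},
    "chemotherapy": {"tumour_volume_change": -0.30, "side_effect_severity": 0.6},
}


def reward(weights, outcomes):
    """Weighted sum of counterfactual outcomes (a linear reward is an assumption)."""
    return sum(weights[name] * outcomes[name] for name in weights)


def preferred_action(weights, counterfactuals):
    """Action whose counterfactual outcomes score highest under the given weights."""
    return max(counterfactuals, key=lambda action: reward(weights, counterfactuals[action]))


# Two hand-set preference profiles; negative weights penalize the outcome.
aggressive = {"tumour_volume_change": -1.0, "side_effect_severity": -0.2}
cautious = {"tumour_volume_change": -1.0, "side_effect_severity": -1.5}

aggressive_choice = preferred_action(aggressive, COUNTERFACTUALS)
cautious_choice = preferred_action(cautious, COUNTERFACTUALS)
```

The two profiles agree on the value of tumour shrinkage but weigh side effects differently, which is enough to flip the preferred action. Reading such weights off observed behavior is what makes the trade-offs behind expert actions inspectable.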

Learning “What-if” Explanations for Sequential Decision-Making

Ioana Bica, Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar

ICLR 2021

Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior–i.e. trajectories of observations and actions made by an expert maximizing some unknown reward function–is essential for introspecting and auditing policies in different institutions.

In this paper, we propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to “what if” outcomes: Given the current history of observations, what would happen if we took a particular action? To learn these cost-benefit tradeoffs associated with the expert’s actions, we integrate counterfactual reasoning into batch inverse reinforcement learning. This offers a principled way of defining reward functions and explaining expert behavior, and also satisfies the constraints of real-world decision-making, where active experimentation is often impossible (e.g. in healthcare). Additionally, by estimating the effects of different actions, counterfactuals readily tackle the off-policy nature of policy evaluation in the batch setting, and can naturally accommodate settings where the expert policies depend on histories of observations rather than just current states.

Through illustrative experiments in both real and simulated medical environments, we highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.

Approximate Variational Reward Imitation Learning (AVRIL)
AVRIL, presented in a paper published at ICLR 2021, offers yet another potential approach to addressing the problem of studying decision-making in settings in which there is no access to knowledge of the environment dynamics nor intrinsic reward, nor even the ability to interact and test policies. As explained directly below, AVRIL offers reward inference in environments beyond the scope of current methods, as well as task performance competitive with focused offline imitation learning algorithms.

Scalable Bayesian Inverse Reinforcement Learning

Alex Chan, Mihaela van der Schaar

ICLR 2021

Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the inverse reinforcement learning problem. Unfortunately current methods generally do not scale well beyond the small tabular setting due to the need for an inner-loop MDP solver, and even non-Bayesian methods that do themselves scale often require extensive interaction with the environment to perform well, being inappropriate for high stakes or costly applications such as healthcare.

In this paper we introduce our method, Approximate Variational Reward Imitation Learning (AVRIL), that addresses both of these issues by jointly learning an approximate posterior distribution over the reward that scales to arbitrarily complicated state spaces alongside an appropriate policy in a completely offline manner through a variational approach to said latent reward.

Applying our method to real medical data alongside classic control simulations, we demonstrate Bayesian reward inference in environments beyond the scope of current methods, as well as task performance competitive with focused offline imitation learning algorithms.

Inverse decision modeling (IDM)
In a paper accepted for publication at ICML 2021, we developed an expressive, unifying perspective on inverse decision modeling (IDM): a framework for learning parameterized representations of sequential decision behavior.

IDM enables us to quantify intuitive notions of bounded rationality—such as the apparent flexibility of decisions, tolerance for surprise, or optimism in beliefs—while also making such representations interpretable. In presenting IDM, we highlight its potential utility in real-world settings as an investigative device for auditing and understanding human decision-making.

Inverse Decision Modeling: Learning Interpretable Representations of Behavior

Daniel Jarrett, Alihan Hüyük, Mihaela van der Schaar

ICML 2021

Decision analysis deals with modeling and enhancing decision processes. A principal challenge in improving behavior is in obtaining a transparent description of existing behavior in the first place.

In this paper, we develop an expressive, unifying perspective on inverse decision modeling: a framework for learning parameterized representations of sequential decision behavior.

First, we formalize the forward problem (as a normative standard), subsuming common classes of control behavior.

Second, we use this to formalize the inverse problem (as a descriptive model), generalizing existing work on imitation/reward learning—while opening up a much broader class of research problems in behavior representation.

Finally, we instantiate this approach with an example (inverse bounded rational control), illustrating how this structure enables learning (interpretable) representations of (bounded) rationality—while naturally capturing intuitive notions of suboptimal actions, biased beliefs, and imperfect knowledge of environments.

The path ahead

The work above showcases our first few tentative steps into quantitative epistemology. We have committed to substantial further investment of our lab’s time and resources on a long-term basis.

Quantitative epistemology can yield fascinating new insights into how humans learn and make decisions, and can bring about a new type of human-machine partnership based on empowerment, not replacement. While existing approaches (shown in blue above) can be incorporated into our research and recent work by our own lab (shown in purple) has helped us lay a partial foundation for this new area of research, we are truly entering uncharted territory. There are many complex questions (shown in green) to explore, and practically unlimited new discoveries to make. Our sincere hope is that our readers will share our vision for quantitative epistemology, and consider developing new machine learning methods within the quantitative epistemology agenda.

Going forward, our priorities will be:
– to hone our vision for what quantitative epistemology can become, how we can create a new human-machine partnership, and how in practice this can deliver social benefit (in and beyond healthcare);
– to construct a comprehensive theoretical foundation that will serve as the basis for development of models and methods (our ICML 2021 paper on inverse decision modeling is just one initial example of this); and
– to solve specific real-world problems in partnership with our network of clinical collaborators (such as the Alzheimer’s diagnosis example earlier on this page), while also using newly developed approaches to support clinical auditing, address variation in practice, and encourage the introduction of more quantitative and principled clinical guidelines in complex areas such as cancer and transplantation.

As we continue to expand the boundaries of quantitative epistemology ever further, this page will serve as a living map documenting our latest discoveries and reflecting our evolving understanding of this brand new area of research. Please continue to check back here for the latest updates.

You can find our related publications here.

Videos: NeurIPS 2021, ICML 2021, and Inspiration Exchange engagement session

This invited talk, entitled “Quantitative epistemology – empowering human meta-learning using machine learning,” was given by Mihaela van der Schaar on December 13, 2021, as part of the Workshop on Meta-Learning (MetaLearn) running alongside NeurIPS 2021.

On July 23, 2021, Mihaela van der Schaar gave a keynote talk entitled “Quantitative epistemology – conceiving a new human-machine partnership” as part of the ICML 2021 Interpretable Machine Learning in Healthcare (IMLH) Workshop.

The full talk can be found below, and is highly recommended viewing for anyone who would like to know more or get involved in the quantitative epistemology research agenda.

Our primary means of building a shared vision for machine learning for healthcare is through two groups of online engagement sessions: Inspiration Exchange (for machine learning students) and Revolutionizing Healthcare (for the healthcare community). If you would like to get involved, please visit the page below.

On July 12, 2021, our lab held an Inspiration Exchange session dedicated to introducing quantitative epistemology—including theory, approaches, and future directions for this brand new research area. The recording is available directly below.