Krikamol Muandet, Max Planck Institute for Intelligent Systems, Tübingen

Counterfactual Policy Evaluation and Optimization in Reproducing Kernel Hilbert Spaces

In this talk, I will discuss the problem of evaluating and learning optimal policies directly from observational (i.e., non-randomized) data using a novel framework called the counterfactual mean embedding (CME). Identifying optimal policies is crucial for improving decision-making in online advertising, finance, economics, and medical diagnosis. The classical approach, considered the gold standard for identifying optimal policies in these domains, is randomization. For example, A/B testing has become a standard tool in online advertising and recommendation systems. In medicine, the development of new treatments depends exclusively on controlled clinical trials. Unfortunately, randomization in A/B tests and controlled clinical trials may be expensive, time-consuming, unethical, or even impossible to implement in practice. To evaluate a policy from observational data, the CME maps the counterfactual distributions of potential outcomes under different treatments into a reproducing kernel Hilbert space (RKHS). Based on this representation, causal reasoning about the outcomes of different treatments can be performed over the entire landscape of counterfactual distributions using the arsenal of kernel methods. Under some technical assumptions, we can also provide a causal explanation of the resulting policies.
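
To make the idea of representing outcome distributions in an RKHS concrete, here is a minimal sketch of an empirical kernel mean embedding and the squared maximum mean discrepancy (MMD) between two outcome samples. It is an illustration under assumptions, not the estimator from the talk: the Gaussian kernel, the bandwidth, the scalar outcomes, and all function names are hypothetical choices, and the sketch does not adjust for confounding, which is the part the CME is designed to address.

```python
import math


def gaussian_kernel(y1, y2, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalar outcomes; an assumed kernel choice."""
    return math.exp(-((y1 - y2) ** 2) / (2.0 * bandwidth ** 2))


def mean_embedding(samples, bandwidth=1.0):
    """Return the empirical mean embedding of `samples` as a function.

    The embedding evaluated at y is the average of k(s, y) over the samples.
    """
    def mu(y):
        return sum(gaussian_kernel(s, y, bandwidth) for s in samples) / len(samples)
    return mu


def mmd_squared(samples_a, samples_b, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared RKHS distance between
    the mean embeddings of two samples."""
    def avg_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (avg_kernel(samples_a, samples_a)
            + avg_kernel(samples_b, samples_b)
            - 2.0 * avg_kernel(samples_a, samples_b))


if __name__ == "__main__":
    # Hypothetical outcome samples under two treatments (made-up numbers).
    outcomes_treatment_0 = [0.1, 0.3, 0.2, 0.4]
    outcomes_treatment_1 = [1.1, 0.9, 1.3, 1.0]
    print(mmd_squared(outcomes_treatment_0, outcomes_treatment_1))
```

A larger value of `mmd_squared` indicates that the two outcome distributions differ more in the chosen RKHS. With observational data, the samples would first have to be reweighted or otherwise adjusted for covariates before such a comparison has a causal interpretation.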

Joint work with Sorawit Saengkyongam, Motonobu Kanagawa, and Sanparith Marukatat.