Significant scientific problems require a combination of prediction and inference. However, the majority of machine learning techniques are not well suited to estimating causal effects. Recently, several novel approaches have attempted to combine predictive modeling with causal inference to identify heterogeneous treatment effects in observational data: subgroups whose outcomes may differ significantly from the population average.
This talk will review two recent papers that employed the causal forest approach to estimate subgroup treatment effects in randomized clinical trials. The Systolic Blood Pressure Intervention Trial (SPRINT) compared standard versus intensive systolic blood pressure targets, while the Look AHEAD trial examined the effects of an intensive diabetes lifestyle intervention on cardiovascular mortality.
Reanalyzing these trials using causal forests, we found that in both cases average treatment effects may have masked important sources of heterogeneity in trial outcomes. These findings raise two questions for future work. First, are there data-driven methods that can objectively identify subgroups for better precision? Second, can we use these large RCTs to build better predictive models on observational data?
I build and lead teams that create innovative data products. My expertise is in developing and executing data science strategy, especially in non-technical settings. I focus on ‘interpretable machine learning’ products that combine data science, domain expertise, and UX for mass product adoption. As a researcher, I have published in top-tier artificial intelligence (AAAI, IJCAI) and data science (SDM, ICDM) conferences and mentored over 25 students, including Ph.D. students and postdocs.