Using Causal Forests for Subpopulation Identification in Randomized Clinical Trials

James Faghmous | Icahn School of Medicine at Mt. Sinai


Significant scientific problems require a combination of prediction and inference; however, the majority of machine learning techniques are not well suited for estimating causal effects. Recently, several novel approaches have attempted to combine predictive modeling with causal inference to identify heterogeneous treatment effects in observational data, that is, subgroups whose outcomes may differ significantly from the population average.

This talk will review two recent papers that employed the causal forest approach to estimate subgroup treatment effects in randomized clinical trials. The Systolic Blood Pressure Intervention Trial (SPRINT) compared standard versus intensive systolic blood pressure targets, while the Look AHEAD trial examined the effects of an intensive diabetes lifestyle intervention on cardiovascular mortality.

Reanalyzing these trials using causal forests, we found that in both cases, average treatment effects may have masked important sources of heterogeneity in trial outcomes. These findings raise two questions for future work: First, are there data-driven methods that can objectively identify subgroups for better precision? Second, can we use these large RCTs to build better predictive models on observational data?


James Faghmous

Asst. Prof. / CTO of Arnhold Institute | Icahn School of Medicine at Mt. Sinai

James H. Faghmous is an assistant professor and founding Chief Technology Officer of the Arnhold Institute for Global Health at the Icahn School of Medicine at Mount Sinai in NYC. At the Arnhold Institute, James leads both data science research and product development, where he launched the ATLAS data platform to close health disparities around the world. In 2016, ATLAS was selected as winner of the USAID Zika Grand Challenge. The following year, it won the Bill and Melinda Gates Foundation Explorations Grand Challenge. James’ research focuses on developing novel methods to combine heterogeneous data to understand how social, economic, and environmental factors create health inequities.