A Multi-Armed Bandit Framework for Recommendations at Netflix

Jaya Kawale & Elliot Chow | Netflix


In this talk, we will present a general multi-armed bandit framework for recommending titles to our 117M+ members on the Netflix homepage. A key aspect of our framework is closed-loop attribution, which links each recommendation to how our members respond to it. Our framework updates policies frequently, using user feedback collected over a past time window.
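To make the idea of closed-loop attribution concrete, here is a minimal Python sketch. It is not the Netflix implementation: the function name `attribute_and_update`, the `(member, title, timestamp)` tuple layout, and the rule that a play is credited to an impression only if it follows that impression inside the same window are all assumptions made for illustration.

```python
from collections import defaultdict


def attribute_and_update(impressions, plays, window_start, window_end):
    """Estimate per-title play rates from one feedback window.

    impressions, plays: lists of (member, title, timestamp) tuples.
    Assumed attribution rule (hypothetical): an impression is rewarded if the
    same member played the same title at or after the impression time and
    before the window closes.
    """
    # Index play timestamps by (member, title) for quick lookup.
    play_times = defaultdict(list)
    for member, title, ts in plays:
        play_times[(member, title)].append(ts)

    shown = defaultdict(int)     # impressions per title inside the window
    rewarded = defaultdict(int)  # impressions that were followed by a play

    for member, title, ts in impressions:
        if not (window_start <= ts < window_end):
            continue  # impression falls outside the feedback window
        shown[title] += 1
        if any(ts <= p < window_end for p in play_times[(member, title)]):
            rewarded[title] += 1

    # The play rate per title is the statistic a policy update could consume.
    return {title: rewarded[title] / shown[title] for title in shown}


impressions = [("m1", "A", 1), ("m2", "A", 2), ("m1", "B", 3), ("m3", "B", 12)]
plays = [("m1", "A", 4), ("m1", "B", 2)]
rates = attribute_and_update(impressions, plays, window_start=0, window_end=10)
```

In this toy data, the play of title B happened before B was recommended, so it is not credited to the recommendation. A policy would be refreshed by rerunning the function on each new window.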

We will take a deeper look at the system architecture. We will illustrate the use of the framework by focusing on two example policies: a greedy exploit policy, which maximizes the probability that a user will play a title, and an incrementality-based policy. The latter is a novel online learning approach that takes the causal effect of a recommendation into account. An incrementality-based policy recommends the titles that bring about the maximum increase in a specific quantity of interest, such as engagement. This helps discount the effect of a recommendation when the user would have played the title anyway. We describe offline experiments and online A/B test results for both of these example policies.
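The difference between the two policies can be sketched in a few lines of Python. This is an illustrative toy, not the method presented in the talk: the `TitleEstimate` fields, the function names, and the probability values are hypothetical, and in practice both probabilities would have to be estimated from logged feedback.

```python
from dataclasses import dataclass


@dataclass
class TitleEstimate:
    title: str
    p_play_if_shown: float      # assumed estimate of P(play | recommended)
    p_play_if_not_shown: float  # assumed estimate of P(play | not recommended)


def greedy_policy(estimates):
    """Pick the title with the highest estimated play probability."""
    return max(estimates, key=lambda e: e.p_play_if_shown).title


def incrementality_policy(estimates):
    """Pick the title whose recommendation adds the most play probability."""
    return max(
        estimates,
        key=lambda e: e.p_play_if_shown - e.p_play_if_not_shown,
    ).title


# Made-up numbers: the popular title is likely to be played either way,
# while the niche title is rarely played unless it is recommended.
estimates = [
    TitleEstimate("popular_title", 0.30, 0.28),
    TitleEstimate("niche_title", 0.15, 0.03),
]

greedy_choice = greedy_policy(estimates)
incremental_choice = incrementality_policy(estimates)
```

With these numbers the greedy policy selects `popular_title`, since 0.30 is the highest play probability, while the incrementality-based policy selects `niche_title`, since recommending it lifts the play probability by 0.12 rather than 0.02.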

Jaya Kawale

Sr. Research Scientist | Netflix

Jaya Kawale is a Senior Research Scientist at Netflix working on problems related to targeting and recommendations. She received her PhD in Computer Science from the University of Minnesota and has published research papers at several top-tier conferences. Her main areas of interest are large-scale machine learning and data mining.

Elliot Chow

Software Engineer | Netflix

Elliot is a software engineer at Netflix on the Personalization Infrastructure team. Currently, he builds big data systems for personalizing recommendations for Netflix subscribers, using a variety of technologies including Scala, Spark/Spark Streaming, Kafka, and Cassandra. He graduated from UC Berkeley (B.S.) and Stanford (M.S.) and has previously worked at eBay and Apple.