Content-Based Recommendations: Using Word Embeddings to Automate Related Content Generation at BuzzFeed

Carolyn Huangci | BuzzFeed

ABOUT THE TALK

BuzzFeed's large social, search, and organic traffic footprint can be attributed to both its content and its content curation strategies. Recirculation on the website is powered by feeds optimized for metrics such as CTR, recency, and user preference.

Surprisingly, one avenue of recirculation that the website did not explore until recently was identifying and displaying related content. The hypothesis was that surfacing related content would increase pageviews per viewing session and improve SEO performance.

This talk will detail how word and sentence embedding models were used to vectorize BuzzFeed's content, how the model was validated and deployed to production using AWS EMR and NSQ, and the impact the new related-content unit has had on the website.
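As a rough illustration of the general idea (not BuzzFeed's actual pipeline, which the abstract does not describe in detail), one common way to turn word embeddings into related-content suggestions is to mean-pool the word vectors of each article into a single document vector and then rank other articles by cosine similarity. In the sketch below, the toy WORD_VECTORS table, the article texts, and the helper names embed, cosine, and related are all hypothetical; a real system would load pretrained embeddings (or a sentence embedding model) and use an approximate nearest-neighbour index rather than brute-force comparison.

```python
import math

# Hypothetical toy word vectors; a real system would load pretrained embeddings.
WORD_VECTORS = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "dog": [0.7, 0.3, 0.0],
    "pizza": [0.0, 0.0, 1.0],
    "pasta": [0.0, 0.1, 0.9],
}


def embed(text, vectors=WORD_VECTORS):
    """Mean-pool the vectors of known words in `text`; None if no word is known."""
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def related(query_id, articles, k=2):
    """Return up to k article ids most similar to `query_id`, excluding itself."""
    doc_vecs = {aid: embed(text) for aid, text in articles.items()}
    query = doc_vecs[query_id]
    if query is None:
        return []
    scored = [
        (cosine(query, vec), aid)
        for aid, vec in doc_vecs.items()
        if aid != query_id and vec is not None
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [aid for _, aid in scored[:k]]


# Hypothetical articles, reduced to a few keywords each.
articles = {
    "a1": "cat kitten",
    "a2": "dog cat",
    "a3": "pizza pasta",
    "a4": "kitten",
}

print(related("a1", articles, k=2))  # ['a4', 'a2']
```

Mean pooling is the simplest choice because it needs no extra training, but it ignores word order; a sentence embedding model, as mentioned in the talk, is typically used to capture more of each article's meaning.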


Carolyn Huangci

Associate Data Scientist | BuzzFeed

Carolyn Huangci is a data scientist at BuzzFeed, a media and tech company. She works on the Network Growth team, which focuses on growing content views across BuzzFeed destinations. Before joining BuzzFeed, she worked at Uber. She received her bachelor's degree in Applied Mathematics from the University of California, Berkeley.
