Technical Talks

Content Based Recommendations: Using Word Embeddings to Automate Related Content Generation at BuzzFeed

Carolyn Huangci | Associate Data Scientist | BuzzFeed

BuzzFeed's large social, search, and organic traffic footprint can be attributed to both its content and content curation strategies. Recirculation on the website is powered through feeds optimized for metrics such as CTR, recency, and user preference.

Surprisingly, one avenue of recirculation that had not been explored on the website until recently was identifying and displaying related content. It was hypothesized that surfacing related content would increase pageviews per viewing session and improve SEO performance.

This talk will detail how word and sentence embedding models were used to vectorize BuzzFeed's content, how the model was validated and brought to production using AWS EMR and NSQ, and the impact the new unit has had on the website.
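As a rough illustration of the general idea of vectorizing content with word embeddings, the sketch below averages word vectors into a single vector per title and ranks other titles by cosine similarity. It is not BuzzFeed's model or pipeline: the toy three-dimensional vectors, the titles, and the names `embed_text`, `cosine_similarity`, and `related_content` are all assumptions made up for this example, and a real system would use a trained embedding model.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real system would load vectors from a trained embedding model.
TOY_WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.0],
    "dog": [0.7, 0.3, 0.1],
    "recipe": [0.0, 0.9, 0.2],
    "pasta": [0.1, 0.8, 0.3],
    "dinner": [0.1, 0.7, 0.4],
}

# Hypothetical article titles used as the content pool.
TITLES = [
    "cat kitten photos",
    "dog photos",
    "pasta dinner recipe",
    "easy pasta recipe",
]


def embed_text(text, word_vectors):
    """Average the vectors of the known words in text; None if none are known."""
    vectors = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_content(query_title, titles, word_vectors, top_k=2):
    """Return up to top_k titles most similar to query_title, best first."""
    query_vec = embed_text(query_title, word_vectors)
    if query_vec is None:
        return []
    scored = []
    for title in titles:
        if title == query_title:
            continue  # do not recommend the article to itself
        vec = embed_text(title, word_vectors)
        if vec is not None:
            scored.append((cosine_similarity(query_vec, vec), title))
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


if __name__ == "__main__":
    print(related_content("easy pasta recipe", TITLES, TOY_WORD_VECTORS, top_k=1))
```

Averaging word vectors is only one simple way to build a sentence-level vector; it ignores word order and skips words that are missing from the vocabulary.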

Carolyn Huangci
Associate Data Scientist | BuzzFeed

Carolyn Huangci is a data scientist at BuzzFeed, a media and tech company. She works on the Network Growth team, which focuses on growing content views across BuzzFeed destinations. Prior to BuzzFeed, she worked at Uber. She received her bachelor's degree in Applied Mathematics from the University of California, Berkeley.