How Spotify Distills Terabytes of Raw Data into Meaningful Music Recommendations

Gandalf Hernandez | Spotify


To provide quality recommendations for features such as Discover Weekly, Release Radar, and Daily Mix, Spotify derives signals from the activity of more than 140M users, the contents of over 2 billion playlists, and the acoustic profiles of over 30M songs.
We will cover how we build systems at scale that ingest, process, and distill terabytes of raw data into datasets and services that teams use to build features. We will look at the different approaches taken, from streaming real-time listening history with Google Dataflow and Bigtable, to queue-based workflows for audio understanding, to batch processing with MapReduce, to training ML models on single machines.
This talk gives the listener a chance to see how disparate datasets, with different technical approaches, come together to power personalization at Spotify.

Gandalf Hernandez

Data/Backend Engineering Manager | Spotify

Gandalf is an Engineering Manager at Spotify. He works with teams that build recommendation engines, datasets, and services that power features throughout Spotify. Gandalf holds an MS in Computer Science from the Royal Institute of Technology, an MBA from New York University and an MA in Quantitative Methods in the Social Sciences from Columbia University.