To provide quality recommendations for features such as Discover Weekly, Release Radar, and Daily Mix, Spotify derives signals from the activity of more than 140M users, the contents of over 2 billion playlists, and the acoustic profiles of over 30M songs.
We will cover how we build systems at scale that can ingest, process and distill terabytes of raw data into datasets and services that teams use to build features. We will look at the different approaches taken, from streaming real-time listening history with Google Dataflow and Bigtable, to queue-based workflows for audio understanding, to batch processing with MapReduce, to training ML models on single machines.
This talk gives the listener a chance to see how disparate datasets, with different technical approaches, come together to power personalization at Spotify.
Gandalf is the Data & Backend Engineering Manager at Spotify, working on music recommendations, and holds an MBA from NYU Stern.