Yelp’s local ad platform is composed of multiple Spark Streaming pipelines that consolidate events from hundreds of thousands of active ad campaigns and process terabytes of data a day. These pipelines keep ad delivery within budget constraints and provide real-time campaign metrics to account managers. Because they are revenue-critical, we needed them to offer strong guarantees in the face of failures. As a framework, Spark Streaming provides fault-tolerant, exactly-once semantics within the processing stages of a pipeline. However, for long-running stateful pipelines that interact with other stateful systems, achieving end-to-end exactly-once guarantees required a more holistic approach.
We achieved exactly-once guarantees within these pipelines with a strategy that makes stateful operations idempotent, as long as the source data is at least partially ordered (we use Kafka) and there is a supporting key-value store that allows conditional writes (we use Cassandra). In this talk we will focus on the core idea behind our end-to-end exactly-once aggregation. We will discuss it in the context of a pipeline that serves multiple local ads use cases, covering its business requirements, the earlier approaches we considered, and our eventual solution. We will conclude with a few thoughts on how we are extending this idea further.
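To make the core idea concrete, here is a minimal sketch of how a partially ordered source plus conditional writes can make an aggregation idempotent under replay. It assumes a per-key counter fed from a single Kafka partition, where each update carries the offset that produced it; `ConditionalStore` and its methods are hypothetical stand-ins for a store with compare-and-set semantics (such as Cassandra's conditional writes), not the actual pipeline code.

```python
# Hypothetical sketch: an update is applied only if its source offset is
# newer than the offset last recorded for that key, so replaying the same
# messages after a failure is a no-op. This mirrors a conditional write
# (compare-and-set) in a key-value store like Cassandra.

class ConditionalStore:
    """In-memory stand-in for a key-value store with conditional writes."""

    def __init__(self):
        self._data = {}  # key -> (aggregate, last_applied_offset)

    def apply(self, key, delta, offset):
        """Apply delta only if this offset has not been applied yet."""
        agg, last = self._data.get(key, (0, -1))
        if offset <= last:
            return False  # replayed message: already applied, skip it
        self._data[key] = (agg + delta, offset)
        return True

    def get(self, key):
        return self._data.get(key, (0, -1))[0]

store = ConditionalStore()
store.apply("campaign-1", 10, offset=0)
store.apply("campaign-1", 5, offset=1)
store.apply("campaign-1", 5, offset=1)  # replay after a failure: rejected
```

After the replay, the aggregate for `campaign-1` is still 15 rather than 20: the at-least-once delivery of the source is reduced to an effectively exactly-once update, which is what the ordering and conditional-write requirements buy.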
Amit is currently a Software Engineer at Yelp, where he works on architecting and building pipelines for stream and bulk processing of data. Prior to Yelp he was at Amazon, working within the big data ecosystem. Over the years he has worked in various areas including medical imaging, AI, robotics, and embedded systems. In his free time he likes to tinker with code and electronics, and aspires to be an interdisciplinarian. He lives with his wife and two kids in California.