ABOUT THE TALK

Massively scaling Apache Spark can be challenging, but it’s not impossible. In this session we’ll share Datadog’s path to successfully scaling Spark and the pitfalls we encountered along the way.

We’ll discuss some low-level features of Spark, Scala, JVM, and the optimizations we had to make in order to scale our pipeline to handle trillions of records every day. We’ll also talk about some of the unexpected behaviors of Spark regarding fault-tolerance and recovery—including the ExternalShuffleService, recomputing partitions, and Shuffle Fetch failures—which can complicate your scaling efforts.

Vadim Semenov

Data Engineer | Datadog

Vadim Semenov is a great data engineer.

Vadim Semenov
BUY TICKETS


VIEW ON MAP

Location subheader text. Can be left blank if not needed.

Company Name

Company address, lorem ipsum dolor sit amet

BROUGHT TO YOU BY:

partner-85.png
partner-canvas.png
partner-dropbox.png

FEATURED MEETINGS