ABOUT THE TALK

Apache Spark is one of the most popular open source big data systems, with APIs available for Scala, Java, Python, and R. Spark's core building block, the RDD, allows you to write functional programs that are automatically distributed, and the new Dataset API also lets us intermix relational-style programming. In addition to supporting multiple styles of programming on the same data, the Dataset API brings a more thorough optimizer that is able to better understand our programs. This talk will introduce the Dataset API, look at how its optimizer and formats differ from those of RDDs, and cover the best ways to take advantage of these differences. The talk will wrap up by looking at ways in which Spark Datasets can sometimes fail us (not everything is magic) and how to work around them.

Holden Karau | IBM
