Technical Talks

Easy, Scalable, Fault-tolerant Stream Processing with Structured Streaming in Apache Spark

Burak Yavuz | Software Engineer | Databricks

Structured Streaming, first introduced in Apache Spark 2.0 and generally available (GA) as of Spark 2.2, is a stream processing engine built on Spark SQL that has revolutionized how developers write stream processing applications. It lets users express streaming computations the same way they would express a batch query on static data, using powerful high-level APIs including DataFrames, Datasets, and SQL. The Spark SQL engine then converts these batch-like transformations into an incremental execution plan that can process streaming data, automatically handling late, out-of-order data and ensuring end-to-end exactly-once fault-tolerance guarantees. In this talk, we will also showcase how you can harness all this power by connecting to systems that you know and love, such as Apache Kafka and Apache Cassandra.
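As a rough illustration of the ideas in the abstract, the sketch below uses PySpark to read from Kafka, apply a batch-style windowed count with a watermark for late data, and start a checkpointed query. It is a minimal sketch under stated assumptions, not material from the talk: the function names, the broker address, the topic name, the window and watermark durations, and the checkpoint path are all hypothetical, and the Kafka source requires the spark-sql-kafka connector package to be available to Spark. The console sink is used only for demonstration; end-to-end exactly-once guarantees depend on the source and sink in use.

```python
def kafka_source_options(bootstrap_servers: str, topic: str) -> dict:
    """Build options for Spark's Kafka source (values are caller-supplied)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def windowed_counts(events):
    """Batch-style aggregation that also works on a streaming DataFrame.

    Assumes `events` has a `timestamp` column, as Kafka source rows do.
    """
    # Imported lazily so this module loads without a Spark installation.
    from pyspark.sql.functions import col, window

    return (
        events
        # Tolerate events arriving up to 10 minutes late (assumed threshold).
        .withWatermark("timestamp", "10 minutes")
        # Count events per 5-minute event-time window (assumed window size).
        .groupBy(window(col("timestamp"), "5 minutes"))
        .count()
    )


def start_query(spark, bootstrap_servers: str, topic: str, checkpoint_dir: str):
    """Start a streaming query that prints updated window counts to the console."""
    events = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    return (
        windowed_counts(events)
        .writeStream
        .outputMode("update")
        .format("console")
        # The checkpoint lets the query recover its progress after a failure.
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

With an active SparkSession named `spark`, a call such as `start_query(spark, "localhost:9092", "events", "/tmp/checkpoints/events")` would return a running query handle, and `awaitTermination()` on that handle blocks until the query stops. The aggregation lives in its own function because the same transformation can be applied unchanged to a static DataFrame, which is the batch-like style the abstract describes.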

Burak Yavuz
Software Engineer | Databricks

• Excels at problem solving and experimental methods to derive simplified CLEAR solutions.
• Disciplined and hard working with great social, communications, analytical, and programming skills. Always a go-to person among peers.