Stream processing has evolved rapidly: just a few years ago, it mostly meant simple real-time aggregations with limited throughput and weak consistency. Today, many stream processing applications implement complex logic with strict correctness guarantees, high throughput, low latency, and large application state maintained without external databases. This leap in sophistication has been possible because stream processors – the systems that run the application code, coordinate the distributed execution, route the data streams, and ensure correctness in the face of failures and crashes – have become far more technologically advanced. In this talk, we walk through some of the techniques and innovations behind Apache Flink, one of the most powerful open source stream processors.
In particular, we plan to discuss the evolution of stateful stream processing, Flink’s approach to fault tolerance with distributed asynchronous and incremental snapshots, and how that approach looks today after multiple years of collaborative work with users running large-scale stream processing deployments. Furthermore, we plan to discuss how stream processing is outgrowing its original space of real-time data processing and is becoming a technology that offers new approaches to data processing (including batch processing), real-time applications, and even distributed transactions.
Tzu-Li (Gordon) Tai is a Committer and PMC member of the Apache Flink project, and a Software Engineer at Ververica. His contributions to Flink span various components, including some of the most popular Flink streaming connectors (e.g. for Apache Kafka, AWS Kinesis, and Elasticsearch), Flink's type serialization system, and several topics surrounding the evolvability of stateful streaming applications. He is a frequent speaker at conferences such as Flink Forward and Strata Data, as well as many meetups related to Apache Flink and data engineering in general.