Data Processing with Apache Beam: Towards Portability and Beyond

Maximilian Michels | Independent Contractor | Google, PMC member of Apache Flink and Apache Beam

Apache Beam is a unified batch and streaming programming model for distributed data processing. Unlike most data processing systems, Beam is not tied to a single execution engine: the same pipeline can run on Apache Flink, Apache Spark, or Google Cloud Dataflow, among others.

But it doesn't stop there. The Beam API is available not only in Java; you can also write your data processing jobs in Python or Go. This gives you much greater flexibility than data processing APIs that are tied to a single language, and lets you use the features and libraries of your favorite programming language.

In this talk, I will introduce the Beam programming model and explain how Beam achieves portability across languages and execution engines.
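
To make the idea of engine portability a little more concrete, here is a minimal sketch in plain Python. It deliberately does not use the Beam SDK: the names `Pipeline`, `LocalRunner`, and `ChunkedRunner` are hypothetical and exist only for this illustration. The point is the separation of concerns, where a job is described once and then handed to interchangeable runners that decide how to execute it.

```python
from collections import Counter


class Pipeline:
    """Engine-independent job description: an ordered list of element-wise steps.

    Hypothetical illustration only; this is not the Apache Beam API.
    """

    def __init__(self):
        self.steps = []

    def flat_map(self, fn):
        # Each step maps one input element to a list of output elements.
        self.steps.append(fn)
        return self


class LocalRunner:
    """Executes every step over the whole input in one go."""

    def run(self, pipeline, data):
        elements = list(data)
        for fn in pipeline.steps:
            elements = [out for element in elements for out in fn(element)]
        return elements


class ChunkedRunner:
    """Splits the input into chunks, loosely mimicking how a distributed
    engine partitions work, then concatenates the per-chunk results."""

    def __init__(self, chunk_size=2):
        self.chunk_size = chunk_size

    def run(self, pipeline, data):
        elements = list(data)
        results = []
        for start in range(0, len(elements), self.chunk_size):
            chunk = elements[start:start + self.chunk_size]
            results.extend(LocalRunner().run(pipeline, chunk))
        return results


# The job is defined once, without reference to any runner.
pipeline = (
    Pipeline()
    .flat_map(lambda line: line.split())      # split lines into words
    .flat_map(lambda word: [word.lower()])    # normalize case
)

lines = ["Apache Beam", "Beam on Flink", "Beam on Spark"]

# The same description runs unchanged on either runner.
local_result = LocalRunner().run(pipeline, lines)
chunked_result = ChunkedRunner(chunk_size=2).run(pipeline, lines)
word_counts = Counter(local_result)
```

In this toy version both runners share a language and a process. Real engines differ far more than that, which is why a portable system needs an engine-neutral and language-neutral description of the pipeline rather than a shared set of Python objects.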

Maximilian Michels
Independent Contractor | Google, PMC member of Apache Flink and Apache Beam

Max is a software engineer and PMC member of Apache Flink and Apache Beam. During his studies at Free University of Berlin and Istanbul University, he worked at Zuse Institute Berlin on Scalaris, a distributed transactional database. Inspired by the principles of distributed systems and open source, he helped to develop Apache Flink at dataArtisans and, in the course of that work, joined the Apache Beam community. After maintaining the SQL layer of the distributed database CrateDB, he is now working on the cross-language portability aspects of Apache Beam.