Data Lineage with Apache Airflow using OpenLineage

Julien Le Dem | OpenLineage Project Lead | LF AI & Data
Willy Lulciuc | Founding Engineer | Datakin

As workflows increase in complexity, companies have come to depend on Airflow to manage inter-DAG dependencies. Airflow has quickly become an important component of the Modern Data Stack, powering analytical reports, business metrics, and dashboards.

But what effects (if any) would upstream DAGs have on downstream DAGs if dataset consumption were delayed? What alerting rules should be in place to notify downstream DAGs of possible upstream processing issues or failures? How can we use data lineage to achieve the data observability we need to answer these questions?

This talk introduces OpenLineage, an open standard for collecting lineage metadata from jobs as they execute, and shows how it works with Airflow. The presentation walks through a practical example using Marquez, the reference implementation of OpenLineage, and explains how OpenLineage can help data teams maintain inter-DAG dependencies within their Airflow instance, capture metadata on historical DAG runs, and minimize data quality issues.

Julien Le Dem
OpenLineage Project Lead | LF AI & Data

Julien Le Dem is the OpenLineage project lead at LF AI & Data. He co-created the Parquet file format and is involved in several open source projects, including OpenLineage, Marquez, Arrow, and Iceberg. He was the Chief Architect of Astronomer and co-founded Datakin. Previously, he held technical leadership positions at WeWork; Dremio, where he was on the founding team; Twitter, where he also obtained a two-character Twitter handle (@J_); and Yahoo!, where he received his Hadoop initiation. His French accent makes his talks particularly attractive.

Willy Lulciuc
Founding Engineer | Datakin

Willy Lulciuc is the Founding Engineer of Datakin. He makes datasets discoverable and meaningful with metadata. He co-created Marquez and is now involved in the OpenLineage initiative. Previously, he worked on the Project Marquez team at WeWork. When he’s not reviewing code and creating indirections, he can be found experimenting with analog synthesizers.
