With Airflow now ubiquitous for DAG orchestration, organizations increasingly depend on it to manage complex inter-DAG dependencies and provide up-to-date runtime visibility into DAG execution. But what effects (if any) would upstream DAGs have on downstream DAGs if dataset consumption were delayed?
In this talk, we introduce Marquez: an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. We will demonstrate how metadata management with Marquez helps maintain inter-DAG dependencies, catalog historical runs of DAGs, and minimize data quality issues.
Willy Lulciuc is the Founding Engineer of Datakin. He makes datasets discoverable and meaningful with metadata. He co-created Marquez and is now involved in the OpenLineage initiative. Previously, he worked on the Project Marquez team at WeWork. When he’s not reviewing code and creating indirections, he can be found experimenting with analog synthesizers.