At WeWork, it's critical that we understand the complete context for all datasets. We also want to be able to explore dependencies between jobs and the datasets they produce and consume. To do this, WeWork needs metadata. In this talk, I will focus on Marquez, a core service for the collection, aggregation, and visualization of a data ecosystem's metadata. Marquez maintains the provenance of how datasets are consumed and produced while providing global visibility into job runtime.
Willy Lulciuc is the Founding Engineer of Datakin. He makes datasets discoverable and meaningful with metadata. He co-created Marquez and is now involved in the OpenLineage initiative. Previously, he worked on the Project Marquez team at WeWork. When he’s not reviewing code and creating indirections, he can be found experimenting with analog synthesizers.