At WeWork, it's critical that we understand the complete context for all datasets. We also want to be able to explore dependencies between jobs and the datasets they produce and consume. To do this, WeWork needs metadata. In this talk, I will focus on Marquez, a core service for the collection, aggregation, and visualization of a data ecosystem's metadata. Marquez maintains the provenance of how datasets are consumed and produced while providing global visibility into job runtime.
Willy Lulciuc is a Data Engineer at WeWork and works on the Project Marquez team in San Francisco, making datasets discoverable and meaningful. Previously, he worked on the real-time streaming data platform powering BounceX, and before that, designed and scaled sensor data streams at Canary. When he's not reviewing code and creating indirections, he can be found experimenting with analog synthesizers.