This talk introduces Dagster, an open-source framework for building and modeling data processing computations. Data processing systems typically span multiple runtime, storage, tooling, and organizational boundaries. But all the stages in a data processing system share a fundamental property: they are directed acyclic graphs (DAGs) of functional computations that consume and produce data assets. Dagster defines a standard for containerizing, describing, and operating these computations. That standard is opinionated and informed by industry best practices, leading to more testable, more reliable, and better-structured data systems.
Defining a standard makes it possible to build these computations in tools that users know and love, such as Jupyter Notebooks (via Papermill), dbt, and Spark, and to leverage that standard to build high-quality developer- and ops-facing tools that inspect, operate, and monitor those computations. These tools range from our beautiful introspection and execution tool, Dagit, to tools that schedule these computations on systems ranging from Airflow to Lambda, among others. Dagster embraces the chaotic reality of modern data management and is an abstraction designed for incremental adoption within an increasingly heterogeneous ecosystem. We will describe both the technology and the technical and organizational insights gained from production use of Dagster.
Nick Schrock is the founder and CEO of Elementl, a company aiming to reshape the data management ecosystem, and the creator of Dagster, a new programming model for data processing. Previously, Nick was a Principal Engineer and Director of Engineering at Facebook. During that time, Nick co-created GraphQL and led its implementation and adoption across the entire organization and product line. He also formed the Product Infrastructure group, whose engineers, in addition to GraphQL, created React, React Native, and many other developer technologies broadly used both inside Facebook and across the technology industry at large.