How do you organize your data so that your users get the right answers at the right time? That question is a pretty good definition of data engineering, but it also describes the purpose of every DBMS (database management system). It is no coincidence that the two are so similar.
This talk looks at the patterns that recur throughout data management, such as caching, partitioning, sorting, and derived data sets. As the speaker is the author of Apache Calcite, we first look at these patterns through the lens of relational algebra and DBMS architecture. We then apply them to the modern data pipeline, ETL, and analytics. As a case study, we look at how Looker’s “derived tables” blur the line between ETL and caching, and leverage the power of cloud databases.
Julian Hyde is an expert in query optimization, database internals, in-memory analytics, and streaming. He is the founder of Apache Calcite, an open-source query planning framework that powers many database and streaming SQL engines, including Apache Beam, Flink and Hive. He was the original developer of the Mondrian OLAP engine, and was formerly Chief Architect at SQLstream. He is an architect at Looker.