A streaming database is a potentially intimidating thing to build. At Materialize, we've broken it down into what we feel are manageable parts, through three foundational choices that fit together well. The data are modeled as continually changing collections of records, as in *change data capture*, rather than as ordered streams of events. The queries are standard SQL queries, whose answers are always "as if the query were continually re-executed on the data as it is now". The system architecture first assigns *virtual times* to both changed data and posed queries, and then computes the correct answers as fast as possible. In this talk, we will work through these three choices, their trade-offs, and how their simplifications lead us to a much more *manageable* streaming database.
Frank McSherry is Chief Scientist at Materialize, where he (and others) convert SQL into scale-out, streaming, and interactive dataflows. Before this, he developed the timely and differential dataflow Rust libraries (with colleagues at ETHZ), and while at MSR Silicon Valley led the Naiad research project and co-invented differential privacy. He has a PhD in computer science from the University of Washington.