When processing unbounded data sets (aka streaming), it is typical to enrich them with secondary data sets at some point. For example, given user activity, it's common to join it with user profiles. This could be done to build a richer stream of events to act on, or to build denormalized state stores.
Using Apache Beam, this talk will present a few strategies for enriching unbounded sources with data from bounded and unbounded secondary sources. It will also discuss some of the tradeoffs we make between simplicity, consistency and complexity.
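The abstract does not spell out the strategies, so what follows is only a minimal sketch, not taken from the talk. It uses plain Python with no Beam dependency to show two common shapes of enrichment: a lookup against a bounded profile table, and a keyed "latest value" join against an unbounded stream of profile updates. The function names and record fields (`user_id`, `country`, `action`) are hypothetical. In Beam, the first shape usually corresponds to a `beam.Map` that receives the table as a side input (for example via `beam.pvalue.AsDict`). The second usually corresponds to a stateful `DoFn` keyed by user.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical record shapes: an activity event references a user by "user_id";
# a profile is a plain dict of attributes.
Event = Dict[str, str]
Profile = Dict[str, str]


def enrich_with_table(events: Iterable[Event],
                      profiles: Dict[str, Profile]) -> Iterator[Event]:
    """Bounded secondary source: each event is looked up in a static table.

    This mirrors the per-element logic of a Beam Map with a dict side input.
    """
    for event in events:
        profile = profiles.get(event["user_id"], {})
        # Build a new dict with profile fields first; event fields win on clashes.
        yield {**profile, **event}


def enrich_with_stream(records: Iterable[Tuple[str, dict]]) -> Iterator[Event]:
    """Unbounded secondary source: profile updates and events share one stream.

    The latest profile per user is kept as keyed state (the role a stateful
    DoFn would play in Beam). Each event is enriched with whatever profile
    arrived before it, so the output depends on arrival order.
    """
    latest: Dict[str, Profile] = {}
    for kind, payload in records:
        if kind == "profile":
            # Overwrite the stored profile for this user, dropping the key field.
            latest[payload["user_id"]] = {
                k: v for k, v in payload.items() if k != "user_id"
            }
        elif kind == "event":
            profile = latest.get(payload["user_id"], {})
            yield {**profile, **payload}


if __name__ == "__main__":
    profiles = {"u1": {"country": "SG"}}
    events = [{"user_id": "u1", "action": "click"},
              {"user_id": "u2", "action": "view"}]
    print(list(enrich_with_table(events, profiles)))

    stream = [("event", {"user_id": "u1", "action": "click"}),
              ("profile", {"user_id": "u1", "country": "SG"}),
              ("event", {"user_id": "u1", "action": "view"})]
    print(list(enrich_with_stream(stream)))
```

The table version is simple and deterministic, but the results are only as fresh as the table. The stream version picks up profile changes as they arrive. However, an event that arrives before its user's profile is emitted unenriched. This is one way the consistency tradeoff mentioned above can show up.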
Tim is a data engineer and consultant currently working as a Strategic Cloud Engineer at Google Singapore. He’s been working in data analytics since 2008 and helps customers build data analytics foundations, pipelines and applications on Google Cloud. He’s a fan of Dataflow and Apache Beam and also keen on Kubeflow.
Data Council, PO Box 2087, Wilson, WY 83014, USA - Phone: +1 (415) 800-4938 - EIN: 46-3540315 - Email: community (at) datacouncil.ai