Technical Talks

Architecting the Data Lake: How to Ensure that Your ETL Pipelines Deliver High Quality Data

Paul Lappas | CTO | Intermix

The data in your data lake and data warehouse is mission critical. It informs crucial company decisions, and your reputation is on the line when it is not accurate. As the data engineer in charge, you need confidence in your data pipeline execution. Complex ETL demands accuracy, but data pipelines change often, and even minor changes to DAGs and tasks can introduce regressions. These regressions may have unintended impacts on tables and data flows that are not discovered until much later, after the data has already been shipped to its end users.

Failures and DAG outages also occur. Once you've fixed them, you need confidence that data is ‘flowing’ again and that things are back to normal. In this talk, we'll cover examples of tests that can be run against your tables and data models. You will learn about the different classes of tests, how to set them up, and the important metrics to monitor. You'll become the enabler of accurate business decisions, with confidence in your data quality and no more guesswork or surprises.
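To make the idea of table-level tests concrete, here is a minimal sketch of checks that could run after a pipeline task completes. It is an illustration only, not the speaker's method: the check names (`non_empty`, `no_null_keys`, `unique_keys`, `fresh`), the `run_table_checks` helper, the `orders` table, and its columns are all hypothetical, and SQLite stands in for a real warehouse.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def run_table_checks(conn, table, key_column, ts_column, max_age, now):
    """Run a few illustrative data quality checks; returns check name -> bool.

    Hypothetical helper: identifiers are interpolated into SQL directly,
    so only pass trusted table and column names.
    """
    cur = conn.cursor()
    row_count = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    null_keys = cur.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key_column} IS NULL"
    ).fetchone()[0]
    # COUNT(DISTINCT ...) ignores NULLs, so compare against non-null rows.
    distinct_keys = cur.execute(
        f"SELECT COUNT(DISTINCT {key_column}) FROM {table}"
    ).fetchone()[0]
    # Timestamps are assumed to be ISO 8601 strings in a single time zone.
    latest = cur.execute(f"SELECT MAX({ts_column}) FROM {table}").fetchone()[0]
    fresh = latest is not None and now - datetime.fromisoformat(latest) <= max_age
    return {
        "non_empty": row_count > 0,
        "no_null_keys": null_keys == 0,
        "unique_keys": distinct_keys == row_count - null_keys,
        "fresh": fresh,
    }


# Example run against an in-memory table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, loaded_at TEXT)")
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [
    (1, (now - timedelta(hours=30)).isoformat()),
    (2, (now - timedelta(hours=1)).isoformat()),
]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

results = run_table_checks(
    conn, "orders", "order_id", "loaded_at", timedelta(hours=6), now
)
print(results)
```

Running the same checks again after a failure has been fixed is one simple way to confirm that data is arriving once more, since the `fresh` check only passes when the newest row falls within the allowed age.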

Paul Lappas
CTO | Intermix

Paul is the co-founder & CTO of Intermix. He has experience building technology and products, scaling organizations, and establishing high-performing engineering cultures. Paul holds over 10 patents. His prominent work includes bringing one of the first IaaS cloud computing service providers to market, developing a data analytics platform and mobile SDKs used by 1B end-users, and, currently, solving problems in big data.