In the course of transforming, publishing, and visualizing data, there's a risk of "bad data" creeping into your output at every turn, hurting data credibility and distracting teams from investigating real metric shifts. How does Netflix prevent bad data from causing bad decision-making? We use a variety of techniques to automate the basics, allowing us to focus our energy on the changes in data that indicate real problems with the Netflix product.
Hear examples of 1) the checks we impose at multiple steps of the data pipeline to identify source-data quality issues and business-metric shifts, 2) techniques for anomaly detection on high-dimensional, high-cardinality datasets, 3) how to set up these evaluations in an automated fashion, and 4) how we make it easy for humans to investigate issues.
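The abstract doesn't describe Netflix's actual implementation, but a minimal sketch of the kind of automated metric check alluded to in (1) — flagging a daily metric value that deviates too far from its recent history — might look like this. The threshold and window here are illustrative assumptions, not anything from the talk:

```python
from statistics import mean, stdev

def is_anomalous(history, today, z_threshold=3.0):
    """Flag today's metric value if it deviates more than
    z_threshold standard deviations from recent history.

    history: list of recent daily values for the metric.
    today:   the new value to evaluate.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all is suspicious.
        return today != mu
    return abs(today - mu) / sigma > z_threshold

# Example: a stable daily metric, then a sudden drop.
history = [100, 102, 98, 101, 99, 100, 103]
print(is_anomalous(history, 101))  # consistent with history
print(is_anomalous(history, 60))   # large shift gets flagged
```

A real pipeline would run a check like this per dimension slice, which is where the high-cardinality challenge in (2) comes from: thousands of slices means a naive fixed threshold produces too many false alarms.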
Laura Pruitt leads content delivery analytics at Netflix. Her team of data engineers and analytics experts evolves Netflix's understanding of its 1/3 share of the internet's traffic through sophisticated data processing, custom tools, and deep analytic rigor. She developed her expertise in using data to understand customer behavior and software systems while working at GEICO, and now at Netflix tackles data and analytic challenges at truly massive scale.