Observability at the long tail: why sampling production data doesn’t work for rare events

Bernease Herman, Data Scientist | WhyLabs

Machine learning has a long-tail problem: models that work well in the average case often fail on rare events, and those failures frequently go unmonitored. This is a concern because there are many business use cases where poor performance at the tail of the distribution can be detrimental to customer satisfaction or other metrics. At WhyLabs, we have run a number of experiments that illustrate the long-tail problem and show why proper monitoring is needed to prevent the kind of catastrophic failures we've seen in the past. We built whylogs, an open-source library that uses data profiling instead of data sampling. This allows for better accuracy and precision while still scaling to petabyte-scale data in both batch and streaming modes.

In this talk, we walk through example cases and experiments to demonstrate the failure cases of evaluation and monitoring on sampled data. We also discuss how we’ve addressed these issues using data sketching techniques. Finally, we offer an open source library, whylogs, that enables statistical data logging and profiling in only a few lines of code.
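The sampling failure described above can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the whylogs API and not the experiments from the talk: the synthetic latency stream, the `RunningProfile` class, the 1,000 ms threshold, and the 1% sample rate are all hypothetical choices made for the example. With 5 rare events in 100,000 records, a 1% uniform sample misses all of them about 95% of the time, while a single-pass summary that touches every record cannot miss them.

```python
import random


def make_stream(n=100_000, n_rare=5, seed=0):
    """Synthetic latencies (hypothetical data): mostly ~100 ms, plus a few extreme outliers."""
    rng = random.Random(seed)
    values = [rng.gauss(100.0, 10.0) for _ in range(n)]
    # Plant the rare events at distinct random positions in the stream.
    for idx in rng.sample(range(n), n_rare):
        values[idx] = 10_000.0
    return values


class RunningProfile:
    """Single-pass summary that sees every record; a stand-in for a real data profile."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0
        self.min = float("inf")
        self.max = float("-inf")
        self.above_threshold = 0

    def update(self, x):
        self.count += 1
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        if x > self.threshold:
            self.above_threshold += 1


THRESHOLD = 1_000.0  # assumed cutoff for what counts as a "rare event"

values = make_stream()

# Profiling: constant memory, one pass over all records.
profile = RunningProfile(threshold=THRESHOLD)
for v in values:
    profile.update(v)

# Sampling: keep 1% of the records uniformly at random, then inspect only those.
sample = random.Random(1).sample(values, k=1_000)
sample_rare = sum(1 for v in sample if v > THRESHOLD)

print("rare events seen by profile:", profile.above_threshold)
print("rare events seen by sample: ", sample_rare)
print("max seen by profile:", profile.max, "| max seen by sample:", max(sample))
```

The profile here tracks only counts and extremes. Production sketching libraries additionally keep approximate quantiles and cardinalities in bounded memory, which this toy class does not attempt.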



Bernease Herman

Data Scientist | WhyLabs

Bernease Herman is a data scientist at WhyLabs, the AI Observability company, and a research scientist at the University of Washington eScience Institute. At WhyLabs, she is building model and data monitoring solutions using approximate statistics techniques. Earlier in her career, Bernease built ML-driven solutions for inventory planning at Amazon and conducted quantitative research at Morgan Stanley. Her academic research focuses on machine learning evaluation and interpretability, with a specialty in synthetic data and societal implications. Bernease serves as faculty for the University of Washington Master’s Program in Data Science and as chair of the Rigorous Evaluation for AI Systems (REAIS) workshop series. She has published work in top machine learning conferences and workshops such as NeurIPS, ICLR, and FAccT. She is a PhD student at the University of Washington and holds a Bachelor’s degree in mathematics and statistics from the University of Michigan.