Building end-to-end data science solutions is a complex task that goes beyond simply winning prizes on Kaggle. Applying advanced machine learning techniques to real-world scenarios requires rigorous cleaning, preparation and feature engineering of the data before we even get to discussions on algorithms. We then need to test various ML models, explore diverse configurations, and finally, productize our results. Doing this for massive quantities of data creates challenges at scale that become more complex when we also factor in the implications of bias and fairness in data representation.
In this talk, we’ll architect a production-grade ML pipeline using feature engineering, model training and management tools from Apache Spark. We’ll see demos that showcase these pipelines in action using Microsoft Azure services such as Azure Databricks, Event Hubs and Cognitive Services. Finally, we’ll explore relevant research, tools and best practices that can be used to craft responsible AI solutions with a focus on issues like bias and fairness in data representation.
Adi Polak is a Senior Software Engineer and Developer Advocate in the Azure Engineering organization at Microsoft. Her work focuses on microservices architecture, distributed systems, real-time processing, big data analysis, machine learning at scale and functional programming. Her advocacy work focuses on bringing her vast industry research and engineering experience to bear in helping teams design, architect and build cost-effective software and infrastructure solutions that emphasize scalability, team expertise and business market fit.