Technical Talks

Continuous Data Pipeline for Real-time Benchmarking & Data Set Augmentation
- Lightning Talks
Building and curating representative datasets during model research and development is critical to getting an ML system whose accuracy meets the project's requirements. After the model is deployed, monitoring accuracy and other statistical metrics in order to improve and adjust it is a natural workflow. Models working with unstructured language data may experience data shift, resulting in unpredictable and non-representative inference. With the help of open-source APIs and commercial or open-source annotation tools, the building of annotations can be operationalized and the analyst workload reduced. In this talk I will cover the process of generating datasets and using them for real-time precision/recall splits, with the goal of detecting data shifts away from the in-sample space to prioritize future data collection and model retraining.
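As a rough illustration of the precision/recall monitoring idea in the abstract (not the speaker's actual pipeline), the sketch below computes precision and recall for successive windows of annotated production predictions and flags any window that falls more than a tolerance below a baseline. The `Window` class, the `flag_shift` function, the 0.05 tolerance, and the example counts are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Window:
    """Counts from one batch of annotated predictions (hypothetical structure)."""
    name: str
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def precision_recall(w: Window) -> Tuple[float, float]:
    """Return (precision, recall) for a window, using 0.0 when undefined."""
    predicted = w.tp + w.fp
    actual = w.tp + w.fn
    precision = w.tp / predicted if predicted else 0.0
    recall = w.tp / actual if actual else 0.0
    return precision, recall


def flag_shift(windows: Iterable[Window], baseline: Window,
               tolerance: float = 0.05) -> List[str]:
    """Names of windows whose precision or recall drops more than
    `tolerance` below the baseline (an assumed, simplistic shift signal)."""
    base_p, base_r = precision_recall(baseline)
    flagged = []
    for w in windows:
        p, r = precision_recall(w)
        if p < base_p - tolerance or r < base_r - tolerance:
            flagged.append(w.name)
    return flagged


# Made-up counts: the baseline is the held-out evaluation set,
# the windows are later batches of annotated production data.
baseline = Window("baseline", tp=90, fp=10, fn=10)
windows = [
    Window("week1", tp=88, fp=12, fn=12),
    Window("week2", tp=70, fp=10, fn=30),
]
flagged = flag_shift(windows, baseline)
```

Flagged windows would be natural candidates for prioritized annotation and retraining data. A production setup would likely use a statistical test rather than a fixed threshold.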

Senior Data Scientist
Ivan Aguilar
Teleskope
Ivan is a data scientist at Teleskope focused on building scalable models for detecting PII/PHI/Secrets and other compliance-related entities within customers' clouds. Prior to joining Teleskope, Ivan was an ML Engineer at Forge.AI, a Boston-based shop working on information extraction, content extraction, and other NLP-related tasks. In his free time he enjoys making things from raw materials, including ceramics and cooking.