Data Council Blog

Instrumental - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Instrumental, an early-stage company building data systems to monitor and improve manufacturing line performance.

 
Q: What surprised you most as an engineer about the work you'll be telling us about in your talk?
 

Simon Kozlov: The difference between “human performance” in industry and in an academic setting. We expected to compare machine learning models against humans performing at their best: careful, thoughtful and dedicating their full attention to the task.

However, in many practical situations, humans get tired of repetitive work, can’t concentrate for long periods of time and have a lot of things on their mind other than the specific task at hand - in general, they act "human!"

These factors change the tradeoffs significantly - suddenly even a modestly accurate model can help catch significant issues and assist humans in their work! And if you figure out how to get feedback from them in real time, that feedback can make the model even more accurate. Rinse, repeat.


Q: What do you think a listener will get out of this talk vs. other talks on distributed data processing and data versioning that they've previously heard?

Simon Kozlov: Most of the time, machine learning performance is limited by the quality of the data fed into the model. This talk is about creating high-quality datasets out of raw production data that requires experts to provide labels and judgment.

However, there are two big problems that we've found with experts: first, they’re very hard to find. Second, they never agree with each other.

In our talk, we’ll cover several ideas and practices that will help listeners deal with both issues.

 

 

About the Startups Track

The data-oriented Startups Track at DataEngConf features dozens of startups forging ahead with innovative approaches to data and new data technologies. We find the most interesting startups at the intersection of ML, AI, data infrastructure and new applications of data science and highlight them in technical talks by their CTOs and lead engineers who are building these platforms.

Data Engineering, Data Warehouse, Data Strategy

Written by Robert Winslow

Robert is a seasoned software consultant with a decade of experience shipping great products. He thrives in early-stage startup environments, and works primarily in Go, Python, and Rust. He has led backend development at companies like RankScience and Spot.com; created a rigorous, open-source time-series benchmarking suite for InfluxData; and rapidly prototyped software in a skunkworks-type product lab. He’s taught graduate statistics at GalvanizeU and mentored at the Stanford d.school. He helps maintain Google’s FlatBuffers project, one of the world’s fastest serialization libraries. A colleague once described him as “the developer equivalent of ‘The Wolf’ from Pulp Fiction.”