Machine Learning from Development to Production at Instacart

As Instacart has grown from a single data scientist building linear models, to multiple teams building and maintaining dozens of bespoke models, to a more mature organization collaborating across multiple fields, we’ve learned a few things the hard way. We’re open sourcing our solutions as Lore.

Common Problems

  1. Information overload makes it easy to miss newly available low-hanging fruit when you are trying to keep up with all the machine learning packages, their features, nuances and bugs, let alone implement the latest research from academia.
  2. Complexity grows because valuable models are the result of many iterative insights, making individual insights harder to maintain and communicate.
  3. Repeatability is nontrivial when code, data and library dependencies change constantly in modern environments, especially when someone else wrote the original years ago.
  4. Glue code is often mundane and tedious to write. It’s a frequent source of bugs because there is much to write, more to maintain, and all of it has low mind-share.
  5. Performance bottlenecks are easy to hit when you’re working in high-level languages like Python or SQL.

Our goal is to make machine learning approachable for Engineers and maintainable for Data Scientists. Many great libraries, such as NumPy, pandas, scikit-learn, TensorFlow and XGBoost, already work together in our daily workflow. Lore is our codification of best practices that welds the valuable bits seamlessly into production models. We're open sourcing it so we can learn from the community as well.

Montana works as a Machine Learning Engineer with a diverse team at Instacart to drive growth, reduce costs, and ensure they can reliably deliver the core product: your groceries, in an hour. He has also been the Chief Scientist at RescueTime and founder of two other startups that used machine learning to differentiate their apps.

