Transforming raw data into features to power machine learning models is one of the biggest challenges in production ML. At Tecton, we’re building a Feature Platform to help Data Scientists and ML Engineers tackle this problem. We recently launched a new version of our product, built on DuckDB and Arrow, to complement the existing Spark- and Snowflake-based versions. Our goals were to provide a fast, delightful local development experience, decrease time-to-deployment for our customers, and offer flexibility in integrating with external systems.
In this talk, I will cover our decision to build this new product, our experience implementing it, and the outcomes to date, diving into technical details including:
- Strategies for scaling up to larger datasets without a distributed query engine (see the configuration sketch after this list)
- Implementing and distributing DuckDB extensions (see the loading sketch below)
- Building interoperability with Delta Lake outside the Spark ecosystem (see the Arrow-based sketch below)
- Performance comparisons with Spark
- Leveraging cloud resources to scale beyond laptop-size datasets without losing the laptop developer experience
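To make the first bullet concrete: DuckDB can process datasets larger than available memory by spilling intermediate results to disk. The sketch below shows the relevant session settings; the limit values and paths are illustrative assumptions, not Tecton’s actual configuration.

```python
import duckdb

# Cap DuckDB's memory and give it a spill directory so joins and
# aggregations on larger-than-memory data stream through disk
# instead of failing. Values here are illustrative.
con = duckdb.connect("features.duckdb")
con.execute("SET memory_limit = '8GB'")
con.execute("SET temp_directory = '/tmp/duckdb_spill'")
con.execute("SET threads = 8")
```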
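For the extensions bullet, a minimal sketch of the two distribution paths DuckDB supports: signed extensions fetched from an extension repository at runtime, and self-distributed unsigned binaries loaded from disk. The custom extension path in the second half is hypothetical.

```python
import duckdb

# Signed extensions can be installed from DuckDB's extension
# repository and loaded into the running process.
con = duckdb.connect()
con.install_extension("httpfs")
con.load_extension("httpfs")

# A self-distributed extension binary must be explicitly allowed,
# since it isn't signed. The path below is hypothetical.
con2 = duckdb.connect(config={"allow_unsigned_extensions": "true"})
con2.execute("LOAD '/path/to/my_extension.duckdb_extension'")
```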
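For the Delta Lake bullet, a minimal sketch assuming the delta-rs Python bindings (the `deltalake` package): the table is exposed as an Arrow dataset, which DuckDB scans in place via a replacement scan on the local variable, with no Spark involved. The table URI and column names are made up for illustration.

```python
import duckdb
from deltalake import DeltaTable  # delta-rs bindings; no Spark needed

# Hypothetical Delta table; local paths work the same way as S3 URIs.
dt = DeltaTable("s3://example-bucket/features/user_clicks")

# Expose the Delta table as an Arrow dataset; DuckDB can scan it
# directly by referencing the Python variable name in SQL.
clicks = dt.to_pyarrow_dataset()

con = duckdb.connect()
result = con.execute(
    """
    SELECT user_id, count(*) AS click_count
    FROM clicks
    GROUP BY user_id
    """
).arrow()  # results come back as an Arrow table
```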
Michael Eastham is the Chief Architect at Tecton AI. Previously, he was a software engineer at Google, working on Search.