Technical Talks

Building a Unified Feature Platform with DuckDB and Arrow

Michael Eastham | Chief Architect | Tecton

Transforming raw data into features to power machine learning models is one of the biggest challenges in production ML. At Tecton, we’re building a Feature Platform to help Data Scientists and ML Engineers tackle this problem. We recently launched a new version of our product, based on DuckDB and Arrow, to complement the existing Spark- and Snowflake-based versions. Our goals were to provide a fast, delightful local development experience, decrease time-to-deployment for our customers, and allow flexible integration with external systems.

In this talk, I will cover our decision to build this new product, our experiences with the implementation, and the outcomes to date. I will dive into technical details of the implementation, including:

  • Strategies for scaling up to larger dataset sizes without a distributed query engine

  • Implementing and distributing DuckDB extensions 

  • Building interoperability with Delta Lake outside of the Spark ecosystem

  • Performance comparisons with Spark 

  • Leveraging cloud resources to scale up beyond laptop-size datasets without losing the laptop developer experience

Michael Eastham
Chief Architect | Tecton

Michael Eastham is the Chief Architect at Tecton AI. Previously, he was a software engineer at Google, working on Search.