Building a Unified Feature Platform with DuckDB and Arrow

Technical Talks

Transforming raw data into features to power machine learning models is one of the biggest challenges in production ML. At Tecton, we’re building a Feature Platform to help data scientists and ML engineers tackle this problem. We recently launched a new version of our product based on DuckDB and Arrow to complement the existing Spark- and Snowflake-based versions. Our goals were to provide a fast, delightful local development experience, decrease time-to-deployment for our customers, and offer flexible integration with external systems.

In this talk, I will cover our decision to build this new product, our experiences with the implementation, and the outcomes to date. I will dive into technical details of the implementation, including:

Strategies for scaling up to larger dataset sizes without a distributed query engine
Implementing and distributing DuckDB extensions
Building interoperability with Delta Lake outside of the Spark ecosystem
Performance comparisons with Spark
Leveraging cloud resources to scale up beyond laptop-size datasets without losing the laptop developer experience


Michael Eastham

Chief Architect | Tecton

Michael Eastham is the Chief Architect at Tecton AI. Previously, he was a software engineer at Google, working on Search.
