Technical Talks

Spark, Dask, DuckDB, Polars: Benchmarks at Scale
Missing value detected...
Video will be populated after the conference
- Data Eng & Infrastructure
Large scale data analysis has blown up recently, and standard ad-hoc analysis tools like Apache Spark and Pandas are joined by new friends. In this talk we benchmark four popular large-data dataframe/database tools for large scale analysis Spark, Dask, DuckDB, and Polars on the popular TPC-H benchmark suite. We do this both on single machines and on the cloud at scales ranging from 10GB to 10TB.
No tool wins consistently, but by looking at detailed performance metrics we can better understand the technical nuance and challenges in this space, and guide projects towards better performance.

CEO
Matthew Rocklin
Coiled
Matthew is an open source software developer in the numeric Python ecosystem. He maintains several PyData libraries, but today focuses mostly on Dask a library for scalable computing. Matthew worked for Anaconda Inc for several years, then built out the Dask team at NVIDIA for RAPIDS, and most recently founded Coiled to improve Python's scalability with Dask for large organizations.
Matthew holds a bachelors degree from UC Berkeley in physics and mathematics, and a PhD in computer science from the University of Chicago.
Discover the data foundations powering today's AI breakthroughs. Join leading minds as we explore both cutting-edge AI and the infrastructure behind it. Reserve your spot at before tickets sell out!