Dagster is a new open source framework for building and modelling data processing computations. It is also the first project of Nicholas Schrock and his company, Elementl, and its unveiling was one of the highlights of Data Council San Francisco 2019.

Nicholas summarized the vision behind it in his talk abstract: "Data processing systems typically span multiple runtime, storage, tooling, and organizational boundaries. But all the stages in a data processing system share a fundamental property: They are directed, acyclic graphs (DAGs) of functional computations that consume and produce data assets. Dagster defines a standard for containerizing, describing and operating these computations, and that standard is opinionated and informed by the best practices in the industry leading to more testable, more reliable, better structured data systems."



Alluxio is an open-source virtual distributed file system (VDFS). It began as the research project "Tachyon", created at the University of California, Berkeley's AMPLab as Haoyuan Li's Ph.D. thesis, advised by Professor Scott Shenker and Professor Ion Stoica. Alluxio sits between computation and storage in the big-data analytics stack. It provides a data abstraction layer for computation frameworks, enabling applications to connect to numerous storage systems through a common interface. The software is published under the Apache License.

Alluxio can be deployed on-premises, in the cloud (e.g. Microsoft Azure, AWS, Google Compute Engine), or in a hybrid cloud environment. It can run on bare metal or in containerized environments such as Kubernetes, Docker, and Apache Mesos.

If you're interested in learning more, here's an awesome talk by Bin Fan, founding engineer of Alluxio, Inc. and PMC member of the Alluxio open source project.



TimescaleDB is an open-source database designed to make SQL scalable for time-series data. It is engineered up from PostgreSQL, providing automatic partitioning across time and space (partitioning key), as well as full SQL support.

TimescaleDB is packaged as a PostgreSQL extension. All code is licensed under the Apache 2.0 open-source license, with the exception of source code under the tsl subdirectory, which is licensed under the Timescale License (TSL). For clarity, all code files reference licensing in their header.

Featured Open Source Speakers

Jesse Anderson

Data & Creative Engineer & Managing Director

Big Data Institute

Liana Napalkova

Senior Data Scientist and Big Data Developer

Eurecat Technology Center

Juan Carlos Castro

Digital Product Manager

Eurecat Technology Center

Wes McKinney


Ursa Labs, Member Apache Software Foundation

Maximilian Michels

Independent Contractor

Google, PMC member of Apache Flink and Apache Beam

Andreas Mueller

Associate Research Scientist

Data Science Institute, Columbia University

Allison King

Software Engineer


Ryan Williams

Software Engineer

Icahn School of Medicine at Mount Sinai

Eric Hanson

Principal Product Manager


Jacques Nadeau

Co-founder & CTO


Evan Sparks

Co-Founder & CEO


Shoumik Palkar

Ph.D Student

Stanford University Infolab

Asif Khalak

Director of Data Science

Collective Health

Sergio Martinez-Ortuno

Staff Data Scientist

Collective Health

Joseph Gonzalez

Assistant Professor & Co-Director

UC Berkeley

Omoju Miller

Senior Data Scientist, Machine Learning


Sahaana Suri

Graduate Researcher

Stanford University, Future Data Systems

Mike Freedman

Co-founder & CTO


Sam Stokes



Sanjay Krishnan

Graduate Researcher

UC Berkeley, AMPLab

Christian Romming

Founder & CEO


James Faghmous

Asst. Prof/ CTO of Arnhold Institute

Icahn School of Medicine at Mt. Sinai

Robert Winslow

Co-founder & CTO


Thomas La Piana

Data Engineer


Nick Rockwell

Chief Technology Officer

The New York Times

Featured Open Source Partners