Data Council Blog

Open Source Highlight: OpenLineage

OpenLineage is an API for collecting data lineage and metadata at runtime. Although it was initiated by Datakin, the company behind Marquez, it was developed with the aim of creating an open standard. As Datakin's CTO Julien Le Dem explained in a blog post announcing the launch, OpenLineage is meant to answer the industry-wide need for data lineage while making sure that efforts in that direction aren't fragmented or duplicated.

Open Source Highlight: Orchest

Orchest is an open-source tool for creating data science pipelines. Its core value proposition is threefold: it makes it easy to combine notebooks and scripts in a visual pipeline editor ("build"), it makes your notebooks executable ("run"), and it facilitates experiments ("discover").

Open Source Highlight: Klio

Klio is a framework for large-scale processing and ML research on binary files such as audio, which was its original use case. It was developed for audio intelligence at Spotify, which open-sourced it earlier this year at the 2020 International Society for Music Information Retrieval Conference.

Open Source Highlight: n8n

Created by Berlin-based developer Jan Oberhauser in 2019, n8n presents itself as "a free and open workflow automation tool." Think of it as a self-hosted Zapier on steroids.

Hot Data Tools pt. 2, End-to-End Data Scientists, and More: Top 10 Links From Across the Web

Here's our September 2020 roundup of good reads and podcast episodes that might be relevant to you as a data professional:

1. What Data Tools Don't Do (Data Council)

Our founder Pete Soderling co-authored a follow-up to his previous post with Abe Gong, core contributor to Great Expectations, and Sarah Catanzaro, partner at Amplify Partners. For this piece, they interviewed the makers of some of the hottest data tools. The focus is unchanged: rather than learning what these tools can do, we hear about what they don't do, as a way to better understand how they fit together. From ApertureData to Xplenty, this installment covers 21 additional tools, and you can read it here.

Should Datacoral Power Your New Data Infrastructure?

Today's companies aim to be data-driven, but data infrastructure is time-intensive and costly to build, maintain, and secure. A coral is the exoskeleton of a small marine animal that can attach to and grow on almost anything. Once it starts growing, it can form large reefs that support a diverse ecosystem of plants and animals. So what happens if you apply that philosophy to the world of data?