Data Council Blog

Community, Metadata Management, and More: Top 10 Links From Across the Web

Here's our March 2021 roundup of links from across the web that we selected for you:

1. How to Build a Community (Fishtown Analytics)

Claire Carroll's first personal blog post on community-building is a must-read. As Fishtown Analytics' community manager for the last 2.5 years, she's arguably behind the success of the dbt community and its best-in-class practices, so we expected good advice… but she really knocked it out of the park with this one! The key takeaway is that you should start by asking yourself why you want to build a community. Make sure to read the full post to understand why it received so much praise.

dbt at Shopify, Active Learning, and More: Top 10 Links From Across the Web

Here's our February 2021 roundup of links from across the web that we picked for you:

1. dbt at Shopify (Data Engineering Podcast)

The Data Engineering Podcast recently featured a very interesting discussion about dbt at Shopify. Engineering manager Zeeshan Qureshi and senior data engineer Michelle Ark explained how dbt answered Shopify’s need for an SQL-based solution that its data scientists could use autonomously. They also mentioned some of the best practices they followed for staging, and cost considerations related to BigQuery. Last but not least, they touched on some extensions they are considering, such as implementing Great Expectations for data quality control.

Storing Cold Metadata, Snowflake Data Cloud, and More: Top 10 Links From Across the Web

Here's our January 2021 roundup of links from across the web that could be relevant to you:

1. Storing Cold Metadata with Alki (Dropbox)

Dropbox shared insights into Alki, the petabyte-scale metadata store it designed for infrequently accessed metadata (“cold data”). The post details how the one-size-fits-all database Edgestore was reaching capacity limits, and why audit logs were a good candidate to move off costly SSDs. After considering off-the-shelf options, the team settled on building its own solution, Alki, on top of AWS services, with DynamoDB as the hot store and S3 as the cold store. Like HBase or Cassandra, Alki is based on log-structured merge-trees (LSM trees), but is better suited to handle hot-then-cold audit logs, as well as future use cases at Dropbox.

The Modern Data Stack, Metadata Architectures, and More: Top 10 Links From Across the Web

Here's our December 2020 roundup of links from across the web that could be relevant to you:

1. The Modern Data Stack (Fishtown Analytics)

This long-form post on the dbt blog is a must-read. Titled “The Modern Data Stack: Past, Present, and Future,” it answers the question that Tristan Handy has been asking himself for the past two years: “What happened to the massive innovation we saw from 2012-2016?” His carefully thought-out analysis covers the natural cycles of technological shifts, defines the phase we are in as a ‘deployment’ one, and points out high-impact opportunity areas for the next few years, which you might find particularly useful if you are considering launching a new product.

NLP Heroes, Pinot, Data Testing, and More: Top 10 Links From Across the Web

Here's our November 2020 roundup of good reads and podcast episodes that might be relevant for your career in data:

1. Heroes of NLP: Quoc Le (Deeplearning.ai)

NLP researcher Quoc Le was recently Andrew Ng’s guest as part of the ‘Heroes of NLP’ video series. Their discussion covered Le’s impressive journey, from growing up in Vietnam and developing his first basic chatbot in high school to becoming Google Brain’s first intern, and everything that followed. This includes the ‘Google Cat’ experiment, the Meena chatbot project, and work on Seq2Seq models. Check out the conversation here, and consider subscribing to the series to hear from other guests such as Chris Manning, Kathleen McKeown, and Oren Etzioni.

Amberdata - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Amberdata, an early-stage company building analysis tools for blockchain infrastructure, applications, and transactions.

Intermix - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Intermix, an early-stage company building performance analytics tools for Amazon Redshift.

NuCypher - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with NuCypher, an early-stage company building a decentralized encryption service.

Wootric - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Wootric, an early-stage company building customer experience tools powered by machine learning.

Halo Tech - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Halo Tech, an early-stage startup that analyzes complex data to accelerate medical advancements.