Data Council - Data Science, Machine Learning, AI, and Engineering Blog

Data Council Blog

Data Engineer Salaries Around The World (2019)

Your potential salary as a data engineer depends heavily on where you are based, but the cost of living also varies around the world. Wondering where you can actually earn more? Let's take a closer look at the United States, Europe and Asia to compare and benchmark data engineering salaries.

How Histograms Can Help Improve Your Ops Monitoring

 

 

Life comes at you fast. Data, even more so...

When the engineering team at Circonus began to feel the pain of systems at scale, common observability tools provided them with a firehose of operational time series telemetry. However, managing all that data, let alone making sense of it, was extremely difficult. The existing tools they tried for managing time series metrics either didn't provide mathematical insight or fell over at modest workloads. They needed a better solution, so they looked into statistical tooling that had proven itself for decades in other industries.

Amberdata - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Amberdata, an early-stage company building analysis tools for blockchain infrastructure, applications, and transactions.

Intermix - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Intermix, an early-stage company building performance analytics tools for Amazon Redshift.

How to "Democratize" the Responsibility for Data Quality Across Your Organization

 

 

Writing endless data transformations wasn't sustainable for an engineering team handling hundreds of inputs. Here's how Clover Health enabled their business users to help.

It's rare to find an ETL system that's completely static. As organizations change and grow, they develop new business requirements. Because of this, their data pipelines must change and adapt, ultimately becoming more robust and full-featured. Yet constant development can make already brittle ETL systems seem even more fragile.

Furthermore, systems with many different types of inputs bring special challenges: building, testing and managing an exploding number of data transformations can become a daunting project for the engineering team.

The Clover Health ETL system supports hundreds of inputs and more than 500 custom transformations in production, as well as a large number of custom connections between their different ETL pipelines. Given the magnitude of the system, one might reasonably wonder, "How does Clover guarantee and maintain data quality across so many different inputs and transforms?"

Exploring the development trajectory of Clover's system makes for a fascinating story; their data team's successes and pitfalls offer illustrative lessons to other engineers seeking to increase the robustness of their own ETL systems.


Shattering the Trillion-Rows-Per-Second Barrier With MemSQL

Recently at a conference, I had the privilege of demonstrating MemSQL processing over a trillion rows per second on the latest Intel Skylake servers.

NuCypher - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with NuCypher, an early-stage company building a decentralized encryption service.

Wootric - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Wootric, an early-stage company building customer experience tools powered by machine learning.

The Future of Distributed Databases is Relational

 

 

What if developers could ditch their NoSQL solutions and still get scalability from a more traditional relational datastore?

I've been noticing an interesting pattern recently: developers seem to be rejecting some of the newer, more en vogue data stores, which promise easier scaling but offer limited functionality and use cases, and returning to the comfortable, tried-and-true paradigm of relational databases. It seems that we've hit a watershed moment where developers finally believe they don't necessarily need to trade database features for easy scalability.

One such company enabling this return to the golden era of RDBMS is Citus Data. Citus is blazing a trail in 'cloud-proofing' the gold standard of relational databases, PostgreSQL, through extensions that let its customers scale horizontally far more easily than ever before.

Halo Tech - Featured Startup SF '18

In this blog series leading up to our SF18 conference, we invite our featured startups to tell us more about their data engineering challenges. Today, we speak with Halo Tech, an early-stage startup that analyzes complex data to accelerate medical advancements.

Data Council, PO Box 2087, Wilson, WY 83014, USA - Phone: +1 (415) 800-4938 - Email: community (at) datacouncil.ai