Data Council Blog

Rebuilding Open Source Analytics @ Airbnb

Written by Pete Soderling | 12/04/17 18:21

How open source allowed Airbnb to rebuild their expensive BI tool in less than one developer year

Granted, Maxime Beauchemin isn't your average data engineer. As any Bay Area engineer worth their salt knows, anyone who worked on data at Facebook receives (and deserves) a certain outsized respect from their peers.

Call him a '10x-er', a data unicorn, or fill-in-your-elite-data-engineer-term-here; folks like Max are renowned in the Valley for being engineers on the very early front lines of data wrangling during the data explosion.

But when faced with the challenge of rethinking data analytics and visualization tools for Airbnb's business users, even he couldn't imagine it being realistic for one engineer, working part-time for less than a year, to create a data exploration product that could compete with the enterprise-quality business intelligence (BI) tools out there.

In fact, the story behind this project had much more modest beginnings; the original goal wasn't to completely replace the BI stack at Airbnb. Max simply wanted an easy way to visualize data out of Druid, so he built a new datavis tool called Superset to do it. From very early on, the business users loved it. But the tipping point from small project to full-blown data analytics platform came later, when he decided to add support for SQL databases in order to visualize data out of Presto without the need to create Tableau extract jobs. Total game-changer!

In this talk you'll hear the full story of Max's journey through this analytics platform development project, and better understand how various pieces of open source software enabled him to lead the charge.

Meet Maxime Beauchemin of Airbnb

Max Beauchemin works at Airbnb as part of the "Analytics & Experimentation Products" team, developing open source products that reduce friction and help generate insight from data. He is also the creator and a lead maintainer of Apache Airflow. Before Airbnb, Maxime worked on growth analytics at Facebook, on clickstream analytics at Yahoo!, and as a data warehouse architect at Ubisoft.

The availability of high-quality open source data tooling isn't news, but newer tools such as Thrift, Druid, Spark Streaming, and now Max's latest open source contribution, a datavis engine called Superset, make up the technical underpinning of his story.

Superset and Druid emerge as heroes of the project - but he also notes that Superset can be configured to talk to any SQL database or be extended to talk to NoSQL databases as well. This covers a broad variety of datastores, enabling Superset to consume data from disparate systems easily and to work with large datasets, transparently enabling realtime analytics.

And while Superset currently offers only a subset of what traditional BI vendors offer, the initial version of the tool proved successful enough with the Airbnb business users that the company now has a full engineering team working on it, even as it has garnered attention from a growing list of external contributors.

If you attend Max's talk at DataEngConf, you'll walk away with an understanding of how he built his solution as well as insights into how realtime analytics works at Airbnb. But be careful - once you're aware of the power of this toolset, you might just leave the conference inspired to rebuild analytics at your own company!