InfluxDB is an open source time series database developed over the last 3 years. In that time we've tried different storage engines starting with LevelDB and testing out HyperLevelDB, RocksDB and BoltDB. Over a year ago we made the decision to write our own storage engine from scratch. Inspired by the LSM Tree underlying LevelDB and its variants, we created a new storage engine we're calling the TSM Tree (Time Structured Merge Tree). Over the last eight months we've added to this storage engine to provide index capabilities for mapping metadata to underlying time series.
This talk will briefly cover our journey with other storage engines and why we ultimately decided to write our own from scratch. The underlying InfluxDB storage engine is more like two storage engines in one: a time series storage engine and an inverted index for metadata. This talk will dive into the details about how each of these systems work, their design considerations and lessons learned along the way. We'll cover compression techniques for columnar time series storage, Robin Hood Hashing for quickly index lookups, and sketches for estimation of series cardinality at scale.
Paul Dix is cofounder and CTO of InfluxData, the company behind InfluxDB, the open source time series database. He has helped build software for startups, large companies and organizations like Microsoft, Google, McAfee, Thomson Reuters, and Air Force Space Command. He is the series editor for Addison Wesley's Data & Analytics book and video series. In 2010 Paul wrote the book Service Oriented Design with Ruby and Rails for Addison Wesley's Professional Ruby series. In 2009 he started the NYC Machine Learning Meetup, which has over 9,000 members. Paul holds a degree in computer science from Columbia University.