Apache Kudu: Fast Analytics on Fast Data

ABOUT THE TALK

Apache Kudu is a storage engine for structured data and one of the newest members of the Hadoop ecosystem. It scans at speeds approaching those of Parquet on HDFS while also updating, retrieving, and deleting individual rows quickly, as HBase does, which makes it well suited to real-time analytics and time-series data. This talk covers the motivation for and design of Kudu, how to get started with it, its integrations with Hadoop components such as Spark SQL, Flume, and Impala, its features and limitations as of 1.0, and the improvements coming in 1.1.

Will Berkeley