Apache Kudu is a storage engine for structured data and one of the newest members of the Hadoop ecosystem. It scans at speeds approaching those of Parquet on HDFS, yet it can also quickly update, retrieve, and delete individual rows, as HBase does. This combination makes Kudu well suited to real-time analytics and time-series workloads. This talk will cover the motivation for and design of Kudu, how to get started using it, its integrations with Hadoop components such as Spark SQL, Flume, and Impala, its features and limitations as of 1.0, and the improvements coming in 1.1.
Will Berkeley is an Apache Kudu PMC Member and Senior Solutions Architect at Cloudera.