Building a Reliable, Secure and Efficient Event Ingestion Pipeline

Suman Karumuri | Principal Engineer | Airbnb

In this talk, we will examine the challenges of collecting event data from mobile clients. Our focus will be on typical data pipelines, which collect event data from various clients and back-end services, process it through systems like Kafka, Flink, or Spark, and eventually store it in data warehouses or platforms like Elasticsearch. Despite the widespread use of these technologies, such pipelines face significant problems with reliability, security, scalability, and efficiency, which compromise data quality and freshness.

Current event ingestion pipelines lack standardization and often require a significant amount of custom code to ingest data into systems like Kafka. This creates a risk of data loss and leads to inefficient processing, especially during outages or deployments. We will explore how unreliable ingestion and custom requirements for data transformation and filtering contribute to operational inefficiency and higher error rates, especially under heavy load or during spikes in data ingestion.

Our proposal is an event ingestion pipeline designed to be operationally reliable, cost-efficient, and secure enough to be exposed to the internet. This new system will focus on lightweight data translation, will protect data integrity by checking for PII leaks, and will be built primarily from open-source components. The key feature of this pipeline is that it is config-driven: most changes can be made without modifying any code, which significantly reduces deployment and operational complexity.
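
To make the config-driven idea concrete, here is a minimal sketch in Python. It is not the pipeline described in this talk: the config keys (rename_fields, drop_fields, pii_patterns), the process_event function, and the email regex are all hypothetical names chosen for illustration. It only shows how field translation, filtering, and a simple PII check could be changed by editing configuration rather than code.

```python
import re

# Hypothetical config; in practice this might be loaded from a YAML or JSON file.
PIPELINE_CONFIG = {
    "rename_fields": {"ts": "timestamp", "evt": "event_name"},
    "drop_fields": ["debug_info"],
    "pii_patterns": {
        # Deliberately simple email pattern, for illustration only.
        "email": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    },
}


def process_event(event, config):
    """Translate one flat event dict and report suspected PII.

    Returns (translated_event, findings), where findings is a list of
    (field_name, pattern_name) pairs for string values matching a pattern.
    """
    translated = {}
    for key, value in event.items():
        if key in config["drop_fields"]:
            continue  # filtering step: drop unwanted fields
        # translation step: rename the field if the config says so
        translated[config["rename_fields"].get(key, key)] = value

    findings = []
    for field, value in translated.items():
        if not isinstance(value, str):
            continue  # only string values are scanned in this sketch
        for name, pattern in config["pii_patterns"].items():
            if re.search(pattern, value):
                findings.append((field, name))
    return translated, findings
```

With this shape, adding a new PII pattern or renaming a field is a config edit; whether a flagged event is dropped, redacted, or routed elsewhere is a policy decision the sketch leaves open.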

The audience will walk away with a clear understanding of the challenges involved in building a reliable and cost-effective event ingestion pipeline, and of ways to improve the reliability and cost-effectiveness of these pipelines.

Suman Karumuri is currently a Principal Engineer and tech lead at Airbnb. He works at the intersection of big data and observability. His current interests are building reliable data ingestion pipelines and a cloud-native database called KalDB.