While building Okta’s next-generation security data platform, we quickly realized our pipelines needed the total cost of ownership of batch ETL, the latency of streaming, the flexibility of SQL, the durability of S3, the ease and enormous scalability of serverless, an ultra-minimal footprint for constrained environments, and maximal cost efficiency.
To satisfy these requirements, we turned to serverless DuckDB for all of our data preprocessing, normalization, operational metadata harvesting, and more. After processing trillions of rows across hundreds of millions of files, we are still learning and finding new use cases for this model.
This talk covers how we built our systems, what we’ve learned along the way, and what we’re thinking about next. Do you really need Kafka in 2024? Do you want the latencies of batch ETL? Do you need the cost, complexity, and mess of ELT? Come find out!
Jake currently manages the Data Foundations team at Okta after transitioning from Principal Engineer on Okta's Defensive Cyber Operations team. He previously led data platform teams at Shopify and CarGurus, has taught various O'Reilly courses, and regularly contributes to data-oriented OSS projects.