In this talk I walk through how the real-time streaming data platform at Skyscanner evolved over time. This platform now processes hundreds of billions of events per day and includes all our application logs, metrics and business events. But we certainly did not get it right on day one. In fact, we are still not quite there.
Our story is a case study in developing a data platform in an agile fashion, and in how small decisions can have outsized effects on such a platform. We went from batch-driven analysis in a data center, to a streaming platform that processes events in real time, to something in between. I will explain what got us here, and why you may want to skip some of the steps along the way.
Choosing the right mix of batch and real-time for your problem is critical, and I hope the war story I share here will help you make the right call for your organization. And if nothing else, it will show you that it’s never too late to course-correct.
Herman Schaaf is a senior software engineer at Skyscanner, where he works primarily on building the central data platform. In his free time he loves reading and traveling, but even then his mind often still wanders back to software, data structures, algorithms and distributed systems.