At Datadog, we collect almost a trillion metric data points per day from hosts, containers, services, and customers all over the world. We have built a highly elastic, cloud-based platform to power analytics, machine learning, and statistical analysis on this data at scale.

In this talk, we will discuss the cloud-based platform we have built and how it differs from a traditional datacenter-based analytics stack. We will walk through the decisions we have made at each layer, including leveraging S3 for data storage; isolating job families on their own ephemeral, spot-instance-powered clusters; tailoring hardware to each job family; optimizing the development cycle with git-powered deployment; and more. We'll cover the pros and cons of these decisions versus a traditional stack in detail.

We'll also discuss the tooling we have built to manage this level of dynamism and make it simple for data scientists and engineers to use. Finally, we will end with recommendations for folks getting started with their own analytics platform in the cloud: tools, frameworks, and platforms you can build upon.