At Lyft, Mark built the Amundsen data catalog so data scientists could navigate hundreds of thousands of tables and distinguish trustworthy data from sandboxed, out-of-date data. When he took Amundsen open source, he helped dozens of data teams support a variety of demands to make data discoverable and self-serve. Time and again, Mark sees processes that seem “good enough” come back to bite data teams. For example, because it doesn’t directly involve creating the canonical data set, sending a Slack blast about an upcoming change feels like an adequate effort to warn users. But key stakeholders frequently miss the message, the change still causes pain, and angry fingers still point back at the data team. Fortunately, there is a better way. With a little bit of inside knowledge, you can see who is actually using your data so you can better serve them as customers. So roll up your sleeves, because Mark is going to take you deep into query logs and APIs to see where all of that metadata lives, and he’ll show you how to use it so you don’t lose any fingers during your next data change.
Mark is the co-founder and CEO of Stemma, a modern data catalog for building a self-serve data culture, used by Grafana, iRobot, SoFi, Convoy and many others. He is the co-creator of the leading open-source data catalog, Amundsen, used by Lyft, Instacart, Square, ING, Snap and many more! Mark was previously a developer on Apache Spark at Cloudera and is a committer and PMC member on a few open-source Apache projects. He is a co-author of Hadoop Application Architectures.