At Lyft, as at many other organizations, analysts and data scientists were spending more than a third of their time discovering data and establishing trust in it. Lyft has made its analysts and data scientists over 20% more productive by creating an open-source data discovery and metadata engine, Amundsen. In this talk, we will dive deep into the product and architecture of Amundsen and discuss how Amundsen leverages centralized metadata, PageRank, and a comprehensive data graph to achieve its goal.
Square has been leveraging Amundsen for a different use case. We will share the product Square has built on top of Amundsen to power more granular, column-level access control for its data lakes, including Snowflake and BigQuery. We will provide an overview of how Square tags columns with additional metadata describing their data subjects, data storage security, and PII semantic types, and how it uses this enriched metadata to drive purpose-driven access control for its data users.
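As a rough illustration of the idea, the sketch below shows how column tags of the kind described above could drive a purpose-based access decision. It is a minimal sketch under stated assumptions, not Square's implementation or Amundsen's API: the `ColumnMetadata` class, the `POLICY` mapping, the purpose names, and the example columns are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class ColumnMetadata:
    """Hypothetical tags attached to one column in the catalog."""
    name: str
    data_subject: str          # who the data is about, e.g. "customer"
    storage_security: str      # how it is stored, e.g. "plaintext"
    pii_type: Optional[str]    # PII semantic type, or None if not PII


# Hypothetical policy: which PII semantic types each purpose may read.
POLICY: Dict[str, Set[str]] = {
    "fraud_investigation": {"EMAIL", "PHONE"},
    "aggregate_analytics": set(),
}

# Hypothetical tagged columns for a single table.
EXAMPLE_COLUMNS = [
    ColumnMetadata("order_id", "customer", "plaintext", None),
    ColumnMetadata("email", "customer", "encrypted", "EMAIL"),
    ColumnMetadata("ssn", "customer", "encrypted", "SSN"),
]


def allowed_columns(columns: List[ColumnMetadata], purpose: str) -> List[str]:
    """Return the names of columns readable for the given purpose.

    Non-PII columns are always readable; a PII column is readable only if
    its semantic type is permitted for the purpose. Unknown purposes are
    granted no PII access.
    """
    permitted = POLICY.get(purpose, set())
    return [
        c.name
        for c in columns
        if c.pii_type is None or c.pii_type in permitted
    ]
```

With these assumed tags, a user working under the hypothetical `fraud_investigation` purpose would see `order_id` and `email`, while `aggregate_analytics` would see only `order_id`.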
The talk will end with a look at current challenges and how we might solve them in the future.
Mark is the co-founder/CEO of Stemma, a modern data catalog for building a self-serve data culture, used by Grafana, iRobot, SoFi, Convoy and many others. He is the co-creator of the leading open-source data catalog, Amundsen, used by Lyft, Instacart, Square, ING, Snap and many more. Mark was previously a developer on Apache Spark at Cloudera and is a committer and PMC member on a few open-source Apache projects. He is a co-author of Hadoop Application Architectures.
Alyssa Ransbury is a Security Engineer at Square and a contributor to the Amundsen project. She supports and leads privacy engineering efforts across dozens of product teams and is interested in finding data where it shouldn't be.