At Lyft, as at many other organizations, analysts and data scientists were spending more than one-third of their time discovering data and establishing trust in it. Lyft has made its analysts and data scientists over 20% more productive by creating Amundsen, an open source data discovery and metadata engine. In this talk, we will dive deep into the product and architecture of Amundsen and discuss how it leverages centralized metadata, PageRank, and a comprehensive data graph to achieve that goal.
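To make the PageRank idea concrete, here is a minimal sketch of ranking data assets by how often other assets reference them. It is an illustration under stated assumptions, not Amundsen's actual ranking code: the function `pagerank`, the `EXAMPLE_LINKS` graph, and the asset names in it are all hypothetical, and the abstract does not say which signals Amundsen feeds into its ranking.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict of node -> referenced nodes."""
    # Collect every node, including ones that only appear as targets.
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly among the assets it references.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node with no outgoing references spreads its rank evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: dashboards and a derived table reference source tables.
EXAMPLE_LINKS = {
    "dashboard_a": ["rides"],
    "dashboard_b": ["rides"],
    "rides_daily": ["rides", "drivers"],
    "rides": [],
    "drivers": [],
}

if __name__ == "__main__":
    ranks = pagerank(EXAMPLE_LINKS)
    for name in sorted(ranks, key=ranks.get, reverse=True):
        print(f"{name}: {ranks[name]:.3f}")
```

In this toy graph the heavily referenced `rides` table ends up with the highest score, which is the intuition behind surfacing widely used tables first in search results.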
Square has been leveraging Amundsen for a different use case. We will share the product Square has built on top of Amundsen to power more granular, column-level access control for its data lakes, including Snowflake and BigQuery. We will provide an overview of how Square tags columns with additional metadata describing their data subjects, data storage security, and PII semantic types, and how it uses this enriched metadata to drive purpose-driven access control for its data users.
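As a rough illustration of how column tags could drive a purpose-based access decision, consider the sketch below. Everything in it is an assumption made for the example: the `ColumnTags` and `Purpose` types, the `can_access` rule, and the sample catalog are hypothetical and are not taken from Square's product or from Amundsen's API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ColumnTags:
    """Hypothetical metadata attached to one column."""
    data_subject: str          # whose data this is, e.g. "customer"
    pii_type: Optional[str]    # PII semantic type, or None if not PII
    storage_security: str      # recorded for completeness; unused in the rule below


@dataclass(frozen=True)
class Purpose:
    """Hypothetical declared purpose under which a user requests data."""
    name: str
    allowed_subjects: FrozenSet[str]
    allowed_pii_types: FrozenSet[str]


def can_access(purpose: Purpose, tags: ColumnTags) -> bool:
    """Allow a column only if its subject and PII type fit the purpose."""
    if tags.data_subject not in purpose.allowed_subjects:
        return False
    if tags.pii_type is None:
        return True  # non-PII columns need no PII-specific grant
    return tags.pii_type in purpose.allowed_pii_types


def allowed_columns(purpose: Purpose, catalog: Dict[str, ColumnTags]) -> List[str]:
    """Return the sorted names of columns the purpose may read."""
    return sorted(name for name, tags in catalog.items() if can_access(purpose, tags))


# Hypothetical catalog and purpose, for illustration only.
CATALOG = {
    "payments.customer_email": ColumnTags("customer", "email", "encrypted"),
    "payments.amount": ColumnTags("customer", None, "encrypted"),
    "hr.employee_ssn": ColumnTags("employee", "ssn", "encrypted"),
}
FRAUD_REVIEW = Purpose(
    name="fraud_review",
    allowed_subjects=frozenset({"customer"}),
    allowed_pii_types=frozenset({"email"}),
)

if __name__ == "__main__":
    print(allowed_columns(FRAUD_REVIEW, CATALOG))
```

The point of the sketch is that the decision depends only on metadata attached to each column and on the declared purpose, so a new column is governed as soon as it is tagged.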
The talk will end with a look at current challenges and how we may solve them in the future.
Mark Grover is a product manager at Lyft and the co-founder of the Amundsen project. Mark’s a committer on Apache Bigtop, a committer and PPMC member on Apache Spot (incubating), and a committer and PMC member on Apache Sentry. He’s also contributed to a number of open source projects, including Apache Hadoop, Apache Hive, Apache Sqoop, and Apache Flume. He’s a coauthor of Hadoop Application Architectures and wrote a section in Programming Hive.
Alyssa Ransbury is a security engineer at Square and a contributor to the Amundsen project. She supports and leads privacy engineering efforts across dozens of product teams and is interested in finding data where it shouldn't be.