At Lyft, as at many other organizations, analysts and data scientists were spending more than a third of their time discovering data and establishing trust in it. Lyft has made its analysts and data scientists over 20% more productive by creating Amundsen, an open-source data discovery and metadata engine. In this talk, we will take a deep dive into the product and architecture of Amundsen and discuss how it leverages centralized metadata, PageRank, and a comprehensive data graph to achieve that goal.
Square has been leveraging Amundsen for a different use case. We will share the product Square has built on top of Amundsen to power more granular, column-level access control for its data lakes, including Snowflake and BigQuery. We will provide an overview of how Square tags columns with additional metadata describing their data subjects, data storage security, and PII semantic types, and how it uses this enriched metadata to drive purpose-driven access control for its data users.
The talk will close with a look at current challenges and how we might solve them in the future.
Mark Grover is a Product Manager at Lyft and co-creator of the Amundsen project. He is also a co-author of O'Reilly's Hadoop Application Architectures, which has sold over 10,000 copies worldwide. He is a committer on Apache Bigtop, Apache Sentry, and Apache Spot (incubating), and previously worked as an engineer at Cloudera.
Alyssa Ransbury is a Security Engineer at Square and a contributor to the Amundsen project. She supports and leads privacy engineering efforts across dozens of product teams and is interested in finding data where it shouldn't be.