At Alpha Health we build genuine data products, so we regard data as one of our most valuable assets. Because these products are in constant development, their data also evolves continuously to meet new requirements: from one product version to the next, some data entities are expected to change, and some may even end up looking completely different. To keep track of these changes over time while improving our understanding of the data, we need to define our data entities precisely using schemas, and to define how those schemas evolve from version to version.
This talk covers how we integrate JSON Schemas into our real-time data flow, the benefits of using VEA (Validating, Evolving & Anonymizing), and how this approach can empower the whole company to take its use of data to the next level.
We will focus on how we detect invalid data and put it into quarantine, how we evolve data to its latest schema version in a streamlined way, and how we generate de-identified, GDPR-compliant data.
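The validate, quarantine, evolve and anonymize steps described above can be sketched as a small record pipeline. This is a minimal illustration under stated assumptions, not Alpha Health's implementation: the schemas, field names (`user_id`, `full_name`, `heart_rate_bpm`), the migration table, the salt and the functions `is_valid`, `anonymize` and `process` are all hypothetical, and a hand-rolled check of required fields and types stands in for a real JSON Schema validator. Note also that salted hashing is pseudonymization rather than full anonymization, so on its own it would not necessarily make data GDPR-compliant.

```python
import hashlib
from typing import Callable, Dict, List, Optional

# Hypothetical schemas keyed by version. Only "required field -> type" is
# checked here, as a simplified stand-in for full JSON Schema validation.
SCHEMAS: Dict[int, Dict[str, type]] = {
    1: {"user_id": str, "name": str, "heart_rate": int},
    2: {"user_id": str, "full_name": str, "heart_rate_bpm": int},
}
LATEST = max(SCHEMAS)

# Hypothetical migrations: MIGRATIONS[v] evolves a record from version v to v + 1.
MIGRATIONS: Dict[int, Callable[[dict], dict]] = {
    1: lambda r: {
        "version": 2,
        "user_id": r["user_id"],
        "full_name": r["name"],
        "heart_rate_bpm": r["heart_rate"],
    },
}

PII_FIELDS = {"full_name"}       # hypothetical: dropped entirely
PSEUDONYM_FIELDS = {"user_id"}   # hypothetical: replaced by a salted hash
SALT = b"example-salt"           # hypothetical: a real system would manage this secret


def is_valid(record: dict, version: int) -> bool:
    """True if every required field is present with the expected type."""
    return all(isinstance(record.get(field), ftype)
               for field, ftype in SCHEMAS[version].items())


def anonymize(record: dict) -> dict:
    """Drop direct identifiers and pseudonymize linking keys."""
    out = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            continue
        if key in PSEUDONYM_FIELDS:
            value = hashlib.sha256(SALT + str(value).encode()).hexdigest()
        out[key] = value
    return out


def process(record: dict, quarantine: List[dict]) -> Optional[dict]:
    """Validate, evolve to the latest schema, then anonymize.

    Invalid records are appended to `quarantine` and None is returned.
    """
    version = record.get("version")
    if version not in SCHEMAS or not is_valid(record, version):
        quarantine.append(record)  # unknown version or schema violation
        return None
    evolved = record
    while evolved["version"] < LATEST:  # chain one migration per version step
        evolved = MIGRATIONS[evolved["version"]](evolved)
    if not is_valid(evolved, LATEST):  # defensive check on the migrated record
        quarantine.append(record)
        return None
    return anonymize(evolved)
```

Keeping one migration per version step means a record of any older version reaches the latest schema by chaining steps, so introducing a new schema version only requires writing one new migration function.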
We will also go over the benefits and challenges we encountered during the implementation, from managing data models to iterating on the infrastructure.
Albert Franzi is a Software Engineer who fell so deeply in love with data that he ended up as a Data Engineer at Alpha Health. He believes in a world where distributed organizations can work together to build common, reusable tools that empower their users and projects. Albert cares deeply about unified data and models, as well as data quality and enrichment. He also has a secret plan to conquer the world with data, insights, and penguins.