Using Apache Arrow, Calcite and Parquet to build a Relational Cache

Technical Talks

Everybody wants to get to data faster. As we move from general-purpose solutions to more specific optimization techniques, the performance impact grows. This talk will discuss how layering in-memory caching, columnar storage and relational caching can combine to substantially improve data science and analytical workloads. It will include a detailed overview of how you can use Apache Arrow, Calcite and Parquet to achieve multiple orders of magnitude of performance improvement over what is currently possible.

We'll start by talking about in-memory caches and the difference between block-based and data-aware caching strategies. We'll discuss the deployment design of this type of solution and cover the strengths of each strategy. There will also be a discussion of the relationship between security and predicate application in these scenarios. Then we'll go into detail about how columnar storage formats can further enhance performance by minimizing read time, optimizing for vectorized in-memory processing and applying powerful compression techniques.
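To make the point about read time concrete, here is a minimal sketch in plain Python. It is not Arrow or Parquet code: the table, the column names and both helper functions are hypothetical. It stores the same three records row-wise and column-wise and counts how many values a scan has to touch to sum a single column.

```python
# Illustrative only: plain Python lists stand in for storage.
# No Arrow or Parquet API is used; all names here are assumptions.
rows = [
    {"user": "a", "country": "US", "amount": 10},
    {"user": "b", "country": "CA", "amount": 20},
    {"user": "c", "country": "US", "amount": 5},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def sum_row_layout(rows, column):
    """Sum one column from row storage; every field of every row is read."""
    touched = 0
    total = 0
    for row in rows:
        touched += len(row)  # the whole record is fetched to reach one field
        total += row[column]
    return total, touched


def sum_column_layout(columns, column):
    """Sum one column from columnar storage; only that column is read."""
    values = columns[column]
    return sum(values), len(values)


row_total, row_touched = sum_row_layout(rows, "amount")
col_total, col_touched = sum_column_layout(columns, "amount")
```

Both scans return the same total, but the columnar scan touches only the values of the requested column, which is the effect the talk attributes to columnar formats.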

Lastly, we'll introduce a much more advanced way to speed up access to data called relational caching. Relational caching builds on columnar in-memory caching techniques but also includes a full comprehension of how data is being used and how different forms of data relate to each other. This will include leveraging multiple sorting and partitioning strategies as well as maintaining multiple related derivations of data for different types of access patterns. As part of this, we'll also cover approaches to data TTL, relational cache consistency and several different approaches to data mutation and real-time updates.
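The idea of keeping several related derivations with a TTL can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the design presented in the talk and not any Arrow, Calcite or Parquet API: the `Derivation` class, `build_derivation`, `pick_derivation` and `answer` are hypothetical names, and the only aggregate supported is a sum over an `amount` column.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Derivation:
    """A cached, pre-aggregated copy of the raw data (hypothetical structure)."""
    dimensions: frozenset  # columns the data is grouped by
    data: dict             # tuple of dimension values -> summed amount
    created_at: float = field(default_factory=time.time)
    ttl_seconds: float = 60.0

    def is_fresh(self, now):
        return now - self.created_at <= self.ttl_seconds


def build_derivation(rows, dimensions, ttl_seconds=60.0, now=None):
    """Group rows by the given dimensions and sum the 'amount' column."""
    dims = sorted(dimensions)
    data = {}
    for row in rows:
        key = tuple(row[d] for d in dims)
        data[key] = data.get(key, 0) + row["amount"]
    created = now if now is not None else time.time()
    return Derivation(frozenset(dims), data, created_at=created, ttl_seconds=ttl_seconds)


def pick_derivation(derivations, query_dims, now):
    """Choose the smallest fresh derivation that covers the query's dimensions."""
    candidates = [
        d for d in derivations
        if d.is_fresh(now) and set(query_dims) <= d.dimensions
    ]
    return min(candidates, key=lambda d: len(d.data), default=None)


def answer(derivation, query_dims):
    """Roll the derivation up to the (possibly coarser) query dimensions."""
    dims = sorted(derivation.dimensions)
    positions = [dims.index(q) for q in sorted(query_dims)]
    result = {}
    for key, amount in derivation.data.items():
        out_key = tuple(key[p] for p in positions)
        result[out_key] = result.get(out_key, 0) + amount
    return result


rows = [
    {"country": "US", "day": 1, "amount": 10},
    {"country": "CA", "day": 1, "amount": 20},
    {"country": "US", "day": 2, "amount": 5},
]
now = 1000.0  # fixed clock so the example is deterministic
cache = [
    build_derivation(rows, {"country", "day"}, now=now),
    build_derivation(rows, {"country"}, now=now),
]

by_country = answer(pick_derivation(cache, {"country"}, now + 10), {"country"})
by_day = answer(pick_derivation(cache, {"day"}, now + 10), {"day"})
expired = pick_derivation(cache, {"country"}, now + 120)
```

A query grouped by `country` is served from the smaller per-country derivation, a query grouped by `day` falls back to the finer-grained one and is rolled up, and once the TTL has passed no derivation qualifies, so a real system would have to refresh the cache or read the source data.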

Jacques Nadeau

Co-founder & CTO | Dremio

Jacques is currently co-founder and CTO of YapMap.com, a Visual Search & Discovery Engine for the social web. Jacques is an industry veteran with over 15 years of Internet technology experience. Prior to leading the technology efforts at YapMap, he was Director of Product with Quigo (acquired by AOL in 2007). Previously he led product development efforts at Offermatica (now part of Omniture/Adobe) and built the Avenue A | Razorfish analytics technology and services practice. Jacques has a B.A. from Pomona College.
