Data Council Blog
Open Source Highlight: Apache Iceberg
Apache Iceberg is an open table format for very large analytic datasets. You can use it to add tables to engines such as Presto or Spark, backed by a high-performance format that is designed to work just like a SQL table.

Although Iceberg was initially developed at Netflix, it is now open source, with contributors from Apple, LinkedIn, GoDataDriven, Lyft, WeWork, and more. It is especially valuable to companies that need to query huge quantities of data; as its site points out, "Iceberg is used in production where a single table can contain tens of petabytes of data and even these huge tables can be read without a distributed SQL engine."
Users also don't need to understand a table's partitioning to get fast queries, thanks in part to hidden partitioning and other features designed to prevent user mistakes. In addition, members of the community are working on making it easy to migrate from Hive to Iceberg, so stay tuned for more information on that front.
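Hidden partitioning means the table, rather than the user, derives partition values from a column (for example, the day of a timestamp) and keeps that relationship in its metadata, so a query that filters only on the timestamp can still skip irrelevant files. The toy Python model below sketches that idea under simplifying assumptions: `ToyTable`, `DataFile`, and the file paths are hypothetical and are not Iceberg's API, and `day_transform` only mirrors the spirit of Iceberg's `day` transform (whole days since the Unix epoch).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Whole days since the Unix epoch (floored), in the spirit of Iceberg's `day` transform."""
    return (ts - EPOCH).days


@dataclass
class DataFile:
    path: str
    partition_day: int  # derived value kept in table metadata, not a user-visible column
    rows: list[tuple[datetime, str]] = field(default_factory=list)


class ToyTable:
    """Hypothetical toy table partitioned by day_transform(ts); NOT Iceberg's API."""

    def __init__(self) -> None:
        self.files: dict[int, DataFile] = {}

    def append(self, ts: datetime, msg: str) -> None:
        # The writer derives the partition value; callers only ever supply ts.
        day = day_transform(ts)
        if day not in self.files:
            self.files[day] = DataFile(f"data/ts_day={day}/part-0.parquet", day)
        self.files[day].rows.append((ts, msg))

    def scan(self, start: datetime, end: datetime) -> tuple[list[str], list[str]]:
        """Return (paths of files read, messages) for rows with start <= ts < end."""
        # Map the predicate on ts to a conservative range of partition values,
        # so files outside that range are skipped without being opened.
        lo, hi = day_transform(start), day_transform(end)
        read = [f for day, f in sorted(self.files.items()) if lo <= day <= hi]
        # Files that survive pruning are still filtered row by row.
        hits = [msg for f in read for ts, msg in f.rows if start <= ts < end]
        return [f.path for f in read], hits


# Usage: the query filters only on ts; pruning happens behind the scenes.
table = ToyTable()
for d, msg in [(1, "a"), (2, "b"), (3, "c")]:
    table.append(datetime(2021, 3, d, 9, 0, tzinfo=timezone.utc), msg)

files, hits = table.scan(
    datetime(2021, 3, 2, tzinfo=timezone.utc),
    datetime(2021, 3, 2, 12, tzinfo=timezone.utc),
)
print(files)  # only the file for 2021-03-02 is read
print(hits)   # ['b']
```

By contrast, with Hive-style partitioning a user typically has to populate and filter on a separate partition column to get the same pruning; in this sketch the mapping from `ts` to its partition lives with the table, which is the kind of mistake hidden partitioning aims to prevent.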
If you'd like to hear more about Iceberg, and also Arrow and other tools that can help you architect an open cloud data lake platform, make sure to check out the talk that Ryan Murray from Dremio recently gave as part of our virtual London meetup.