Introduction to Apache Iceberg
Apache Iceberg is an open table format for huge analytic datasets. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive, and Impala to safely work with the same tables, at the same time.
What is Iceberg?
Iceberg is a high-performance format for huge analytic tables that works just like a SQL table. Instead of managing data files directly, Iceberg adds a layer of metadata that tracks all files in a table, enabling features that are difficult or impossible with traditional table formats. Iceberg is used in production where a single table can contain tens of petabytes of data, and even these huge tables can be read without a distributed SQL engine.
Why Iceberg?
Traditional table formats like Hive tables suffer from several problems:
- No ACID guarantees: Changes to tables are not atomic, leading to inconsistent reads
- Partition maintenance: Users must manually manage partition columns and filters
- Schema evolution issues: Changing schemas can accidentally corrupt data
- Slow planning: Listing files in object stores takes O(n) operations, where n grows with the partition count
- No time travel: Can’t query historical data or roll back bad changes
Key Features
ACID Transactions
Iceberg provides serializable isolation for all table operations. All changes are atomic - readers never see partial or uncommitted changes, and writers use optimistic concurrency to safely handle conflicts.
Schema Evolution
Add, drop, rename, update, or reorder columns without fear. Schema evolution in Iceberg:
- Never requires rewriting data files
- Has no side effects (won’t inadvertently un-delete data)
- Supports column renaming with full lineage tracking
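These operations map onto plain DDL. A hedged Spark SQL sketch (the table name `prod.db.events` and column names are illustrative):

```sql
-- Add, rename, widen, and drop columns; none of these rewrites data files
ALTER TABLE prod.db.events ADD COLUMN severity int;
ALTER TABLE prod.db.events RENAME COLUMN severity TO level;
ALTER TABLE prod.db.events ALTER COLUMN id TYPE bigint;  -- safe widening: int -> bigint
ALTER TABLE prod.db.events DROP COLUMN level;
```

Because Iceberg tracks columns by ID rather than by name, the rename and drop above cannot resurrect old data under a reused column name.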
Hidden Partitioning
Partitioning in Iceberg is hidden from users. Unlike Hive, you don’t need to:
- Manually produce partition values when writing
- Remember to add partition filters to queries
- Worry about using the wrong format or column
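A hedged Spark SQL sketch of declaring a partition transform once and then never mentioning it again (identifiers are illustrative):

```sql
-- The transform days(ts) is part of the table definition
CREATE TABLE prod.db.events (id bigint, ts timestamp, message string)
USING iceberg
PARTITIONED BY (days(ts));

-- Writers never compute a partition column; readers never filter on one.
-- This ordinary filter on ts is enough for Iceberg to prune day partitions:
SELECT * FROM prod.db.events WHERE ts >= TIMESTAMP '2024-06-01 00:00:00';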
Time Travel and Versioning
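A hedged Spark SQL sketch of time travel and rollback (catalog and table names and the snapshot ID are illustrative):

```sql
-- Read the table as of a snapshot ID or a point in time
SELECT * FROM prod.db.events VERSION AS OF 3051729675574597004;
SELECT * FROM prod.db.events TIMESTAMP AS OF '2024-06-01 00:00:00';

-- Roll the table back to an earlier snapshot
CALL prod.system.rollback_to_snapshot('db.events', 3051729675574597004);
```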
Every write creates a new snapshot. Query historical data or roll back to previous versions.
Partition Evolution
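A hedged Spark SQL sketch of evolving a partition spec in place (identifiers are illustrative; this syntax requires the Iceberg Spark SQL extensions):

```sql
-- Switch from daily to hourly partitioning; existing files keep the old spec
ALTER TABLE prod.db.events REPLACE PARTITION FIELD days(ts) WITH hours(ts);

-- Add a new partition dimension without touching old data
ALTER TABLE prod.db.events ADD PARTITION FIELD bucket(16, id);
```

Old data stays in its original layout; new data is written with the new spec, and queries plan correctly across both.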
Change the partitioning layout as data volume or query patterns change, without rewriting data.
Table Format Architecture
Iceberg uses a three-level metadata structure:
Metadata File
The table metadata file is the entry point. It stores the table schema, partition spec, snapshots, and a pointer to the current snapshot. This file is updated atomically on every commit.
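As an illustration, a heavily trimmed metadata file might look like the sketch below (field names follow the Iceberg table spec; all values are made up, and most fields are omitted):

```
{
  "format-version": 2,
  "table-uuid": "9c12d441-03fe-4693-9a96-a0705ddf69c1",
  "location": "s3://bucket/warehouse/db/events",
  "current-schema-id": 0,
  "schemas": [ { "schema-id": 0, "fields": [] } ],
  "default-spec-id": 0,
  "partition-specs": [ { "spec-id": 0, "fields": [] } ],
  "current-snapshot-id": 3051729675574597004,
  "snapshots": [
    { "snapshot-id": 3051729675574597004,
      "manifest-list": "s3://bucket/warehouse/db/events/metadata/snap.avro" }
  ]
}
```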
Manifest List
Each snapshot has a manifest list that tracks all manifest files. The manifest list stores partition value ranges for each manifest, enabling metadata filtering during scan planning.
Manifest Files
Manifest files track data files and their statistics. Each manifest stores partition data and column-level stats (row count, null count, min/max values) for each data file.
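Putting the three levels together, the metadata tree looks roughly like this (simplified sketch; Parquet, ORC, and Avro are the usual data file formats):

```
table metadata file   (schema, partition spec, pointer to current snapshot)
  └─ manifest list    (one per snapshot; partition ranges per manifest)
       ├─ manifest file   (data files + per-column stats)
       └─ manifest file
            └─ data files  (Parquet / ORC / Avro)
```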
Performance Benefits
Fast Scan Planning
Iceberg’s metadata tree enables O(1) RPC calls to plan queries, instead of O(n) directory listings. Even petabyte-scale tables can be planned on a single node.
- Manifest lists act as an index over manifest files
- Partition and column statistics enable aggressive file pruning
- No distributed planning required for metadata operations
Advanced Data Filtering
Data files are pruned using:
- Partition values (automatic from hidden partitioning)
- Column-level statistics (min/max, null count, value count)
- Expression pushdown (predicates evaluated at metadata level)
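The pruning steps above can be sketched as a toy planner in plain Python. This is a simplified model, not Iceberg's actual code: `DataFile` stands in for a manifest entry, with one partition value and min/max bounds for a single hypothetical `level` column.

```python
from dataclasses import dataclass

@dataclass
class DataFile:
    """One data file as tracked by a manifest (hypothetical, simplified)."""
    path: str
    partition_day: str  # partition value produced by hidden partitioning
    min_level: int      # column lower bound from the manifest's stats
    max_level: int      # column upper bound from the manifest's stats

def plan_scan(files, day, min_level):
    """Keep only files that could contain rows matching
    `day(ts) = day AND level >= min_level`, using metadata alone."""
    return [
        f.path
        for f in files
        if f.partition_day == day      # 1. partition pruning
        and f.max_level >= min_level   # 2. min/max stats pruning
    ]
```

For example, with three files on two days, `plan_scan(files, "2024-06-01", 4)` skips any file whose partition or upper bound rules it out, without opening a single data file.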
Distributed Planning
File pruning and predicate pushdown happen in parallel on worker nodes, removing the metastore as a bottleneck.
Reliability Guarantees
Iceberg was designed to solve correctness problems in eventually-consistent cloud object stores:
- Serializable isolation: All table changes occur in a linear history of atomic updates
- Reliable reads: Readers use a consistent snapshot without holding locks
- No file listing: Works with any object store; no consistent listing required
- Concurrent writes: Multiple writers use optimistic concurrency with automatic retry
- Safe operations: Enables compaction, late data handling, and partition changes safely
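The optimistic-concurrency pattern can be sketched in a few lines of Python. This is a toy model, not Iceberg's real API: `CatalogPointer` stands in for a catalog's atomically swappable pointer to the current metadata file.

```python
import threading

class CatalogPointer:
    """Toy stand-in for a catalog's atomic pointer to the current
    table metadata file (illustrative, not Iceberg's actual API)."""
    def __init__(self):
        self._lock = threading.Lock()
        self.version = 0  # which metadata file is "current"

    def compare_and_swap(self, expected, new):
        """Atomically advance the pointer only if nobody else has."""
        with self._lock:
            if self.version != expected:
                return False  # a concurrent writer committed first
            self.version = new
            return True

def commit(pointer, retries=3):
    """Optimistic write: read the current version, prepare changes
    against it, and retry from scratch if the swap fails."""
    for _ in range(retries):
        base = pointer.version
        # ...a real writer would re-apply its changes on top of `base`...
        if pointer.compare_and_swap(base, base + 1):
            return True
    return False
```

Because losers of the race simply re-read and retry, no writer ever holds a lock across a write, and readers keep using whichever snapshot was current when they started.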
Multi-Engine Support
Iceberg tables work with multiple compute engines simultaneously:
- Apache Spark - Most feature-rich engine for Iceberg
- Trino / PrestoDB - Fast SQL queries
- Apache Flink - Streaming reads and writes
- Apache Hive - Batch processing
- Dremio - Data lakehouse platform
- Impala - High-performance SQL
Open Standard
Iceberg is:
- Apache 2.0 licensed - Completely open source
- Vendor neutral - No single company controls the format
- Well specified - Detailed specification ensures compatibility
- Multi-language - Implementations in Java, Python, Rust, Go, and C++
Getting Started
Ready to start using Iceberg? Check out these resources:
Quick Start
Get started with Iceberg in minutes
Java API Guide
Learn the Java API for programmatic access
Architecture
Deep dive into the table format specification
Community
Join the Apache Iceberg community
Next Steps
Now that you understand what Iceberg is and why it exists, you can:
- Follow the Quick Start Guide to create your first table
- Learn the Java API for programmatic table management
- Explore advanced features like branching, tagging, and table maintenance
- Read the specification to understand the format in detail