Introduction to Apache Iceberg

Apache Iceberg is an open table format for huge analytic datasets. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive, and Impala to safely work with the same tables, at the same time.

What is Iceberg?

Iceberg is a high-performance format for huge analytic tables that works just like a SQL table. Instead of managing data files directly, Iceberg adds a layer of metadata that tracks all files in a table, enabling features that are difficult or impossible with traditional table formats.
Iceberg is used in production where a single table can contain tens of petabytes of data, and even these huge tables can be read without a distributed SQL engine.

Why Iceberg?

Traditional table formats like Hive tables suffer from several problems:
  • No ACID guarantees: Changes to tables are not atomic, leading to inconsistent reads
  • Partition maintenance: Users must manually manage partition columns and filters
  • Schema evolution issues: Changing schemas can accidentally corrupt data
  • Slow planning: Listing files in object stores requires O(n) calls, growing with the number of partitions
  • No time travel: Can’t query historical data or rollback bad changes
Iceberg solves all of these problems with a modern table format designed from the ground up for cloud object stores.

Key Features

ACID Transactions

Iceberg provides serializable isolation for all table operations. All changes are atomic - readers never see partial or uncommitted changes, and writers use optimistic concurrency to safely handle conflicts.
// All operations are atomic
table.newAppend()
    .appendFile(dataFile)
    .commit();  // Either succeeds completely or fails

Schema Evolution

Add, drop, rename, update, or reorder columns without fear. Schema evolution in Iceberg:
  • Never requires rewriting data files
  • Has no side effects (won’t inadvertently un-delete data)
  • Supports column renaming with full lineage tracking
table.updateSchema()
    .addColumn("new_column", Types.StringType.get())
    .renameColumn("old_name", "new_name")
    .commit();
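The rename above is safe because Iceberg tracks each column by a stable field ID rather than by name; data files reference field IDs, so a rename only rewrites name metadata and never touches data. A minimal sketch of the idea (the map below is illustrative, not Iceberg's actual metadata classes):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldIdRename {
    // Name lookup keyed by stable field ID; a rename touches only this map.
    // Data files, which reference field IDs, are left untouched.
    static String renameField(Map<Integer, String> names, int fieldId, String newName) {
        names.put(fieldId, newName);
        return names.get(fieldId);
    }

    public static void main(String[] args) {
        Map<Integer, String> names = new LinkedHashMap<>();
        names.put(1, "old_name");
        names.put(2, "event_time");

        // The "rename": same field ID, new name.
        System.out.println(renameField(names, 1, "new_name"));  // new_name
    }
}
```

Because IDs are never reused, dropping a column and later adding one with the same name cannot resurrect old data.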

Hidden Partitioning

Partitioning in Iceberg is hidden from users. Unlike Hive, you don’t need to:
  • Manually produce partition values when writing
  • Remember to add partition filters to queries
  • Worry about using the wrong format or column
Hive approach (error-prone):
-- Must manually calculate partition value
INSERT INTO logs PARTITION (event_date)
  SELECT level, message, event_time, format_time(event_time, 'YYYY-MM-dd')
  FROM source;

-- Must remember partition filter or scan everything
SELECT * FROM logs
WHERE event_time > '2024-01-01'
  AND event_date = '2024-01-01';  -- Easy to forget!
Iceberg approach (automatic):
-- Partition values calculated automatically
INSERT INTO logs SELECT level, message, event_time FROM source;

-- Partition pruning happens automatically
SELECT * FROM logs WHERE event_time > '2024-01-01';
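Under the hood, a partition transform such as `day` derives the partition value from the column itself, so users never supply it. A minimal sketch of what a day transform computes (days since the Unix epoch, in UTC; this is illustrative, not Iceberg's implementation):

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class DayTransform {
    // Conceptually, the day transform maps a timestamp to its
    // UTC day ordinal: whole days since 1970-01-01.
    static long toDayOrdinal(Instant ts) {
        return LocalDate.ofInstant(ts, ZoneOffset.UTC).toEpochDay();
    }

    public static void main(String[] args) {
        Instant eventTime = Instant.parse("2024-01-15T08:30:00Z");
        long day = toDayOrdinal(eventTime);
        System.out.println(day);                        // partition value: 19737
        System.out.println(LocalDate.ofEpochDay(day));  // 2024-01-15
    }
}
```

Because the transform is recorded in table metadata, a query filter on `event_time` can be converted to a filter on the derived day values automatically.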

Time Travel and Versioning

Every write creates a new snapshot. Query historical data or rollback to previous versions:
-- Query table as of a timestamp
SELECT * FROM my_table 
TIMESTAMP AS OF '2024-01-01 10:00:00';

-- Query table as of a snapshot ID
SELECT * FROM my_table 
VERSION AS OF 5678901234;

-- View all snapshots
SELECT * FROM my_table.snapshots;
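`TIMESTAMP AS OF` resolves to the latest snapshot committed at or before the requested time, using the snapshot log kept in table metadata. A simplified sketch of that lookup (the `Snapshot` record here is illustrative, not Iceberg's API):

```java
import java.util.List;

public class SnapshotLookup {
    // Simplified snapshot-log entry: an ID plus its commit time.
    record Snapshot(long id, long timestampMs) {}

    // Return the latest snapshot committed at or before tsMs,
    // or null if the table did not exist yet at that time.
    static Snapshot asOf(List<Snapshot> history, long tsMs) {
        Snapshot result = null;
        for (Snapshot s : history) {
            if (s.timestampMs() <= tsMs) result = s;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Snapshot> history = List.of(
            new Snapshot(100, 1_000L),
            new Snapshot(200, 2_000L),
            new Snapshot(300, 3_000L));
        System.out.println(asOf(history, 2_500L).id());  // 200
    }
}
```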

Partition Evolution

Change partitioning layout as data volume or query patterns change, without rewriting data:
// Start with daily partitioning
PartitionSpec spec = PartitionSpec.builderFor(schema)
    .day("event_time")
    .build();

// Later, evolve to hourly partitioning for new data
table.updateSpec()
    .removeField("event_time_day")
    .addField(Expressions.hour("event_time"))
    .commit();

Table Format Architecture

Iceberg uses a three-level metadata tree (metadata file, manifest list, manifest files) on top of the data files:

1. Metadata File

The table metadata file is the entry point. It stores the table schema, partition spec, snapshots, and a pointer to the current snapshot. This file is updated atomically on every commit.
2. Manifest List

Each snapshot has a manifest list that tracks all manifest files. The manifest list stores partition value ranges for each manifest, enabling metadata filtering during scan planning.
3. Manifest Files

Manifest files track data files and their statistics. Each manifest stores partition data and column-level stats (row count, null count, min/max values) for each data file.
4. Data Files

The actual data in Parquet, Avro, or ORC format.
Metadata File
  └─ Manifest List (Snapshot)
      ├─ Manifest File 1
      │   ├─ data-file-1.parquet (with stats)
      │   ├─ data-file-2.parquet (with stats)
      │   └─ data-file-3.parquet (with stats)
      └─ Manifest File 2
          ├─ data-file-4.parquet (with stats)
          └─ data-file-5.parquet (with stats)

Performance Benefits

Iceberg’s metadata tree enables O(1) RPC calls to plan queries, instead of O(n) directory listings. Even petabyte-scale tables can be planned on a single node.
  • Manifest lists act as an index over manifest files
  • Partition and column statistics enable aggressive file pruning
  • No distributed planning required for metadata operations
Data files are pruned using:
  • Partition values (automatic from hidden partitioning)
  • Column-level statistics (min/max, null count, value count)
  • Expression pushdown (predicates evaluated at metadata level)
This can provide 10x performance improvements for selective queries.
File pruning and predicate pushdown happen in parallel on worker nodes, removing the metastore as a bottleneck.
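Stats-based pruning can be sketched in a few lines: a data file whose min/max statistics show the predicate cannot match any row is dropped from the scan without ever being opened. (The `DataFile` record below is a simplification of a manifest entry, not one of Iceberg's actual classes.)

```java
import java.util.List;

public class StatsPruning {
    // Simplified manifest entry: per-file min/max stats for one column.
    record DataFile(String path, long minEventMs, long maxEventMs) {}

    // For the predicate "event_time > cutoff", a file can be skipped
    // when its maximum value is at or below the cutoff: no row can match.
    static List<DataFile> filesToScan(List<DataFile> files, long cutoffMs) {
        return files.stream()
            .filter(f -> f.maxEventMs() > cutoffMs)
            .toList();
    }

    public static void main(String[] args) {
        List<DataFile> files = List.of(
            new DataFile("a.parquet",     0,   999),
            new DataFile("b.parquet", 1_000, 1_999),
            new DataFile("c.parquet", 2_000, 2_999));
        // a.parquet is pruned; only b and c are scanned.
        System.out.println(filesToScan(files, 1_500L).size());  // 2
    }
}
```

The same test runs first against manifest-level partition ranges, so whole manifests can be skipped before any per-file stats are examined.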

Reliability Guarantees

Iceberg was designed to solve correctness problems in eventually-consistent cloud object stores:
  • Serializable isolation: All table changes occur in a linear history of atomic updates
  • Reliable reads: Readers use a consistent snapshot without holding locks
  • No file listing: Works with any object store; no consistent listing required
  • Concurrent writes: Multiple writers use optimistic concurrency with automatic retry
  • Safe operations: Enables compaction, late data handling, and partition changes safely
Traditional formats like Hive rely on file listing and rename operations, which are unsafe on S3 and other eventually-consistent stores. Iceberg solves this by tracking all files in metadata.
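Optimistic concurrency boils down to a compare-and-swap on the pointer to the current table metadata: a writer reads the base version, applies its changes, and retries from the new base if another writer committed first. A self-contained sketch of that retry loop (using an in-memory `AtomicInteger` in place of a real catalog's conditional update):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OptimisticCommit {
    // Stand-in for the catalog's pointer to the current metadata version.
    static final AtomicInteger currentVersion = new AtomicInteger(0);

    // Retry loop: re-read the base, re-apply the change on top of it,
    // and attempt an atomic swap until no other writer raced us.
    static int commit() {
        while (true) {
            int base = currentVersion.get();
            int next = base + 1;  // re-apply pending changes against base
            if (currentVersion.compareAndSet(base, next)) {
                return next;  // commit succeeded atomically
            }
            // Another writer won the race; loop and retry from the new base.
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] writers = new Thread[4];
        for (int i = 0; i < writers.length; i++) {
            writers[i] = new Thread(() -> { for (int j = 0; j < 100; j++) commit(); });
            writers[i].start();
        }
        for (Thread t : writers) t.join();
        System.out.println(currentVersion.get());  // 400: no lost updates
    }
}
```

No locks are held while writing data files; only the final metadata swap must be atomic, which is why readers are never blocked.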

Multi-Engine Support

Iceberg tables work with multiple compute engines simultaneously:
  • Apache Spark - Most feature-rich engine for Iceberg
  • Trino / PrestoDB - Fast SQL queries
  • Apache Flink - Streaming reads and writes
  • Apache Hive - Batch processing
  • Dremio - Data lakehouse platform
  • Impala - High-performance SQL
All engines can safely read and write the same Iceberg tables concurrently.

Open Standard

Iceberg is:
  • Apache 2.0 licensed - Completely open source
  • Vendor neutral - No single company controls the format
  • Well specified - Detailed specification ensures compatibility
  • Multi-language - Implementations in Java, Python, Rust, Go, and C++

Getting Started

Ready to start using Iceberg? Check out these resources:

  • Quick Start - Get started with Iceberg in minutes
  • Java API Guide - Learn the Java API for programmatic access
  • Architecture - Deep dive into the table format specification
  • Community - Join the Apache Iceberg community

Next Steps

Now that you understand what Iceberg is and why it exists, you can:
  1. Follow the Quick Start Guide to create your first table
  2. Learn the Java API for programmatic table management
  3. Explore advanced features like branching, tagging, and table maintenance
  4. Read the specification to understand the format in detail