Iceberg supports branches and tags as named references to snapshots, enabling sophisticated snapshot lifecycle management beyond basic time travel. These features are essential for data quality workflows, auditing, and experimental data engineering.

Understanding Snapshots

Every commit to an Iceberg table creates a snapshot - a complete, immutable view of the table at a point in time:
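// Assumes `table` is a loaded org.apache.iceberg.Table and that
// dataFile1/dataFile2 are DataFile objects built elsewhere.
// Each write operation commits a new snapshot
table.newAppend()
  .appendFile(dataFile1)
  .commit(); // Creates the first snapshot

table.newAppend()
  .appendFile(dataFile2)
  .commit(); // Creates a second snapshot whose parent is the first

// View snapshot history
for (Snapshot snap : table.snapshots()) {
  System.out.println(snap.snapshotId() + ": " + snap.operation());
}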
Snapshots enable:
  • Reader isolation - Queries see a consistent view
  • Time travel - Query historical data
  • Rollback - Revert to previous states
  • Incremental processing - Track changes between snapshots
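Time travel by timestamp resolves to whichever snapshot was current at the requested instant, which is the newest snapshot-log entry committed at or before it. The Java sketch below models only that lookup rule; the `TimeTravelModel` class, its `LogEntry` record, and the snapshot IDs are hypothetical (in the Iceberg Java API the log itself is available through `Table.history()`).

```java
import java.util.List;
import java.util.Optional;

// Simplified model of timestamp-based time travel; NOT Iceberg's implementation.
class TimeTravelModel {
    // Hypothetical snapshot-log entry: snapshotId became current at timestampMillis.
    record LogEntry(long snapshotId, long timestampMillis) {}

    // Returns the snapshot that was current at asOfMillis: the last entry committed
    // at or before that instant. The log must be ordered oldest to newest.
    static Optional<Long> snapshotAsOf(List<LogEntry> log, long asOfMillis) {
        Optional<Long> result = Optional.empty();
        for (LogEntry entry : log) {
            if (entry.timestampMillis() > asOfMillis) {
                break; // this and all later entries were committed after asOfMillis
            }
            result = Optional.of(entry.snapshotId());
        }
        return result; // empty if the table had no snapshot yet at asOfMillis
    }

    public static void main(String[] args) {
        List<LogEntry> log = List.of(
                new LogEntry(101L, 1_000L),
                new LogEntry(102L, 2_000L));
        System.out.println(snapshotAsOf(log, 1_500L)); // Optional[101]
        System.out.println(snapshotAsOf(log, 500L));   // Optional.empty
    }
}
```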

Snapshot Retention

By default, all snapshots are retained until explicitly expired. The expire_snapshots procedure removes old snapshots:
-- Expire snapshots older than the given timestamp (e.g., 7 days before the run)
CALL catalog_name.system.expire_snapshots(
  table => 'db.table',
  older_than => TIMESTAMP '2024-03-01 00:00:00'
);
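Because `older_than` takes an absolute timestamp, a rolling "7 days" policy has to be converted into a literal each time the job runs. A minimal sketch, assuming a hypothetical Java scheduling job renders the statement (the `ExpireCallBuilder` name is illustrative) and that the Spark session time zone (`spark.sql.session.timeZone`) is UTC, since Spark interprets `TIMESTAMP` literals in that zone:

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: renders the expire_snapshots CALL for "older than maxAge".
class ExpireCallBuilder {
    private static final DateTimeFormatter SQL_TIMESTAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Cutoff = now - maxAge, rendered as UTC wall-clock time (assumes a UTC session zone).
    static String expireOlderThan(String catalog, String table, Instant now, Duration maxAge) {
        LocalDateTime cutoff = LocalDateTime.ofInstant(now.minus(maxAge), ZoneOffset.UTC);
        return "CALL " + catalog + ".system.expire_snapshots(\n"
                + "  table => '" + table + "',\n"
                + "  older_than => TIMESTAMP '" + SQL_TIMESTAMP.format(cutoff) + "'\n"
                + ");";
    }

    public static void main(String[] args) {
        System.out.println(expireOlderThan("catalog_name", "db.table",
                Instant.parse("2024-03-08T00:00:00Z"), Duration.ofDays(7)));
    }
}
```

Run with `now` set to 2024-03-08T00:00:00Z and a 7-day window, it prints the same call as above, with `TIMESTAMP '2024-03-01 00:00:00'`.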
However, basic retention has limitations:
  • All snapshots are treated equally
  • Important snapshots can be accidentally expired
  • No way to retain specific historical points
Branches and tags solve these problems by providing independent lifecycle management.

Tags: Named Historical Snapshots

Tags are named references to snapshots with their own retention policies:
-- Create a tag for end-of-month snapshot
ALTER TABLE prod.db.table 
CREATE TAG `EOM-2024-02` AS OF VERSION 12345 RETAIN 180 DAYS;

-- Create a tag for compliance audit (retain forever)
ALTER TABLE prod.db.table
CREATE TAG `AUDIT-Q1-2024` AS OF VERSION 23456;

-- Query using a tag  
SELECT * FROM prod.db.table VERSION AS OF 'EOM-2024-02';

Tag Properties

  • Fixed target - Commits never move a tag; it keeps pointing at the same snapshot unless explicitly replaced
  • Named - Easy to remember and reference (Q4-2023 vs snapshot ID 8372649283746)
  • Independent retention - Each tag has its own max age
  • Lightweight - Just metadata, no data duplication

Tag Retention

A tag's retention controls how long the tag itself exists; while the tag exists, its snapshot cannot be expired:
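-- Tag retained for 7 days, then expired
ALTER TABLE db.table
CREATE TAG `weekly-backup` RETAIN 7 DAYS;

-- Tag retained forever (default)
ALTER TABLE db.table
CREATE TAG `production-release-v2.0`;

-- Update tag retention: repeat the tag's snapshot ID, because REPLACE TAG
-- without AS OF VERSION repoints the tag at the table's current snapshot
ALTER TABLE db.table
REPLACE TAG `weekly-backup` AS OF VERSION <snapshot-id> RETAIN 14 DAYS;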
When expire_snapshots runs:
  1. Expired tags are removed
  2. Snapshots referenced only by expired tags can be deleted
  3. Snapshots referenced by active tags are preserved
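The three steps can be modeled in a few lines of Java. This is a simplified sketch, not Iceberg's code: the `TagExpiryModel` class and its records are hypothetical, branches other than the table's current snapshot are ignored, and it assumes a tag's age is measured from the commit time of the snapshot it points to (so a tag created `AS OF VERSION` an old snapshot would expire sooner than its `RETAIN` value alone suggests).

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model of how expire_snapshots treats tags; NOT Iceberg's implementation.
class TagExpiryModel {
    // A tag points at one snapshot; maxRefAgeMs == null means "retain forever".
    record Tag(long snapshotId, Long maxRefAgeMs) {}

    record Result(Set<String> retainedTags, Set<Long> retainedSnapshots) {}

    static Result expire(Map<Long, Long> commitTimeMs, // snapshot ID -> commit time
                         long currentSnapshotId,
                         Map<String, Tag> tags,
                         long olderThanMs,
                         long nowMs) {
        Set<String> keptTags = new HashSet<>();
        Set<Long> keptSnapshots = new HashSet<>();
        keptSnapshots.add(currentSnapshotId); // the table's current state always survives

        for (Map.Entry<String, Tag> entry : tags.entrySet()) {
            Tag tag = entry.getValue();
            // Assumed: the tag's age is measured from its snapshot's commit time.
            long ageMs = nowMs - commitTimeMs.get(tag.snapshotId());
            if (tag.maxRefAgeMs() == null || ageMs <= tag.maxRefAgeMs()) {
                keptTags.add(entry.getKey());        // active tag is kept...
                keptSnapshots.add(tag.snapshotId()); // ...and pins its snapshot (step 3)
            } // otherwise the tag is removed (step 1)
        }
        // Step 2: every snapshot not pinned above survives only if newer than the cutoff.
        for (Map.Entry<Long, Long> snap : commitTimeMs.entrySet()) {
            if (snap.getValue() >= olderThanMs) {
                keptSnapshots.add(snap.getKey());
            }
        }
        return new Result(keptTags, keptSnapshots);
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        Result r = expire(
                Map.of(1L, 0L, 2L, 5 * day, 3L, 10 * day),
                3L,
                Map.of("weekly-backup", new Tag(1L, 7 * day),
                       "production-release-v2.0", new Tag(2L, null)),
                9 * day,
                10 * day);
        System.out.println(r); // weekly-backup is gone; snapshots 2 and 3 remain
    }
}
```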

Tag Use Cases

Retain monthly snapshots for auditing:
-- Retain end-of-month snapshots for 7 years
ALTER TABLE financial_data
CREATE TAG `EOM-2024-01` AS OF VERSION 1000 RETAIN 2555 DAYS;

ALTER TABLE financial_data  
CREATE TAG `EOM-2024-02` AS OF VERSION 2000 RETAIN 2555 DAYS;
Mark production releases:
-- Tag production deployments (retain forever)
ALTER TABLE product_catalog
CREATE TAG `prod-release-2024-03-01` AS OF VERSION 5432;

-- Reproduce exactly what customers saw
SELECT * FROM product_catalog 
VERSION AS OF 'prod-release-2024-03-01';
Create recovery points before risky operations:
-- Before major data migration
ALTER TABLE user_data
CREATE TAG `pre-migration-backup` RETAIN 30 DAYS;

-- Perform migration...

-- Roll back if needed: make the tagged snapshot current again
CALL catalog_name.system.set_current_snapshot(table => 'db.user_data', ref => 'pre-migration-backup');
Implement tiered retention (daily/weekly/monthly/yearly):
-- Daily snapshots retained for 1 week
ALTER TABLE db.table CREATE TAG `daily-2024-03-01` RETAIN 7 DAYS;

-- Weekly snapshots retained for 1 month
ALTER TABLE db.table CREATE TAG `weekly-2024-W09` RETAIN 30 DAYS;

-- Monthly snapshots retained for 6 months
ALTER TABLE db.table CREATE TAG `monthly-2024-03` RETAIN 180 DAYS;

-- Yearly snapshots retained forever
ALTER TABLE db.table CREATE TAG `yearly-2024`;
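Tiered tagging is usually driven by a scheduled job. The sketch below is a hypothetical Java helper (the `TieredTags` class, its end-of-period policy, and the `db.table` name are assumptions) that emits the statements above for a given day, using ISO week numbers for the `weekly-YYYY-Www` names. Each statement tags the table's current snapshot, so it would run after that day's last load.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.IsoFields;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical generator for tiered CREATE TAG statements.
// Assumed policy: tag every day, plus the last day of each ISO week, month, and year.
class TieredTags {
    // retainDays == null omits RETAIN, so the tag is kept forever.
    static String createTag(String table, String name, Integer retainDays) {
        String retain = retainDays == null ? "" : " RETAIN " + retainDays + " DAYS";
        return "ALTER TABLE " + table + " CREATE TAG `" + name + "`" + retain + ";";
    }

    static List<String> statementsFor(String table, LocalDate day) {
        List<String> sql = new ArrayList<>();
        sql.add(createTag(table, "daily-" + day, 7)); // LocalDate prints as yyyy-MM-dd
        if (day.getDayOfWeek() == DayOfWeek.SUNDAY) { // last day of the ISO week
            String week = String.format(Locale.ROOT, "weekly-%d-W%02d",
                    day.get(IsoFields.WEEK_BASED_YEAR),
                    day.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
            sql.add(createTag(table, week, 30));
        }
        if (day.getDayOfMonth() == day.lengthOfMonth()) { // last day of the month
            String month = String.format(Locale.ROOT, "monthly-%d-%02d",
                    day.getYear(), day.getMonthValue());
            sql.add(createTag(table, month, 180));
        }
        if (day.getMonthValue() == 12 && day.getDayOfMonth() == 31) { // last day of the year
            sql.add(createTag(table, "yearly-" + day.getYear(), null));
        }
        return sql;
    }

    public static void main(String[] args) {
        statementsFor("db.table", LocalDate.of(2024, 3, 31)).forEach(System.out::println);
    }
}
```

For 2024-03-31, a Sunday that also ends the month, it emits the daily, `weekly-2024-W13`, and `monthly-2024-03` statements.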

Branches: Independent Lineages

Branches are mutable named references that can have new snapshots committed to them:
-- Create a branch from current snapshot
ALTER TABLE db.table CREATE BRANCH test_branch;

-- Create branch from specific snapshot  
ALTER TABLE db.table 
CREATE BRANCH experiment AS OF VERSION 12345;

-- Write to a branch (Spark); spark.wap.branch is honored only when the
-- table has 'write.wap.enabled'='true'
SET spark.wap.branch = test_branch;
INSERT INTO db.table VALUES (1, 'test');

-- Query branch data
SELECT * FROM db.table.branch_test_branch;

Branch vs Tag

| Feature   | Tag                                     | Branch                                  |
|-----------|-----------------------------------------|-----------------------------------------|
| Mutable   | No - always points to the same snapshot | Yes - moves as new commits are made     |
| Writable  | No - read-only reference                | Yes - can commit new snapshots          |
| Lineage   | Single snapshot                         | Chain of snapshots (history)            |
| Retention | Max reference age                       | Max reference age + snapshot retention  |
| Use case  | Mark historical points                  | Development, testing, staging           |
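The difference between the two columns comes down to what happens on commit. A toy Java model makes this concrete; the `RefModel` class is hypothetical and uses sequential snapshot IDs instead of Iceberg's random ones. Committing to a branch creates a snapshot whose parent is the branch's previous head and moves the branch to it, while tags never move and reject writes.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of tag vs. branch semantics; NOT Iceberg's implementation.
class RefModel {
    enum Type { TAG, BRANCH }

    record Ref(Type type, long snapshotId) {}

    private final Map<Long, Long> parentOf = new HashMap<>(); // snapshot -> parent (lineage)
    private final Map<String, Ref> refs = new HashMap<>();
    private long nextSnapshotId = 1;

    RefModel() {
        refs.put("main", new Ref(Type.BRANCH, nextSnapshotId++)); // initial snapshot 1
    }

    long head(String ref) { return refs.get(ref).snapshotId(); }

    Long parent(long snapshotId) { return parentOf.get(snapshotId); }

    void createTag(String name, String from) { refs.put(name, new Ref(Type.TAG, head(from))); }

    void createBranch(String name, String from) { refs.put(name, new Ref(Type.BRANCH, head(from))); }

    // Commits a new snapshot on a branch: it extends the branch's lineage and the
    // branch moves to it. Tags are read-only, so committing to one is rejected.
    long commit(String refName) {
        Ref ref = refs.get(refName);
        if (ref.type() != Type.BRANCH) {
            throw new IllegalArgumentException(refName + " is a tag; tags cannot be written");
        }
        long snapshot = nextSnapshotId++;
        parentOf.put(snapshot, ref.snapshotId());
        refs.put(refName, new Ref(Type.BRANCH, snapshot));
        return snapshot;
    }

    public static void main(String[] args) {
        RefModel table = new RefModel();
        table.createTag("release", "main"); // pinned to snapshot 1
        table.createBranch("test_branch", "main");
        table.commit("test_branch");        // snapshot 2, parent 1
        System.out.println("test_branch=" + table.head("test_branch")
                + " release=" + table.head("release") + " main=" + table.head("main"));
        // prints: test_branch=2 release=1 main=1
    }
}
```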

Branch Retention

Branches have two retention settings:
-- Create branch with retention policies
ALTER TABLE db.table 
CREATE BRANCH test_branch 
RETAIN 7 DAYS                    -- Branch reference expires in 7 days
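WITH SNAPSHOT RETENTION 2 SNAPSHOTS; -- Keep at least the last 2 snapshots on the branch
  • Branch retention - How long the branch reference exists (the main branch never expires)
  • Snapshot retention - The minimum number of snapshots to keep on the branch, optionally with a maximum snapshot age
When expire_snapshots runs:
  • Snapshots beyond the minimum count are expired only once they are also older than the branch's maximum snapshot age (or, if the branch sets none, the procedure's cutoff)
  • After the branch reference expires, snapshots reachable only from it can be deleted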
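The two settings interact as sketched below. This Java model follows the meaning of the snapshot-reference fields `min-snapshots-to-keep`, `max-snapshot-age-ms`, and `max-ref-age-ms`, but it is a simplification, not Iceberg's code: the `BranchRetentionModel` class is hypothetical, the 5-day age limit in the demo is an assumption, and it assumes a reference's age is measured from its head snapshot's commit time.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of per-branch expiration rules; NOT Iceberg's implementation.
class BranchRetentionModel {
    record Snap(long id, long timestampMs) {}

    // Walks a branch's history newest-first. A snapshot is kept while fewer than
    // minSnapshotsToKeep have been kept OR it is newer than now - maxSnapshotAgeMs.
    // The first snapshot failing both tests, and everything older, is expirable.
    static List<Long> retainedOnBranch(List<Snap> newestFirst, int minSnapshotsToKeep,
                                       long maxSnapshotAgeMs, long nowMs) {
        long cutoffMs = nowMs - maxSnapshotAgeMs;
        List<Long> kept = new ArrayList<>();
        for (Snap snap : newestFirst) {
            if (kept.size() < minSnapshotsToKeep || snap.timestampMs() >= cutoffMs) {
                kept.add(snap.id());
            } else {
                break;
            }
        }
        return kept;
    }

    // A non-main branch reference expires once its head snapshot is older than
    // maxRefAgeMs (assumed measurement point). The main branch never expires.
    static boolean refExpired(String name, Snap head, long maxRefAgeMs, long nowMs) {
        return !name.equals("main") && nowMs - head.timestampMs() > maxRefAgeMs;
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        List<Snap> history = List.of(new Snap(4, 9 * day), new Snap(3, 8 * day),
                new Snap(2, 3 * day), new Snap(1, day));
        // WITH SNAPSHOT RETENTION 2 SNAPSHOTS, with a 5-day age limit assumed
        System.out.println(retainedOnBranch(history, 2, 5 * day, 10 * day)); // [4, 3]
        System.out.println(refExpired("test_branch", history.get(0), 7 * day, 10 * day)); // false
    }
}
```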

Branch Use Cases

Validate data before making it visible:
-- Enable WAP
ALTER TABLE prod.db.table SET TBLPROPERTIES (
  'write.wap.enabled'='true'
);

-- Create audit branch
ALTER TABLE prod.db.table 
CREATE BRANCH audit_branch RETAIN 7 DAYS;

-- Write to audit branch (Spark)
SET spark.wap.branch = audit_branch;
INSERT INTO prod.db.table SELECT * FROM staging.new_data;

-- Validate data quality
SELECT 
  count(*) as total,
  count(DISTINCT user_id) as unique_users
FROM prod.db.table.branch_audit_branch;

-- Publish if validation passes
CALL catalog_name.system.fast_forward(
  'prod.db.table', 'main', 'audit_branch'
);
Test changes without affecting production:
-- Create experiment branch
ALTER TABLE analytics.events 
CREATE BRANCH new_metric_experiment RETAIN 14 DAYS;

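-- Write experimental data (WAP must be enabled on analytics.events);
-- assumes analytics.events already has a new_metric column, because the
-- schema is shared by all branches
SET spark.wap.branch = new_metric_experiment;
INSERT INTO analytics.events 
SELECT *, compute_new_metric(data) as new_metric
FROM source;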

-- Analyze results
SELECT avg(new_metric) 
FROM analytics.events.branch_new_metric_experiment;

-- Fast-forward main to the branch if successful, or let the branch expire
Separate staging from production data:
-- Create staging branch
ALTER TABLE db.table CREATE BRANCH staging;

-- Load staging data (WAP must be enabled on db.table; assumes Parquet input files)
SET spark.wap.branch = staging;
INSERT INTO db.table SELECT * FROM parquet.`s3://bucket/staging/`;

-- Test queries against staging
SELECT * FROM db.table.branch_staging WHERE ...;

-- Promote to main after testing
CALL catalog_name.system.fast_forward('db.table', 'main', 'staging');
Isolate concurrent data pipelines:
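-- Pipeline A writes to branch A
ALTER TABLE db.table CREATE BRANCH pipeline_a RETAIN 1 DAYS;

-- Pipeline B writes to branch B
ALTER TABLE db.table CREATE BRANCH pipeline_b RETAIN 1 DAYS;

-- Combine both when complete: Iceberg has no branch merge, so fast-forward
-- main to one branch, then reapply the other's changes (for example with the
-- cherrypick_snapshot procedure), resolving any overlapping data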

Schema with Branches and Tags

Important: Schema is tracked at the table level, not per branch.
When working with branches:
  • Writing to a branch uses the table’s current schema
  • Querying a branch uses the table’s current schema
  • Time travel to a snapshot, or querying a tag, uses the snapshot’s historical schema
Example:
-- Create table and branch
CREATE TABLE db.table (id bigint, data string, col float);
INSERT INTO db.table VALUES (1, 'a', 1.0);

ALTER TABLE db.table CREATE BRANCH test_branch;

-- Evolve schema (drops col, adds new_col)
ALTER TABLE db.table DROP COLUMN col;
ALTER TABLE db.table ADD COLUMN new_col date;

-- Query branch - uses CURRENT schema (has new_col, not col)
SELECT * FROM db.table.branch_test_branch;
-- Returns: id=1, data='a', new_col=NULL

-- Time travel to snapshot - uses SNAPSHOT's schema (has col)
SELECT * FROM db.table VERSION AS OF <snapshot-id>;
-- Returns: id=1, data='a', col=1.0

Working with Branches and Tags

Creating

-- Create tag
ALTER TABLE db.table CREATE TAG tag_name;

-- From specific snapshot
ALTER TABLE db.table CREATE TAG tag_name AS OF VERSION 12345;

-- With retention  
ALTER TABLE db.table CREATE TAG tag_name RETAIN 30 DAYS;

Reading

-- Query branch
SELECT * FROM db.table.branch_branch_name;

-- Or using VERSION AS OF
SELECT * FROM db.table VERSION AS OF 'branch_name';

-- Query tag
SELECT * FROM db.table VERSION AS OF 'tag_name';

-- List all references
SELECT * FROM db.table.refs;

Writing

-- Set branch for writes (requires 'write.wap.enabled'='true' on the table)
SET spark.wap.branch = branch_name;
INSERT INTO db.table VALUES (...);

-- Or write directly to branch table
INSERT INTO db.table.branch_branch_name VALUES (...);

Merging

-- Fast-forward main to branch tip
-- (only if main hasn't diverged)
CALL catalog_name.system.fast_forward(
  table => 'db.table',
  branch => 'main',
  to => 'staging_branch'
);
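Whether `fast_forward` can succeed depends only on snapshot lineage: main's head must be an ancestor of the branch head. A small Java model of that check follows; the `FastForwardModel` class and the child-to-parent map are hypothetical. When it returns false, as it would for the second of the two concurrent pipeline branches once main has been fast-forwarded to the first, the histories have diverged and moving main is not possible.

```java
import java.util.Map;

// Model of the fast-forward precondition; NOT Iceberg's implementation.
class FastForwardModel {
    // True if `ancestor` is `snapshot` itself or appears in its parent chain.
    static boolean isAncestor(Map<Long, Long> parentOf, long snapshot, long ancestor) {
        Long current = snapshot;
        while (current != null) {
            if (current == ancestor) {
                return true;
            }
            current = parentOf.get(current); // null once we pass the root snapshot
        }
        return false;
    }

    // main can be fast-forwarded only if it has no commits the branch lacks.
    static boolean canFastForward(Map<Long, Long> parentOf, long mainHead, long branchHead) {
        return isAncestor(parentOf, branchHead, mainHead);
    }

    public static void main(String[] args) {
        // 1 <- 2 <- 3 is the staging branch; 4 was committed to main after branching.
        Map<Long, Long> parentOf = Map.of(2L, 1L, 3L, 2L, 4L, 1L);
        System.out.println(canFastForward(parentOf, 1L, 3L)); // true
        System.out.println(canFastForward(parentOf, 4L, 3L)); // false: main diverged
    }
}
```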

Deleting

-- Drop a tag
ALTER TABLE db.table DROP TAG tag_name;

-- Drop a branch; snapshots reachable only from it are removed by a later expire_snapshots run
ALTER TABLE db.table DROP BRANCH branch_name;

Retention Policy Example

Comprehensive retention strategy:
-- Main branch: Retain 90 days, keep 100 snapshots minimum
ALTER TABLE prod.events 
CREATE OR REPLACE BRANCH main 
WITH SNAPSHOT RETENTION 100 SNAPSHOTS 90 DAYS;

-- Daily tags: Retain 7 days
ALTER TABLE prod.events CREATE TAG `daily-2024-03-01` RETAIN 7 DAYS;
ALTER TABLE prod.events CREATE TAG `daily-2024-03-02` RETAIN 7 DAYS;

-- Weekly tags: Retain 30 days
ALTER TABLE prod.events CREATE TAG `weekly-2024-W09` RETAIN 30 DAYS;

-- Monthly tags: Retain 180 days
ALTER TABLE prod.events CREATE TAG `monthly-2024-03` RETAIN 180 DAYS;

-- Yearly tags: Retain forever
ALTER TABLE prod.events CREATE TAG `yearly-2024`;

-- Audit branch: Retain 14 days, keep 5 snapshots
ALTER TABLE prod.events CREATE BRANCH audit
RETAIN 14 DAYS
WITH SNAPSHOT RETENTION 5 SNAPSHOTS;

-- Run expiration (respects all retention policies)
CALL catalog_name.system.expire_snapshots(
  table => 'prod.events',
  older_than => TIMESTAMP '2024-01-01 00:00:00'
);

Best Practices

Tags are perfect for points you want to preserve:
  • End of reporting periods
  • Production releases
  • Compliance checkpoints
  • Pre/post migration backups
Branches work well for ongoing development:
  • Feature development and testing
  • Data quality validation
  • Staging environments
  • Experimental analyses
Balance storage cost with recovery needs:
  • Short-lived branches (1-7 days) for testing
  • Medium-term tags (30-90 days) for regular backups
  • Long-term tags (years) for compliance
Use clear naming conventions:
  • daily-YYYY-MM-DD for daily snapshots
  • weekly-YYYY-Www for weekly snapshots
  • monthly-YYYY-MM for monthly snapshots
  • prod-release-vX.Y.Z for releases
  • experiment-description for tests
Too many references can slow metadata operations:
  • Drop branches and tags that are no longer needed
  • Automate tag creation/cleanup
  • Use expire_snapshots regularly

Learn More

Table Format

Understand snapshots and metadata structure

Reliability

Learn about Iceberg’s consistency guarantees