Overview

Iceberg uses Apache Spark’s DataSourceV2 API for catalog implementations. To use Iceberg DDL commands, first configure Spark catalogs.

CREATE TABLE

Spark 3+ can create tables in any Iceberg catalog using the USING iceberg clause:
CREATE TABLE prod.db.sample (
    id bigint NOT NULL COMMENT 'unique id',
    data string
) USING iceberg;

Table Options

Table create commands support the full range of Spark create clauses:
  • PARTITIONED BY (partition-expressions) - Configure partitioning
  • LOCATION '(fully-qualified-uri)' - Set table location
  • COMMENT 'table documentation' - Add table description
  • TBLPROPERTIES ('key'='value', ...) - Set table configuration
CREATE TABLE ... LIKE ... syntax is not supported.

Partitioned Tables

Create partitioned tables using PARTITIONED BY:
CREATE TABLE prod.db.sample (
    id bigint,
    data string,
    category string
) USING iceberg
PARTITIONED BY (category);

Partition Transforms

Iceberg supports the following partition transforms:
| Transform | Description | Example |
|-----------|-------------|---------|
| `year(ts)` | Partition by year | `PARTITIONED BY (year(ts))` |
| `month(ts)` | Partition by month | `PARTITIONED BY (month(ts))` |
| `day(ts)` or `date(ts)` | Partition by date | `PARTITIONED BY (day(ts))` |
| `hour(ts)` or `date_hour(ts)` | Partition by date and hour | `PARTITIONED BY (hour(ts))` |
| `bucket(N, col)` | Hash bucket (mod N) | `PARTITIONED BY (bucket(16, id))` |
| `truncate(L, col)` | Truncate to length L | `PARTITIONED BY (truncate(10, data))` |
For strings, truncate limits to the given length. For integers and longs, it creates bins: truncate(10, i) produces partitions 0, 10, 20, 30, etc.
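For example, the following table sketch (hypothetical table name and columns) groups rows with id 0-9, 10-19, 20-29, and so on into the same partition:

CREATE TABLE prod.db.events (
    id bigint,
    data string
) USING iceberg
PARTITIONED BY (truncate(10, id));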

CREATE TABLE AS SELECT (CTAS)

Create tables populated with query results:
CREATE TABLE prod.db.sample
USING iceberg
AS SELECT id, data FROM source;
CTAS is atomic when using SparkCatalog, but not atomic when using SparkSessionCatalog.

REPLACE TABLE AS SELECT (RTAS)

Atomically replace table contents while preserving history:
REPLACE TABLE prod.db.sample
USING iceberg
AS SELECT id, data FROM source;
The schema and partition spec will be replaced if changed. To avoid modifying the schema, use INSERT OVERWRITE instead.
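The INSERT OVERWRITE alternative mentioned above replaces data without touching the table definition; against the same sample table it might look like this:

INSERT OVERWRITE prod.db.sample
SELECT id, data FROM source;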

DROP TABLE

Drop behavior changed in Iceberg 0.14:
  • Before 0.14: DROP TABLE deleted table metadata and contents
  • From 0.14: DROP TABLE only removes from catalog; use DROP TABLE PURGE to delete contents

Remove from Catalog Only

DROP TABLE prod.db.sample;

Remove from Catalog and Delete Contents

DROP TABLE prod.db.sample PURGE;

ALTER TABLE

Iceberg provides full ALTER TABLE support in Spark 3:
  • Rename tables
  • Set or remove table properties
  • Add, delete, and rename columns
  • Add, delete, and rename nested fields
  • Reorder columns
  • Widen numeric types
  • Change column nullability

Rename Table

ALTER TABLE prod.db.sample RENAME TO prod.db.new_name;

Table Properties

ALTER TABLE prod.db.sample SET TBLPROPERTIES (
    'read.split.target-size'='268435456'
);
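Properties can also be removed with UNSET TBLPROPERTIES; this sketch assumes the property was previously set:

ALTER TABLE prod.db.sample UNSET TBLPROPERTIES ('read.split.target-size');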

Add Columns

ALTER TABLE prod.db.sample
ADD COLUMNS (
    new_column string COMMENT 'new column docs'
);
For arrays and maps, use element and value keywords to access nested columns:
  • ADD COLUMN points.element.z double - Add field to array element
  • ADD COLUMN points.value.b int - Add field to map value
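As a full statement, adding a nested field might look like the following sketch, which assumes a points column declared as an array of structs:

ALTER TABLE prod.db.sample
ADD COLUMN points.element.z double;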

Rename Columns

ALTER TABLE prod.db.sample RENAME COLUMN data TO payload;
ALTER TABLE prod.db.sample RENAME COLUMN location.lat TO latitude;
Nested rename only affects the leaf field. Renaming location.lat to latitude results in location.latitude.

Alter Column Type and Properties

Safe type widening is supported:
ALTER TABLE prod.db.sample 
ALTER COLUMN measurement TYPE double;
Safe type conversions:
  • int → bigint
  • float → double
  • decimal(P,S) → decimal(P2,S), where P2 > P
Other column changes are restricted:
  • A nullable column cannot be changed to non-nullable with SET NOT NULL
  • Struct types cannot be changed in place; use ADD COLUMN and DROP COLUMN instead
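Widening a decimal's precision follows the same pattern; this sketch assumes a hypothetical amount column originally declared as decimal(10, 2):

ALTER TABLE prod.db.sample
ALTER COLUMN amount TYPE decimal(18, 2);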

Drop Columns

ALTER TABLE prod.db.sample DROP COLUMN id;
ALTER TABLE prod.db.sample DROP COLUMN point.z;

SQL Extensions

The following commands require Iceberg SQL extensions.

Partition Evolution

ALTER TABLE prod.db.sample 
ADD PARTITION FIELD category;

ALTER TABLE prod.db.sample 
ADD PARTITION FIELD bucket(16, id);

ALTER TABLE prod.db.sample 
ADD PARTITION FIELD bucket(16, id) AS shard;
Dynamic partition overwrite behavior changes when partitioning changes. For example, moving from daily to hourly partitions will cause overwrites to affect hourly partitions instead of daily ones.
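A spec change like the daily-to-hourly example above can be expressed with REPLACE PARTITION FIELD; this sketch assumes a ts timestamp column currently partitioned by day(ts):

ALTER TABLE prod.db.sample
REPLACE PARTITION FIELD day(ts) WITH hour(ts);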

Write Ordering

Configure automatic data sorting for writes:
ALTER TABLE prod.db.sample 
WRITE ORDERED BY category, id;

ALTER TABLE prod.db.sample 
WRITE ORDERED BY category ASC NULLS LAST, id DESC;
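Ordering can also be combined with a distribution request, so each write task receives a partition's rows before sorting them locally; verify this form against your Iceberg version's SQL extensions:

ALTER TABLE prod.db.sample
WRITE DISTRIBUTED BY PARTITION LOCALLY ORDERED BY category, id;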

Identifier Fields

ALTER TABLE prod.db.sample 
SET IDENTIFIER FIELDS id;

ALTER TABLE prod.db.sample 
SET IDENTIFIER FIELDS id, data;
Identifier fields must be NOT NULL columns. Setting identifier fields enables Flink upsert operations.
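Identifier fields can be removed again with DROP IDENTIFIER FIELDS:

ALTER TABLE prod.db.sample
DROP IDENTIFIER FIELDS data;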

Branching and Tagging

ALTER TABLE prod.db.sample 
CREATE BRANCH `audit-branch`;

ALTER TABLE prod.db.sample 
CREATE BRANCH IF NOT EXISTS `audit-branch`;

ALTER TABLE prod.db.sample 
CREATE BRANCH `audit-branch` AS OF VERSION 1234
RETAIN 30 DAYS 
WITH SNAPSHOT RETENTION 3 SNAPSHOTS 2 DAYS;
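Tags follow the same pattern as branches; for example, a retention-limited tag on a specific snapshot (the snapshot ID here is illustrative):

ALTER TABLE prod.db.sample
CREATE TAG `historical-tag` AS OF VERSION 1234
RETAIN 365 DAYS;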

Iceberg Views

Iceberg views require Spark 3.4+.

Create View

CREATE VIEW view_name 
AS SELECT * FROM table_name;

CREATE VIEW IF NOT EXISTS view_name 
AS SELECT * FROM table_name;

Manage Views

CREATE OR REPLACE VIEW view_name
AS SELECT id FROM table_name;
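Views can be removed with standard DROP VIEW syntax:

DROP VIEW view_name;
DROP VIEW IF EXISTS view_name;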

Next Steps

Query Data

Learn about SELECT queries and time travel

Write Data

Master INSERT, MERGE, and UPDATE operations

Configuration

Configure Spark catalogs and options

Procedures

Maintain tables with stored procedures