Overview

Iceberg uses Apache Spark’s DataSourceV2 API for catalog implementations. To use Iceberg DDL commands, first configure Spark catalogs.

CREATE TABLE

Spark 3+ can create tables in any Iceberg catalog using the USING iceberg clause:
CREATE TABLE prod.db.sample (
    id bigint NOT NULL COMMENT 'unique id',
    data string
) USING iceberg;

Table Options

Table create commands support the full range of Spark create clauses:
  • PARTITIONED BY (partition-expressions) - Configure partitioning
  • LOCATION '(fully-qualified-uri)' - Set table location
  • COMMENT 'table documentation' - Add table description
  • TBLPROPERTIES ('key'='value', ...) - Set table configuration
CREATE TABLE ... LIKE ... syntax is not supported.

Partitioned Tables

Create partitioned tables using PARTITIONED BY:
CREATE TABLE prod.db.sample (
    id bigint,
    data string,
    category string
) USING iceberg
PARTITIONED BY (category);

Partition Transforms

Iceberg supports the following partition transforms:
| Transform | Description | Example |
|-----------|-------------|---------|
| `year(ts)` | Partition by year | `PARTITIONED BY (year(ts))` |
| `month(ts)` | Partition by month | `PARTITIONED BY (month(ts))` |
| `day(ts)` or `date(ts)` | Partition by date | `PARTITIONED BY (day(ts))` |
| `hour(ts)` or `date_hour(ts)` | Partition by date and hour | `PARTITIONED BY (hour(ts))` |
| `bucket(N, col)` | Hash bucket (mod N) | `PARTITIONED BY (bucket(16, id))` |
| `truncate(L, col)` | Truncate to length L | `PARTITIONED BY (truncate(10, data))` |
For strings, truncate limits to the given length. For integers and longs, it creates bins: truncate(10, i) produces partitions 0, 10, 20, 30, etc.
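For example, the following table sketch (hypothetical table name and columns) groups rows with id 0-9, 10-19, 20-29, and so on into the same partition:

CREATE TABLE prod.db.events (
    id bigint,
    data string
) USING iceberg
PARTITIONED BY (truncate(10, id));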

CREATE TABLE AS SELECT (CTAS)

Create tables populated with query results:
CREATE TABLE prod.db.sample
USING iceberg
AS SELECT id, data FROM source;
CTAS is atomic when using SparkCatalog, but not atomic when using SparkSessionCatalog.

REPLACE TABLE AS SELECT (RTAS)

Atomically replace table contents while preserving history:
REPLACE TABLE prod.db.sample
USING iceberg
AS SELECT id, data FROM source;
The schema and partition spec will be replaced if changed. To avoid modifying the schema, use INSERT OVERWRITE instead.
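The INSERT OVERWRITE alternative mentioned above replaces data without touching the table definition; against the same sample table it might look like this:

INSERT OVERWRITE prod.db.sample
SELECT id, data FROM source;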

DROP TABLE

Drop behavior changed in Iceberg 0.14:
  • Before 0.14: DROP TABLE deleted table metadata and contents
  • From 0.14: DROP TABLE only removes from catalog; use DROP TABLE PURGE to delete contents

Remove from Catalog Only

DROP TABLE prod.db.sample;

Remove from Catalog and Delete Contents

DROP TABLE prod.db.sample PURGE;

ALTER TABLE

Iceberg provides full ALTER TABLE support in Spark 3:
  • Rename tables
  • Set or remove table properties
  • Add, delete, and rename columns
  • Add, delete, and rename nested fields
  • Reorder columns
  • Widen numeric types
  • Change column nullability

Rename Table

ALTER TABLE prod.db.sample RENAME TO prod.db.new_name;

Table Properties

ALTER TABLE prod.db.sample SET TBLPROPERTIES (
    'read.split.target-size'='268435456'
);
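Properties can also be removed with UNSET TBLPROPERTIES; this sketch assumes the property was previously set:

ALTER TABLE prod.db.sample UNSET TBLPROPERTIES ('read.split.target-size');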

Add Columns

ALTER TABLE prod.db.sample
ADD COLUMNS (
    new_column string COMMENT 'new column docs'
);
For arrays and maps, use element and value keywords to access nested columns:
  • ADD COLUMN points.element.z double - Add field to array element
  • ADD COLUMN points.value.b int - Add field to map value
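As a full statement, adding a nested field might look like the following sketch, which assumes a points column declared as an array of structs:

ALTER TABLE prod.db.sample
ADD COLUMN points.element.z double;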

Rename Columns

ALTER TABLE prod.db.sample RENAME COLUMN data TO payload;
ALTER TABLE prod.db.sample RENAME COLUMN location.lat TO latitude;
Nested rename only affects the leaf field. Renaming location.lat to latitude results in location.latitude.

Alter Column Type and Properties

Safe type widening is supported:
ALTER TABLE prod.db.sample 
ALTER COLUMN measurement TYPE double;
Safe type conversions:
  • int → bigint
  • float → double
  • decimal(P,S) → decimal(P2,S), where P2 > P
Other column changes are restricted:
  • A nullable column cannot be changed to non-nullable with SET NOT NULL
  • Struct types cannot be changed in place; use ADD COLUMN and DROP COLUMN instead
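Widening a decimal's precision follows the same pattern; this sketch assumes a hypothetical amount column originally declared as decimal(10, 2):

ALTER TABLE prod.db.sample
ALTER COLUMN amount TYPE decimal(18, 2);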

Drop Columns

ALTER TABLE prod.db.sample DROP COLUMN id;
ALTER TABLE prod.db.sample DROP COLUMN point.z;

SQL Extensions

The following commands require Iceberg SQL extensions.

Partition Evolution

ALTER TABLE prod.db.sample 
ADD PARTITION FIELD category;

ALTER TABLE prod.db.sample 
ADD PARTITION FIELD bucket(16, id);

ALTER TABLE prod.db.sample 
ADD PARTITION FIELD bucket(16, id) AS shard;
Dynamic partition overwrite behavior changes when partitioning changes. For example, moving from daily to hourly partitions will cause overwrites to affect hourly partitions instead of daily ones.
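A spec change like the daily-to-hourly example above can be expressed with REPLACE PARTITION FIELD; this sketch assumes a ts timestamp column currently partitioned by day(ts):

ALTER TABLE prod.db.sample
REPLACE PARTITION FIELD day(ts) WITH hour(ts);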

Write Ordering

Configure automatic data sorting for writes:
ALTER TABLE prod.db.sample 
WRITE ORDERED BY category, id;

ALTER TABLE prod.db.sample 
WRITE ORDERED BY category ASC NULLS LAST, id DESC;
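Ordering can also be combined with a distribution request, so each write task receives a partition's rows before sorting them locally; verify this form against your Iceberg version's SQL extensions:

ALTER TABLE prod.db.sample
WRITE DISTRIBUTED BY PARTITION LOCALLY ORDERED BY category, id;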

Identifier Fields

ALTER TABLE prod.db.sample 
SET IDENTIFIER FIELDS id;

ALTER TABLE prod.db.sample 
SET IDENTIFIER FIELDS id, data;
Identifier fields must be NOT NULL columns. Setting identifier fields enables Flink upsert operations.
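Identifier fields can be removed again with DROP IDENTIFIER FIELDS:

ALTER TABLE prod.db.sample
DROP IDENTIFIER FIELDS data;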

Branching and Tagging

ALTER TABLE prod.db.sample 
CREATE BRANCH `audit-branch`;

ALTER TABLE prod.db.sample 
CREATE BRANCH IF NOT EXISTS `audit-branch`;

ALTER TABLE prod.db.sample 
CREATE BRANCH `audit-branch` AS OF VERSION 1234
RETAIN 30 DAYS 
WITH SNAPSHOT RETENTION 3 SNAPSHOTS 2 DAYS;
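Tags follow the same pattern as branches; for example, a retention-limited tag on a specific snapshot (the snapshot ID here is illustrative):

ALTER TABLE prod.db.sample
CREATE TAG `historical-tag` AS OF VERSION 1234
RETAIN 365 DAYS;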

Iceberg Views

Iceberg views require Spark 3.4+.

Create View

CREATE VIEW view_name 
AS SELECT * FROM table_name;

CREATE VIEW IF NOT EXISTS view_name 
AS SELECT * FROM table_name;

Manage Views

CREATE OR REPLACE VIEW view_name
AS SELECT id FROM table_name;
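Views can be removed with standard DROP VIEW syntax:

DROP VIEW view_name;
DROP VIEW IF EXISTS view_name;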

Next Steps

Query Data

Learn about SELECT queries and time travel

Write Data

Master INSERT, MERGE, and UPDATE operations

Configuration

Configure Spark catalogs and options

Procedures

Maintain tables with stored procedures