Overview
Iceberg uses Apache Spark’s DataSourceV2 API for catalog implementations. To use Iceberg DDL commands, first configure Spark catalogs.

CREATE TABLE
Spark 3+ can create tables in any Iceberg catalog using the `USING iceberg` clause:
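A minimal sketch, assuming a catalog named `prod` is already configured (table and column names are illustrative):

```sql
CREATE TABLE prod.db.sample (
    id bigint COMMENT 'unique id',
    data string
) USING iceberg;
```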
Table Options
Table create commands support the full range of Spark create clauses:
- `PARTITIONED BY (partition-expressions)` - Configure partitioning
- `LOCATION '(fully-qualified-uri)'` - Set table location
- `COMMENT 'table documentation'` - Add table description
- `TBLPROPERTIES ('key'='value', ...)` - Set table configuration
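A hedged sketch combining these clauses (the location URI and property value are illustrative):

```sql
CREATE TABLE prod.db.sample (
    id bigint,
    data string,
    category string
) USING iceberg
PARTITIONED BY (category)
LOCATION 's3://bucket/warehouse/db/sample'
COMMENT 'sample table'
TBLPROPERTIES ('format-version'='2');
```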
`CREATE TABLE ... LIKE ...` syntax is not supported.

Partitioned Tables
Create partitioned tables using `PARTITIONED BY`:
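For example, identity partitioning on a column and hidden partitioning through transforms (names illustrative):

```sql
-- Identity partitioning on a column
CREATE TABLE prod.db.sample (
    id bigint,
    data string,
    category string,
    ts timestamp
) USING iceberg
PARTITIONED BY (category);

-- Hidden partitioning using transforms
CREATE TABLE prod.db.sample_by_day (
    id bigint,
    data string,
    ts timestamp
) USING iceberg
PARTITIONED BY (bucket(16, id), day(ts));
```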
Partition Transforms
Iceberg supports the following partition transforms:

| Transform | Description | Example |
|---|---|---|
| `year(ts)` | Partition by year | `PARTITIONED BY (year(ts))` |
| `month(ts)` | Partition by month | `PARTITIONED BY (month(ts))` |
| `day(ts)` or `date(ts)` | Partition by date | `PARTITIONED BY (day(ts))` |
| `hour(ts)` or `date_hour(ts)` | Partition by date and hour | `PARTITIONED BY (hour(ts))` |
| `bucket(N, col)` | Hash bucket (mod N) | `PARTITIONED BY (bucket(16, id))` |
| `truncate(L, col)` | Truncate to length L | `PARTITIONED BY (truncate(10, data))` |
For strings, `truncate` limits values to the given length. For integers and longs, it creates bins: `truncate(10, i)` produces partitions 0, 10, 20, 30, etc.

CREATE TABLE AS SELECT (CTAS)
Create tables populated with query results:

REPLACE TABLE AS SELECT (RTAS)
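Both commands share the same shape; a hedged sketch with illustrative table names:

```sql
-- CTAS: create a new table from a query
CREATE TABLE prod.db.sample
USING iceberg
AS SELECT id, data FROM prod.db.source;

-- RTAS: atomically replace an existing table's contents
REPLACE TABLE prod.db.sample
USING iceberg
AS SELECT id, data FROM prod.db.source WHERE id > 0;
```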
Atomically replace table contents while preserving history. The schema and partition spec will be replaced if changed. To avoid modifying the schema, use `INSERT OVERWRITE` instead.

DROP TABLE
Drop behavior changed in Iceberg 0.14:
- Before 0.14: `DROP TABLE` deleted the table metadata and contents
- From 0.14: `DROP TABLE` only removes the table from the catalog; use `DROP TABLE PURGE` to delete contents
Remove from Catalog Only
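For example (table name illustrative):

```sql
-- Leaves the table's data and metadata files in place
DROP TABLE prod.db.sample;
```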
Remove from Catalog and Delete Contents
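A sketch of the purging form:

```sql
-- Also deletes the table's data and metadata files
DROP TABLE prod.db.sample PURGE;
```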
ALTER TABLE
Iceberg provides full `ALTER TABLE` support in Spark 3:
- Rename tables
- Set or remove table properties
- Add, delete, and rename columns
- Add, delete, and rename nested fields
- Reorder columns
- Widen numeric types
- Change column nullability
Rename Table
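A sketch (names illustrative):

```sql
ALTER TABLE prod.db.sample RENAME TO prod.db.new_name;
```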
Table Properties
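Properties can be set and removed; the property key below is a real Iceberg setting, the value is illustrative:

```sql
ALTER TABLE prod.db.sample SET TBLPROPERTIES (
    'read.split.target-size'='268435456'
);

ALTER TABLE prod.db.sample UNSET TBLPROPERTIES ('read.split.target-size');
```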
Add Columns
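A hedged sketch of single and multi-column adds (names illustrative):

```sql
-- Add one column
ALTER TABLE prod.db.sample ADD COLUMN new_column string COMMENT 'new docs';

-- Add several columns at once
ALTER TABLE prod.db.sample ADD COLUMNS (
    new_column1 string,
    new_column2 int
);
```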
For arrays and maps, use the `element` and `value` keywords to access nested columns:
- `ADD COLUMN points.element.z double` - Add a field to an array element
- `ADD COLUMN points.value.b int` - Add a field to a map value
Rename Columns
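For example (names illustrative):

```sql
-- Top-level rename
ALTER TABLE prod.db.sample RENAME COLUMN data TO payload;

-- Nested rename: only the leaf field name changes
ALTER TABLE prod.db.sample RENAME COLUMN location.lat TO latitude;
```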
Nested rename only affects the leaf field. Renaming `location.lat` to `latitude` results in `location.latitude`.

Alter Column Type and Properties
Safe type widening is supported:
- `int` → `bigint`
- `float` → `double`
- `decimal(P,S)` → `decimal(P2,S)` where P2 > P
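A sketch of widening and other column alterations (names illustrative):

```sql
-- Widen int to bigint
ALTER TABLE prod.db.sample ALTER COLUMN id TYPE bigint;

-- Update the column comment
ALTER TABLE prod.db.sample ALTER COLUMN id COMMENT 'unique id';

-- Allow nulls
ALTER TABLE prod.db.sample ALTER COLUMN id DROP NOT NULL;
```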
Drop Columns
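Top-level and nested fields can both be dropped (names illustrative):

```sql
ALTER TABLE prod.db.sample DROP COLUMN id;

-- Drop a nested field
ALTER TABLE prod.db.sample DROP COLUMN point.z;
```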
SQL Extensions
The following commands require Iceberg SQL extensions.

Partition Evolution
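With the extensions enabled, a table's partition spec can evolve in place (names illustrative):

```sql
ALTER TABLE prod.db.sample ADD PARTITION FIELD day(ts);

ALTER TABLE prod.db.sample DROP PARTITION FIELD day(ts);
```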
Write Ordering
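A hedged sketch, assuming the SQL extensions are enabled (names illustrative):

```sql
ALTER TABLE prod.db.sample WRITE ORDERED BY category ASC, id DESC;
```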
Configure automatic data sorting for writes.

Identifier Fields
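For example, assuming `id` is a `NOT NULL` column (names illustrative):

```sql
ALTER TABLE prod.db.sample SET IDENTIFIER FIELDS id;

ALTER TABLE prod.db.sample DROP IDENTIFIER FIELDS id;
```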
Identifier fields must be `NOT NULL` columns. Setting identifier fields enables Flink upsert operations.

Branching and Tagging
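A sketch of branch and tag DDL (branch/tag names and the snapshot version are illustrative):

```sql
-- Create a branch from the current table state
ALTER TABLE prod.db.sample CREATE BRANCH `audit-branch`;

-- Create a tag pointing at a specific snapshot
ALTER TABLE prod.db.sample CREATE TAG `v1` AS OF VERSION 1234;

-- Remove them when no longer needed
ALTER TABLE prod.db.sample DROP BRANCH `audit-branch`;
ALTER TABLE prod.db.sample DROP TAG `v1`;
```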
Iceberg Views
Iceberg views require Spark 3.4+.
Create View
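A minimal sketch, assuming the configured catalog supports Iceberg views (names illustrative):

```sql
CREATE VIEW prod.db.sample_view AS
SELECT id, data FROM prod.db.sample;
```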
Manage Views
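Common management commands (names and the property value are illustrative):

```sql
SHOW VIEWS IN prod.db;

ALTER VIEW prod.db.sample_view SET TBLPROPERTIES ('comment' = 'sample view');

DROP VIEW prod.db.sample_view;
```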
Next Steps
- Query Data - Learn about SELECT queries and time travel
- Write Data - Master INSERT, MERGE, and UPDATE operations
- Configuration - Configure Spark catalogs and options
- Procedures - Maintain tables with stored procedures