Overview
To write to Iceberg tables in Spark, first configure Spark catalogs. Some write operations also require the Iceberg SQL extensions.

Feature Support
| Feature | Spark Support | Notes |
|---|---|---|
| SQL INSERT INTO | ✔️ All versions | Requires spark.sql.storeAssignmentPolicy=ANSI |
| SQL MERGE INTO | ✔️ All versions | Requires Iceberg Spark extensions |
| SQL INSERT OVERWRITE | ✔️ All versions | Requires ANSI assignment policy |
| SQL DELETE FROM | ✔️ All versions | Row-level deletes require extensions |
| SQL UPDATE | ✔️ All versions | Requires Iceberg Spark extensions |
| DataFrame writes | ✔️ All versions | DataFrameWriterV2 recommended |
| DataFrame CTAS/RTAS | ✔️ All versions | Requires DSv2 API |
| DataFrame MERGE | ✔️ Spark 4.0+ | Requires DSv2 API |
Writing with SQL
INSERT INTO
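As a sketch, assuming a catalog named `prod` with an illustrative `db.events` table:

```sql
-- append literal rows
INSERT INTO prod.db.events VALUES (1, 'login'), (2, 'logout');

-- append the result of a query
INSERT INTO prod.db.events
SELECT event_id, event_type FROM staging.db.events;
```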
INSERT INTO appends new data to a table; it requires spark.sql.storeAssignmentPolicy=ANSI (the default since Spark 3.0).

MERGE INTO
Perform upserts and conditional updates in a single statement. MERGE INTO is recommended over INSERT OVERWRITE because Iceberg rewrites only the affected data files, and because dynamic overwrite behavior can change if the table's partitioning changes.
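A sketch of an upsert, assuming illustrative `prod.db.target` and `updates` relations that share `id`, `op`, and `value` columns:

```sql
MERGE INTO prod.db.target t
USING (SELECT * FROM updates) s
ON t.id = s.id
WHEN MATCHED AND s.op = 'delete' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN INSERT *;
```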
Snapshot Summary (Spark 4.1+)
MERGE INTO operations provide detailed metrics in snapshot summaries:

| Metric | Description |
|---|---|
| spark.merge-into.num-target-rows-copied | Rows copied without modification |
| spark.merge-into.num-target-rows-deleted | Rows deleted |
| spark.merge-into.num-target-rows-updated | Rows updated |
| spark.merge-into.num-target-rows-inserted | Rows inserted |
| spark.merge-into.num-target-rows-matched-updated | Rows updated by a MATCHED clause |
| spark.merge-into.num-target-rows-not-matched-by-source-deleted | Rows deleted by a NOT MATCHED BY SOURCE clause |
INSERT OVERWRITE
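A sketch, assuming an illustrative `prod.db.logs` table partitioned by `level`; the dynamic form additionally assumes spark.sql.sources.partitionOverwriteMode=dynamic:

```sql
-- dynamic mode: replaces only the partitions produced by the query
INSERT OVERWRITE prod.db.logs
SELECT uuid, ts, level, message FROM staging_logs;

-- static overwrite of a single partition
INSERT OVERWRITE prod.db.logs PARTITION (level = 'INFO')
SELECT uuid, ts, message FROM staging_logs WHERE level = 'INFO';
```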
INSERT OVERWRITE replaces data in a table or partition with the result of a query.

DELETE FROM
Remove rows matching a filter. If the filter matches entire partitions, Iceberg performs a metadata-only delete; otherwise, only the affected data files are rewritten.
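For example (table and column names are illustrative):

```sql
-- metadata-only if the predicate aligns with partition boundaries
DELETE FROM prod.db.logs
WHERE ts < TIMESTAMP '2024-01-01 00:00:00';
```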
UPDATE
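A sketch of a filtered update (table and column names are illustrative):

```sql
UPDATE prod.db.accounts
SET status = 'inactive'
WHERE last_login < DATE '2023-01-01';
```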
UPDATE modifies rows that match a filter.

Writing to Branches
Write to specific branches for Write-Audit-Publish (WAP) workflows. The branch must exist before writing; use CREATE BRANCH to create one.
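A sketch using the `branch_<name>` identifier (branch and table names are illustrative; CREATE BRANCH requires the Iceberg SQL extensions):

```sql
ALTER TABLE prod.db.events CREATE BRANCH audit;

-- write to the branch rather than the main table
INSERT INTO prod.db.events.branch_audit VALUES (3, 'purchase');
```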
Writing with DataFrames
The DataFrameWriterV2 API is recommended for DataFrame writes.

Appending Data
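A minimal append sketch, assuming an active SparkSession with an Iceberg catalog named `prod` and an existing DataFrame `df`:

```python
# append rows to an existing table; fails if the table does not exist
df.writeTo("prod.db.events").append()
```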
Overwriting Data
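Two overwrite forms are available on the v2 writer; table names below are illustrative:

```python
from pyspark.sql.functions import col

# replace exactly the partitions present in `df` (dynamic overwrite)
df.writeTo("prod.db.logs").overwritePartitions()

# or replace rows matching an explicit filter
df.writeTo("prod.db.logs").overwrite(col("level") == "INFO")
```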
Creating Tables
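A CTAS sketch with the v2 writer (names and the partition transform are illustrative):

```python
from pyspark.sql.functions import days

# create a table from the DataFrame's schema and contents,
# partitioned by day on the `ts` column
(df.writeTo("prod.db.events")
   .partitionedBy(days("ts"))
   .tableProperty("format-version", "2")
   .create())
```

For replace-table-as-select semantics, `replace()` or `createOrReplace()` can be used in place of `create()`.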
Merging Data (Spark 4.0+)
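A sketch of the Spark 4.0 DataFrame merge API, assuming illustrative `source` and `target` tables; columns in the condition are qualified by table name:

```python
from pyspark.sql.functions import col

(spark.table("source")
   .mergeInto("target", col("source.id") == col("target.id"))
   .whenMatched()
   .updateAll()
   .whenNotMatched()
   .insertAll()
   .merge())
```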
Writing to Branches
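A branch write with the v2 writer uses the same `branch_<name>` identifier (the `audit` branch is illustrative and must already exist):

```python
df.writeTo("prod.db.events.branch_audit").append()
```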
Schema Merge
Enable automatic schema evolution during writes.
Schema merge behavior:
- New columns in source are added to target (set to NULL for existing rows)
- Missing columns in source are set to NULL (insert) or unchanged (update)
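A sketch of enabling the behavior above, assuming an illustrative `prod.db.events` table:

```python
# 1) allow writes whose schema differs from the table schema
spark.sql(
    "ALTER TABLE prod.db.events "
    "SET TBLPROPERTIES ('write.spark.accept-any-schema' = 'true')"
)

# 2) ask the writer to merge the incoming schema into the table schema
df.writeTo("prod.db.events").option("mergeSchema", "true").append()
```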
Distribution Modes
Control how Spark distributes data before writing:

| Mode | Description | Use Case |
|---|---|---|
| hash (default) | Hash-based shuffle by partition values | General purpose, balanced performance |
| range | Range-based shuffle with global sorting | Better read performance, higher write cost |
| none | No automatic distribution | Manual control, requires sorting |
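The mode can be set per table via the `write.distribution-mode` property (or per write via the `distribution-mode` option); table name is illustrative:

```sql
ALTER TABLE prod.db.events
SET TBLPROPERTIES ('write.distribution-mode' = 'range');
```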
The hash distribution mode (the default since Iceberg 1.2.0) automatically sorts data by partition value, eliminating the need for manual sorting in most cases.

Controlling File Sizes
Target File Size
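A sketch of setting the target size via the `write.target-file-size-bytes` table property (table name illustrative; 512 MB is the usual Iceberg default):

```sql
ALTER TABLE prod.db.events
SET TBLPROPERTIES ('write.target-file-size-bytes' = '536870912');
```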
Control output file sizes with table properties.

Spark Task Size
Adjust Spark's Adaptive Query Execution (AQE) to control task sizes. Because the in-memory size of a Spark row is larger than its on-disk, columnar-compressed size, the AQE advisory partition size typically needs to be larger than the target file size.
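One commonly tuned knob is the advisory post-shuffle partition size (the 1 GB value below is illustrative):

```sql
-- set larger than the target file size to account for in-memory row overhead
SET spark.sql.adaptive.advisoryPartitionSizeInBytes = 1073741824;
```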
Fanout Writers
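A streaming sketch with fanout enabled, so incoming micro-batches do not need to be clustered by partition (table name and checkpoint path are illustrative):

```python
(df.writeStream
   .format("iceberg")
   .outputMode("append")
   .option("fanout-enabled", "true")
   .option("checkpointLocation", "/tmp/checkpoints/events")
   .toTable("prod.db.events"))
```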
Enable fanout writers for streaming workloads on partitioned tables.

Write Options
Common write options for DataFrameWriterV2:

| Option | Description | Default |
|---|---|---|
| write-format | File format (parquet, avro, orc) | Table default |
| target-file-size-bytes | Target output file size | Table property |
| compression-codec | Compression codec | Table default |
| distribution-mode | Distribution strategy | hash |
| fanout-enabled | Enable fanout writer | false |
| isolation-level | Isolation level for overwrites | null |
| snapshot-property.&lt;key&gt; | Custom snapshot metadata | - |
Example with Options
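A sketch combining several of the options listed above (table name, sizes, and the snapshot-property key are illustrative):

```python
(df.writeTo("prod.db.events")
   .option("write-format", "orc")
   .option("target-file-size-bytes", "268435456")      # 256 MB
   .option("distribution-mode", "range")
   .option("snapshot-property.ingest-run", "nightly")  # custom snapshot metadata
   .append())
```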
Next Steps
- Procedures: maintain tables with compaction and cleanup
- Configuration: configure write performance and behavior
- Structured Streaming: stream data with continuous writes
- DDL Operations: create and alter table schemas