Overview
Spark is currently the most feature-rich compute engine for Iceberg operations. Apache Iceberg uses Spark’s DataSourceV2 API for data source and catalog implementations, providing comprehensive support for table management, queries, and writes.
Key Features
Full DDL Support
Create, alter, and manage Iceberg tables with complete SQL DDL operations
Advanced Queries
Time travel, metadata tables, and efficient scan planning
Row-Level Operations
MERGE INTO, UPDATE, and DELETE operations for data modification
Streaming Support
Structured Streaming reads and writes with incremental processing
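
As a rough illustration of the DDL, query, and row-level features listed above, the sketch below collects one representative Spark SQL statement per area. The catalog `demo`, the namespace `db`, the tables `events` and `updates`, and the snapshot ID are all hypothetical placeholders. The statements are plain strings here; running them would require passing each one to `spark.sql(...)` in a session that has an Iceberg catalog and the Iceberg SQL extensions configured.

```python
# Representative Spark SQL statements, one per feature area.
# Every identifier (demo.db.events, demo.db.updates, the snapshot ID) is a
# hypothetical placeholder chosen for illustration.
EXAMPLES = {
    # DDL: create a table partitioned by day of the event timestamp.
    "ddl": """
        CREATE TABLE demo.db.events (id BIGINT, ts TIMESTAMP, payload STRING)
        USING iceberg
        PARTITIONED BY (days(ts))
    """,
    # Time travel: read the table as of a given snapshot ID (Spark 3.3+ syntax).
    "time_travel": "SELECT * FROM demo.db.events VERSION AS OF 1234567890123",
    # Metadata table: list the snapshots recorded for the table.
    "metadata_table": "SELECT snapshot_id, committed_at FROM demo.db.events.snapshots",
    # Row-level operations: upsert from a staging table, then delete old rows.
    "merge": """
        MERGE INTO demo.db.events t
        USING demo.db.updates s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET t.payload = s.payload
        WHEN NOT MATCHED THEN INSERT *
    """,
    "delete": "DELETE FROM demo.db.events WHERE ts < TIMESTAMP '2024-01-01 00:00:00'",
}


def statements():
    """Return the examples with whitespace collapsed to single spaces."""
    return {name: " ".join(sql.split()) for name, sql in EXAMPLES.items()}


if __name__ == "__main__":
    for name, sql in statements().items():
        print(f"-- {name}\n{sql};\n")
```
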
Compatibility
Iceberg integrates with Apache Spark through the DataSourceV2 API, with different levels of support across Spark versions:

| Feature | Availability | Notes |
|---|---|---|
| SQL INSERT INTO | ✔️ All versions | Requires spark.sql.storeAssignmentPolicy=ANSI (the default since Spark 3.0) |
| SQL MERGE INTO | ✔️ All versions | Requires Iceberg Spark extensions |
| SQL DELETE FROM | ✔️ All versions | Row-level deletes require extensions |
| SQL UPDATE | ✔️ All versions | Requires Iceberg Spark extensions |
| DataFrame writes | ✔️ All versions | DataFrameWriterV2 API recommended |
| Structured Streaming | ✔️ All versions | Append and complete modes |
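
Several rows in the table depend on session configuration: the Iceberg Spark extensions and the ANSI store assignment policy. The sketch below assembles those settings as a plain dictionary for a Hadoop-type catalog. The helper names `iceberg_spark_conf` and `as_cli_flags`, the catalog name, and the warehouse path are assumptions made for illustration; the property keys and class names are the ones Iceberg and Spark document.

```python
def iceberg_spark_conf(catalog_name, warehouse_path):
    """Build Spark properties for a Hadoop-type Iceberg catalog (illustrative helper)."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Needed for MERGE INTO, UPDATE, and row-level DELETE per the table above.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the Iceberg catalog under the chosen name.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse_path,
        # Already the default since Spark 3.0; set explicitly for clarity.
        "spark.sql.storeAssignmentPolicy": "ANSI",
    }


def as_cli_flags(conf):
    """Render the properties as --conf flags for spark-sql or spark-submit."""
    return [f"--conf {key}={value}" for key, value in conf.items()]


if __name__ == "__main__":
    # "local" and "/tmp/warehouse" are hypothetical values.
    for flag in as_cli_flags(iceberg_spark_conf("local", "/tmp/warehouse")):
        print(flag)
```

The same dictionary could be applied to a session builder by calling `.config(key, value)` once per entry, which keeps the settings in one place for both CLI and programmatic use.
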
Type Compatibility
Iceberg automatically converts between Spark and Iceberg types:
Spark to Iceberg Type Mapping
| Spark Type | Iceberg Type | Notes |
|---|---|---|
| boolean | boolean | |
| byte, short, integer | integer | byte and short are promoted to integer |
| long | long | |
| float | float | |
| double | double | |
| decimal | decimal | |
| date | date | |
| timestamp | timestamp with timezone | |
| timestamp_ntz | timestamp without timezone | |
| string, char, varchar | string | |
| binary | binary | Can write to fixed type with length assertion |
| struct | struct | |
| array | list | |
| map | map | |
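
The Spark to Iceberg mapping above can be expressed as a small lookup, which is handy for checking a schema before creating a table. This is only a sketch of the table, not Iceberg's own conversion code; the function name `to_iceberg_type` is hypothetical, and nested types are matched by their top-level name only.

```python
# Lookup mirroring the Spark -> Iceberg table above.
SPARK_TO_ICEBERG = {
    "boolean": "boolean",
    "byte": "integer",
    "short": "integer",
    "integer": "integer",
    "long": "long",
    "float": "float",
    "double": "double",
    "decimal": "decimal",
    "date": "date",
    "timestamp": "timestamp with timezone",
    "timestamp_ntz": "timestamp without timezone",
    "string": "string",
    "char": "string",
    "varchar": "string",
    "binary": "binary",
    "struct": "struct",
    "array": "list",
    "map": "map",
}


def to_iceberg_type(spark_type):
    """Map a Spark type name such as 'short' or 'varchar(20)' to its Iceberg type."""
    name = spark_type.strip().lower()
    base, sep, params = name.partition("(")
    base = base.strip()
    if base not in SPARK_TO_ICEBERG:
        raise ValueError(f"no Iceberg mapping listed for Spark type {spark_type!r}")
    target = SPARK_TO_ICEBERG[base]
    # Decimal keeps its precision and scale; char/varchar lengths are dropped
    # because the Iceberg string type carries no length.
    if base == "decimal" and sep:
        return f"{target}({params}"
    return target


if __name__ == "__main__":
    for t in ["short", "varchar(20)", "decimal(10,2)", "timestamp_ntz"]:
        print(t, "->", to_iceberg_type(t))
```
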
Iceberg to Spark Type Mapping
| Iceberg Type | Spark Type | Supported |
|---|---|---|
| boolean | boolean | ✔️ |
| integer | integer | ✔️ |
| long | long | ✔️ |
| float | float | ✔️ |
| double | double | ✔️ |
| decimal | decimal | ✔️ |
| date | date | ✔️ |
| time | - | ❌ Not supported |
| timestamp with timezone | timestamp | ✔️ |
| timestamp without timezone | timestamp_ntz | ✔️ |
| string | string | ✔️ |
| uuid | string | ✔️ |
| fixed | binary | ✔️ |
| binary | binary | ✔️ |
| struct | struct | ✔️ |
| list | array | ✔️ |
| map | map | ✔️ |
| variant | variant | ✔️ (Spark 4.0+) |
| unknown | null | ✔️ (Spark 4.0+) |
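
Reading in the other direction has two caveats that the table above encodes: `time` has no Spark equivalent, and `variant` and `unknown` need Spark 4.0 or later. The sketch below models both checks. It is an illustration of the table under those assumptions, not Iceberg's implementation, and the name `to_spark_type` and the `spark_version` tuple parameter are hypothetical.

```python
# Lookup mirroring the Iceberg -> Spark table above; None marks "not supported".
ICEBERG_TO_SPARK = {
    "boolean": "boolean",
    "integer": "integer",
    "long": "long",
    "float": "float",
    "double": "double",
    "decimal": "decimal",
    "date": "date",
    "time": None,
    "timestamp with timezone": "timestamp",
    "timestamp without timezone": "timestamp_ntz",
    "string": "string",
    "uuid": "string",
    "fixed": "binary",
    "binary": "binary",
    "struct": "struct",
    "list": "array",
    "map": "map",
    "variant": "variant",
    "unknown": "null",
}

# Types the table marks as available only on Spark 4.0+.
REQUIRES_SPARK_4 = {"variant", "unknown"}


def to_spark_type(iceberg_type, spark_version=(3, 5)):
    """Return the Spark type for an Iceberg type, or raise if it cannot be read."""
    if iceberg_type not in ICEBERG_TO_SPARK:
        raise ValueError(f"unrecognized Iceberg type {iceberg_type!r}")
    spark_type = ICEBERG_TO_SPARK[iceberg_type]
    if spark_type is None:
        raise ValueError(f"Iceberg type {iceberg_type!r} is not supported in Spark")
    if iceberg_type in REQUIRES_SPARK_4 and spark_version < (4, 0):
        raise ValueError(f"Iceberg type {iceberg_type!r} requires Spark 4.0 or later")
    return spark_type


if __name__ == "__main__":
    print(to_spark_type("uuid"))
    print(to_spark_type("variant", spark_version=(4, 0)))
```

Note that the two directions are not symmetric: `uuid` and `fixed` read back as `string` and `binary`, so a schema derived from a Spark DataFrame will not necessarily match the Iceberg schema it was read from.
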
Getting Started
Next Steps
Getting Started
Set up your first Iceberg table with Spark
DDL Operations
Learn about CREATE, ALTER, and DROP commands
Query Data
Execute queries and explore metadata tables
Write Data
Insert, update, and merge data into tables