
Installation

The latest version of Iceberg is {{ icebergVersion }}.

Using Spark Shell

To use Iceberg in a Spark shell, add the runtime JAR using the --packages option:
spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5:{{ icebergVersion }}
If you want to include Iceberg in your Spark installation permanently, add the iceberg-spark-runtime JAR to Spark’s jars folder.
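As a sketch, the permanent install might look like the following; the exact filename is an assumption, since the artifact name depends on your Spark, Scala, and Iceberg versions:

```shell
# Copy the Iceberg Spark runtime JAR into Spark's jars directory so it is
# available to every application without --packages.
# (Filename is illustrative; match it to your Spark/Scala/Iceberg versions.)
cp iceberg-spark-runtime-3.5_2.12-<version>.jar "$SPARK_HOME/jars/"
```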

Using Spark SQL

For Spark SQL with catalog configuration:
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5:{{ icebergVersion }} \
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
    --conf spark.sql.catalog.spark_catalog.type=hive \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse

Configuring Catalogs

Iceberg catalogs enable SQL commands to manage tables and load them by name. Configure catalogs using properties under spark.sql.catalog.(catalog_name).

Hadoop Catalog Example

Create a path-based catalog named local for tables under a warehouse directory:
spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.local.type=hadoop
spark.sql.catalog.local.warehouse=/path/to/warehouse

Hive Metastore Catalog

Configure a Hive-based catalog with session catalog support:
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.catalog.spark_catalog.type=hive

Creating Your First Table

Step 1: Create a Simple Table

Use the CREATE TABLE command to create your first Iceberg table:
CREATE TABLE local.db.table (
    id bigint,
    data string
) USING iceberg;

Step 2: Insert Data

Add data to your table using INSERT INTO:
INSERT INTO local.db.table 
VALUES (1, 'a'), (2, 'b'), (3, 'c');

Step 3: Query Data

Read data from your table:
SELECT count(1) as count, data
FROM local.db.table
GROUP BY data;

Row-Level Updates

Iceberg adds row-level SQL commands, such as MERGE INTO and DELETE FROM, to Spark:

MERGE INTO

Update existing rows and insert new ones in a single operation:
MERGE INTO local.db.table t 
USING (SELECT * FROM updates) u 
ON t.id = u.id
WHEN MATCHED THEN UPDATE SET t.data = u.data
WHEN NOT MATCHED THEN INSERT *;

DELETE FROM

Remove rows matching a condition (using the sample table's id column):
DELETE FROM local.db.table
WHERE id < 2;

Writing with DataFrames

Iceberg supports the v2 DataFrame write API for programmatic writes:
val data: DataFrame = ...
data.writeTo("local.db.table").append()
The old v1 write API is supported but not recommended. Use the v2 writeTo API instead.
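For contrast, here is a sketch of the same append through both APIs, assuming `data` is an existing DataFrame and the target table exists:

```scala
// v1 DataFrame write API: supported, but not recommended.
data.write.format("iceberg").mode("append").save("local.db.table")

// v2 DataFrameWriterV2 API: preferred. The target is addressed by name
// and the operation (append vs. overwrite) is an explicit method call.
data.writeTo("local.db.table").append()
```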

Reading with DataFrames

Load tables by name using spark.table:
val df = spark.table("local.db.table")
df.count()
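A sketch of the equivalent format-based read, followed by a small aggregation (column names assume the sample table above):

```scala
// DataFrameReader form of the same load.
val df = spark.read.format("iceberg").load("local.db.table")

// Standard DataFrame operations apply to the result.
df.groupBy("data").count().show()
```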

Inspecting Tables

Use metadata tables to inspect table history and snapshots:

View Snapshots

SELECT * FROM local.db.table.snapshots;
+-------------------------+----------------+-----------+-----------+----------------------------------------------------+
| committed_at            | snapshot_id    | parent_id | operation | manifest_list                                      |
+-------------------------+----------------+-----------+-----------+----------------------------------------------------+
| 2019-02-08 03:29:51.215 | 57897183625154 | null      | append    | s3://.../table/metadata/snap-57897183625154-1.avro |
+-------------------------+----------------+-----------+-----------+----------------------------------------------------+

View History

SELECT * FROM local.db.table.history;

View Files

SELECT * FROM local.db.table.files;

Type Conversion

Spark to Iceberg

When creating tables or writing data, Spark types are automatically converted:
Spark Type            | Iceberg Type               | Notes
----------------------|----------------------------|-----------------------------------
boolean               | boolean                    |
byte, short, integer  | integer                    | Promoted to integer
long                  | long                       |
float                 | float                      |
double                | double                     |
decimal               | decimal                    |
timestamp             | timestamp with timezone    |
timestamp_ntz         | timestamp without timezone |
string, char, varchar | string                     |
binary                | binary                     | Assertion on length for fixed type
Numeric types support promotion during writes. For example, you can write Spark integer to Iceberg long.
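A sketch of that promotion in practice, assuming the sample table above (whose id column is bigint):

```scala
import spark.implicits._

// id is inferred as a Spark integer here, but the target column is bigint;
// the write succeeds because integer promotes safely to long.
val rows = Seq((4, "d"), (5, "e")).toDF("id", "data")
rows.writeTo("local.db.table").append()
```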

Iceberg to Spark

When reading from Iceberg tables:
Iceberg Type               | Spark Type    | Supported
---------------------------|---------------|-----------------
timestamp with timezone    | timestamp     | ✔️
timestamp without timezone | timestamp_ntz | ✔️
uuid                       | string        | ✔️
time                       | -             | ❌ Not supported
variant                    | variant       | ✔️ Spark 4.0+
unknown                    | null          | ✔️ Spark 4.0+

Next Steps

DDL Commands: learn about CREATE, ALTER, and DROP operations
Query Data: explore SELECT queries and metadata tables
Write Data: master INSERT INTO and MERGE INTO operations
Procedures: maintain tables with stored procedures