Overview
Iceberg provides stored procedures for table maintenance and management. Procedures are available when using the Iceberg SQL extensions. In Spark 4.0+, procedures are supported natively, but they are case-sensitive.
Using Procedures
Call procedures from any configured catalog using the CALL statement.
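As a minimal sketch of the CALL form: Iceberg procedures live in the `system` namespace of the catalog they are called from. The names `my_catalog` and `db.sample` are placeholders, not names from this page.

```sql
-- my_catalog is a placeholder for a configured Iceberg catalog;
-- db.sample is a placeholder table identifier.
CALL my_catalog.system.ancestors_of(table => 'db.sample');
```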
Argument Passing
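Arguments can be passed by name or by position. The sketch below shows both forms for set_current_snapshot, assuming a placeholder catalog `my_catalog`, a placeholder table `db.sample`, and an illustrative snapshot ID of 1.

```sql
-- Named arguments: order does not matter and optional arguments can be left out.
CALL my_catalog.system.set_current_snapshot(snapshot_id => 1, table => 'db.sample');

-- Positional arguments: values follow the order of the procedure's signature.
CALL my_catalog.system.set_current_snapshot('db.sample', 1);
```

Named arguments tend to be easier to read and survive signature changes better. The Iceberg procedures reference also notes that the two styles cannot be mixed within one call.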
Snapshot Management
rollback_to_snapshot
Roll back a table to a specific snapshot. Arguments:
- table (required) - Table name
- snapshot_id (required) - Target snapshot ID
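A minimal example of the call, assuming a placeholder catalog `my_catalog`, a placeholder table `db.sample`, and a made-up snapshot ID:

```sql
-- 5781947118336215154 is a hypothetical snapshot ID; look up real IDs
-- in the table's snapshots metadata table before rolling back.
CALL my_catalog.system.rollback_to_snapshot(
  table => 'db.sample',
  snapshot_id => 5781947118336215154
);
```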
rollback_to_timestamp
Roll back to the snapshot that was current at a specific time. Arguments:
- table (required) - Table name
- timestamp (required) - Target timestamp
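A minimal example of the call, assuming a placeholder catalog `my_catalog`, a placeholder table `db.sample`, and an arbitrary illustrative timestamp:

```sql
-- The timestamp is illustrative only.
CALL my_catalog.system.rollback_to_timestamp(
  table => 'db.sample',
  timestamp => TIMESTAMP '2021-06-30 00:00:00.000'
);
```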
set_current_snapshot
Set the current snapshot (not limited to ancestors).
cherrypick_snapshot
Apply changes from a snapshot without removing the original.
Only append and dynamic overwrite snapshots can be cherry-picked.
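A cherry-pick call might look like the sketch below. The catalog `my_catalog`, the table `db.sample`, and the snapshot ID are placeholders.

```sql
-- Creates a new snapshot from the changes in snapshot 1 (a placeholder ID);
-- the original snapshot stays in the table's history.
CALL my_catalog.system.cherrypick_snapshot(table => 'db.sample', snapshot_id => 1);
```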
fast_forward
Fast-forward a branch to another branch’s head.
Metadata Management
expire_snapshots
Remove old snapshots and unreferenced data files. Arguments:
- table (required) - Table name
- older_than - Expiration timestamp (default: 5 days ago)
- retain_last - Minimum snapshots to keep (default: 1)
- max_concurrent_deletes - Thread pool size for deletions
- stream_results - Stream results to prevent driver OOM
- snapshot_ids - Specific snapshot IDs to expire
Output columns:
- deleted_data_files_count
- deleted_position_delete_files_count
- deleted_equality_delete_files_count
- deleted_manifest_files_count
- deleted_manifest_lists_count
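The arguments above can be combined as in this sketch. The catalog `my_catalog`, the table `db.sample`, the timestamp, and the snapshot ID are all placeholder values.

```sql
-- Expire snapshots older than the given timestamp, but always keep the last 100.
CALL my_catalog.system.expire_snapshots(
  table => 'db.sample',
  older_than => TIMESTAMP '2021-06-30 00:00:00.000',
  retain_last => 100
);

-- Expire one specific snapshot by ID (123 is a placeholder).
CALL my_catalog.system.expire_snapshots(table => 'db.sample', snapshot_ids => ARRAY(123));
```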
remove_orphan_files
Remove files not referenced in table metadata. Arguments:
- table (required) - Table name
- older_than - Remove files older than this (default: 3 days ago)
- location - Specific directory to scan
- dry_run - Preview without deleting (default: false)
- max_concurrent_deletes - Thread pool size
- stream_results - Stream results to prevent OOM
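As a sketch of narrowing the scan to one directory, assuming a placeholder catalog `my_catalog`, a placeholder table `db.sample`, and a hypothetical storage path:

```sql
-- The location is a hypothetical path; point it at a directory under
-- the table's actual location.
CALL my_catalog.system.remove_orphan_files(
  table => 'db.sample',
  location => 's3://example-bucket/warehouse/db/sample/data'
);
```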
rewrite_data_files
Compact small files and optimize data layout. Options:
- target-file-size-bytes - Target output file size (default: 512 MB)
- min-file-size-bytes - Files below this are rewritten (default: 75% of target)
- max-file-size-bytes - Files above this are rewritten (default: 180% of target)
- min-input-files - Minimum files to trigger rewrite (default: 5)
- rewrite-all - Force rewrite all files (default: false)
- remove-dangling-deletes - Remove orphaned delete files (default: false)
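The hyphenated settings above are not top-level arguments. This sketch assumes they are passed as string key/value pairs through the procedure's `options` map argument, as in the Iceberg Spark procedures reference. The catalog and table names are placeholders.

```sql
-- Option keys and values are both strings inside map(...).
-- 536870912 bytes = 512 MB, the default target size stated explicitly.
CALL my_catalog.system.rewrite_data_files(
  table => 'db.sample',
  options => map(
    'target-file-size-bytes', '536870912',
    'min-input-files', '2'
  )
);
```

Lowering min-input-files makes the rewrite trigger on smaller groups of files, which suits tables that accumulate a few small files per partition.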
rewrite_manifests
Optimize manifest files for better scan planning.
rewrite_position_delete_files
Compact position delete files and remove dangling deletes.
Table Migration
snapshot
Create a lightweight copy for testing.
Snapshot tables share data files with the source table. Use DROP TABLE to clean up when done testing.
migrate
Replace a Hive/Spark table with an Iceberg table. Arguments:
- table (required) - Table to migrate
- properties - Properties for the new Iceberg table
- drop_backup - Don’t retain original table (default: false)
- backup_table_name - Custom backup name (default: table_BACKUP_)
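A migration call using these arguments might look like the sketch below. The catalog `my_catalog`, the table `db.sample`, the property pair, and the backup name are all placeholders.

```sql
-- Migrate db.sample in place, set one table property on the new Iceberg
-- table, and keep the original under a custom backup name.
CALL my_catalog.system.migrate(
  table => 'db.sample',
  properties => map('foo', 'bar'),
  backup_table_name => 'sample_backup'
);
```

Leaving drop_backup at its default keeps the original table available in case the migrated table needs to be discarded.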
add_files
Add files from external sources.
register_table
Register an existing metadata file in a catalog.
Change Data Capture
create_changelog_view
Create a view showing table changes. The view adds these metadata columns:
- _change_type - INSERT, DELETE, UPDATE_BEFORE, UPDATE_AFTER
- _change_ordinal - Order of changes
- _commit_snapshot_id - Snapshot where change occurred
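As a sketch of creating and querying a changelog view: the catalog `my_catalog`, the table `db.sample`, the view name `sample_changes`, and the snapshot IDs are placeholders. The `options` keys are taken from the Iceberg Spark procedures reference rather than from this page.

```sql
-- Build a view of the changes between two snapshots (IDs are placeholders).
CALL my_catalog.system.create_changelog_view(
  table => 'db.sample',
  changelog_view => 'sample_changes',
  options => map('start-snapshot-id', '1', 'end-snapshot-id', '2')
);

-- Read the changes in commit order.
SELECT * FROM sample_changes ORDER BY _change_ordinal;
```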
Table Statistics
compute_table_stats
Calculate NDV (number of distinct values) statistics for columns.
compute_partition_stats
Compute partition statistics incrementally.
Metadata Information
ancestors_of
Report snapshot ancestry.
Best Practices
Regular Maintenance
Run maintenance procedures on a schedule:
- Daily: expire_snapshots for active tables
- Weekly: rewrite_data_files for frequently updated tables
- Monthly: remove_orphan_files for all tables
Streaming Tables
For tables with streaming writes:
- Use longer trigger intervals (1+ minutes)
- Regularly run rewrite_data_files to compact small files
- Run rewrite_manifests to optimize metadata
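The two compaction steps in the list above can be sketched as a pair of calls run by whatever scheduler you already use. The catalog `my_catalog` and the table `db.events` are placeholders.

```sql
-- Compact the small data files produced by frequent streaming commits.
CALL my_catalog.system.rewrite_data_files(table => 'db.events');

-- Then rewrite manifests so scan planning reads fewer, better-organized manifests.
CALL my_catalog.system.rewrite_manifests(table => 'db.events');
```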
Safe Orphan Removal
Always run remove_orphan_files with dry_run first and review the result before deleting anything.
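A two-step sketch of that workflow, assuming a placeholder catalog `my_catalog` and a placeholder table `db.sample`:

```sql
-- Step 1: list the files that would be removed, without deleting them.
CALL my_catalog.system.remove_orphan_files(table => 'db.sample', dry_run => true);

-- Step 2: after reviewing the listed locations, run the actual removal.
CALL my_catalog.system.remove_orphan_files(table => 'db.sample');
```

Keeping the default older_than window (3 days) in both steps helps avoid removing files that belong to writes still in progress.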
Next Steps
- Writes: Learn about write operations and distribution
- Configuration: Configure Spark for optimal performance
- Queries: Query tables and inspect metadata
- Structured Streaming: Maintain streaming tables