ExpireSnapshots
The ExpireSnapshots action removes old snapshots from a table and deletes data files that are no longer needed. This is essential for managing storage costs and maintaining table performance.
Interface
Overview
The ExpireSnapshots action is similar to the table API’s ExpireSnapshots operation but can leverage a query engine to distribute the deletion work. It safely removes:
- Old snapshot metadata
- Manifest files no longer referenced
- Data and delete files orphaned by expired snapshots
- Statistics files associated with expired snapshots
- Optionally, unused partition specs and schemas
Methods
expireSnapshotId
Expires a specific snapshot identified by its ID.
Parameters:
- snapshotId: The ID of the snapshot to expire
Returns: this for method chaining
expireOlderThan
Expires all snapshots older than the given timestamp.
Parameters:
- timestampMillis: A timestamp in milliseconds (from System.currentTimeMillis())
Returns: this for method chaining
Example:
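The cutoff passed to expireOlderThan is an epoch-millisecond value, so a retention window has to be converted into a timestamp first. The following is a minimal sketch of that arithmetic using only the JDK; the class name ExpirationCutoff and the method cutoffMillis are hypothetical helpers for illustration and are not part of the action's API.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: turns a retention window into an epoch-millisecond cutoff.
class ExpirationCutoff {

    /** Returns the epoch-millisecond timestamp that lies `retention` before `now`. */
    static long cutoffMillis(Instant now, Duration retention) {
        return now.minus(retention).toEpochMilli();
    }

    public static void main(String[] args) {
        // Snapshots older than seven days would fall before this cutoff.
        long cutoff = cutoffMillis(Instant.now(), Duration.ofDays(7));
        System.out.println("expireOlderThan cutoff (epoch ms): " + cutoff);
    }
}
```

The resulting long is the kind of value that timestampMillis expects; taking the current time as a parameter keeps the calculation easy to test.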
retainLast
Retains the given number of most recent ancestors of the current snapshot, even if they would otherwise be expired.
Parameters:
- numSnapshots: The number of recent snapshots to retain
Returns: this for method chaining
Snapshots explicitly marked for expiration by ID will still be removed, even if they are among the most recent.
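How expireOlderThan, retainLast, and expireSnapshotId interact can be easier to follow with a toy model. The sketch below is not the library's implementation: it assumes a purely linear history (oldest first), ignores branches and tags, and assumes "older than" means a strictly smaller timestamp. The names RetentionModel, Snap, and expiredIds are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model of the retention rules described above; not the library's implementation.
class RetentionModel {

    /** A snapshot reduced to the two fields the rules look at. */
    static final class Snap {
        final long id;
        final long timestampMillis;

        Snap(long id, long timestampMillis) {
            this.id = id;
            this.timestampMillis = timestampMillis;
        }
    }

    /**
     * Returns the IDs that would be expired from a linear history ordered oldest first.
     * A snapshot is expired if it was named explicitly, or if it is older than the
     * cutoff and is not among the last `retainLast` entries.
     */
    static List<Long> expiredIds(
            List<Snap> history, long olderThanMillis, int retainLast, Set<Long> explicitIds) {
        List<Long> expired = new ArrayList<>();
        int firstRetainedIndex = Math.max(0, history.size() - retainLast);
        for (int i = 0; i < history.size(); i++) {
            Snap s = history.get(i);
            boolean explicitlyExpired = explicitIds.contains(s.id);
            boolean tooOld = s.timestampMillis < olderThanMillis; // assumption: strict comparison
            boolean amongLastN = i >= firstRetainedIndex;
            if (explicitlyExpired || (tooOld && !amongLastN)) {
                expired.add(s.id);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<Snap> history =
                List.of(new Snap(1, 100), new Snap(2, 200), new Snap(3, 300), new Snap(4, 400));
        // Snapshot 3 is older than the cutoff but protected by retainLast = 2.
        System.out.println(expiredIds(history, 350, 2, Set.of()));   // prints [1, 2]
        // Naming snapshot 3 explicitly overrides that protection.
        System.out.println(expiredIds(history, 350, 2, Set.of(3L))); // prints [1, 2, 3]
    }
}
```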
deleteWith
Provides a custom delete function for removing files.
Parameters:
- deleteFunc: A function that accepts file paths to delete
Returns: this for method chaining
Example:
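One use for a custom delete function is a dry run that records what would be deleted instead of removing anything. The sketch below assumes the delete function has the shape of a java.util.function.Consumer<String> receiving one path per call; the class DryRunDelete and its methods are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical dry-run helper: records paths instead of deleting them.
class DryRunDelete {

    // Synchronized because a delete function may be invoked from several threads.
    private final List<String> recorded = Collections.synchronizedList(new ArrayList<>());

    /** A delete function that only remembers the path it was given. */
    Consumer<String> deleteFunc() {
        return recorded::add;
    }

    /** An immutable copy of the paths seen so far. */
    List<String> recordedPaths() {
        synchronized (recorded) {
            return List.copyOf(recorded);
        }
    }

    public static void main(String[] args) {
        DryRunDelete dryRun = new DryRunDelete();
        Consumer<String> deleteFunc = dryRun.deleteFunc();
        // Placeholder paths for illustration only.
        deleteFunc.accept("s3://example-bucket/data/file-a.parquet");
        deleteFunc.accept("s3://example-bucket/data/file-b.parquet");
        System.out.println("Would delete: " + dryRun.recordedPaths());
    }
}
```

A function of this shape could then be handed to deleteWith to review the candidate files before running a real expiration.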
executeDeleteWith
Provides an executor service for parallel file deletion.
Parameters:
- executorService: The executor service to use for parallel deletes
Returns: this for method chaining
This is only used if a custom delete function is provided or if the FileIO doesn’t support bulk deletes.
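To show what handing deletes to an executor service amounts to, here is a minimal JDK-only sketch of running a per-path delete function on a fixed thread pool. It is an illustration of the general pattern, not the action's internal code; the class ParallelDeleteSketch, the method deleteAll, and the pool size of 4 are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Illustrative pattern only: applies a delete function to each path on a thread pool.
class ParallelDeleteSketch {

    /** Submits one task per path, waits for all of them, and returns how many completed. */
    static int deleteAll(List<String> paths, Consumer<String> deleteFunc, ExecutorService pool) {
        List<Future<?>> futures = new ArrayList<>();
        for (String path : paths) {
            futures.add(pool.submit(() -> deleteFunc.accept(path)));
        }
        int completed = 0;
        for (Future<?> future : futures) {
            try {
                future.get(); // blocks until this delete has finished
                completed++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for deletes", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("Delete failed", e.getCause());
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // pool size is an arbitrary choice
        try {
            // Placeholder paths; the delete function here only prints.
            List<String> paths = List.of("data/file-a.parquet", "data/file-b.parquet");
            int n = deleteAll(paths, p -> System.out.println("deleting " + p), pool);
            System.out.println("completed deletes: " + n);
        } finally {
            pool.shutdown(); // the caller owns the pool and is responsible for shutting it down
        }
    }
}
```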
cleanExpiredMetadata
Enables removal of unused table metadata like partition specs and schemas.
Parameters:
- clean: true to remove unused metadata, false to keep it
Returns: this for method chaining
Result
The Result interface provides statistics about the expiration operation.
Methods
Usage Examples
Basic Expiration
Retain Recent Snapshots
Expire Specific Snapshot
With Metadata Cleanup
Best Practices
- Set reasonable retention periods: Balance storage costs with the need for time travel
- Use retainLast() as a safety net: Always keep a minimum number of snapshots
- Monitor storage: Track the result metrics to understand storage savings
- Schedule regular expiration: Run this action periodically to prevent unbounded growth
- Test in development: Verify expiration behavior before running in production
Related
- DeleteOrphanFiles - Remove files not referenced by any snapshot
- RewriteDataFiles - Optimize data file layout
- Table API ExpireSnapshots - Low-level snapshot expiration