DeleteOrphanFiles
The DeleteOrphanFiles action identifies and deletes orphan files in a table that are not reachable by any valid snapshot. This is essential for reclaiming storage space from failed writes and other operations.
Interface
Overview
Orphan files can accumulate in a table for several reasons:
- Failed write operations that didn’t commit
- Interrupted jobs that wrote data but didn’t create snapshots
- Files from unsuccessful transactions
- Leftover files from testing or development
The DeleteOrphanFiles action:
- Lists all files in table storage
- Identifies files not referenced by any snapshot
- Safely deletes files older than a safety threshold
- Can process both data files and metadata files
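The steps above amount to a set difference with an age filter. The following is a minimal sketch of that idea using only the Java standard library. It is not Iceberg's implementation: the class name, the `listedFiles` map, and the `referenced` set are hypothetical stand-ins for a storage listing and for the paths reachable from valid snapshots, and the strict less-than comparison against the cutoff is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only; not Iceberg's implementation.
class OrphanFileSketch {

    /**
     * Returns the paths that are present in storage, absent from the
     * referenced set, and last modified before the cutoff.
     *
     * @param listedFiles path -> last-modified time in epoch millis (hypothetical listing result)
     * @param referenced  paths reachable from any valid snapshot (hypothetical input)
     * @param olderThanMs cutoff timestamp in epoch millis
     */
    static List<String> findOrphans(Map<String, Long> listedFiles,
                                    Set<String> referenced,
                                    long olderThanMs) {
        List<String> orphans = new ArrayList<>();
        for (Map.Entry<String, Long> entry : listedFiles.entrySet()) {
            boolean unreferenced = !referenced.contains(entry.getKey());
            boolean oldEnough = entry.getValue() < olderThanMs;
            // Only files that fail the reference check AND pass the age check qualify.
            if (unreferenced && oldEnough) {
                orphans.add(entry.getKey());
            }
        }
        return orphans;
    }
}
```

A recently written, unreferenced file is left alone in this model, which mirrors why the age threshold protects files from writes that have not committed yet.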
Methods
location
Specify a location to scan for orphan files.
Parameter: location - The path to scan for orphan files
Returns: this for method chaining
Example:
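One way to narrow the scan is to pass a subdirectory of the table location rather than its root. The sketch below only builds such a path. It assumes, for illustration, that data files live under a `data` subdirectory of the table location; check your table's actual layout. The class and method names are hypothetical.

```java
// Illustrative helper; the "data" subdirectory is an assumption about table layout.
class ScanLocationSketch {

    /** Joins a table location and a subdirectory name with exactly one slash. */
    static String subdirectory(String tableLocation, String name) {
        String base = tableLocation.endsWith("/")
            ? tableLocation.substring(0, tableLocation.length() - 1)
            : tableLocation;
        return base + "/" + name;
    }

    public static void main(String[] args) {
        // The resulting string is the kind of value that would be passed to location(...).
        System.out.println(subdirectory("s3://bucket/warehouse/db/events/", "data"));
    }
}
```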
If not set, the root table location will be scanned, potentially removing both orphan data and metadata files.
olderThan
Only delete files older than the specified timestamp.
Parameter: olderThanTimestamp - Timestamp in milliseconds since the epoch (the same unit as System.currentTimeMillis())
Returns: this for method chaining
Example:
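The timestamp is an absolute point in time, not a duration, so a threshold such as "7 days ago" has to be computed from the current time. A minimal sketch of that arithmetic, with a hypothetical class and method name:

```java
import java.util.concurrent.TimeUnit;

class OlderThanSketch {

    /** Returns the epoch-millis timestamp that lies `days` days before `nowMillis`. */
    static long cutoffMillis(long nowMillis, int days) {
        return nowMillis - TimeUnit.DAYS.toMillis(days);
    }

    public static void main(String[] args) {
        long sevenDaysAgo = cutoffMillis(System.currentTimeMillis(), 7);
        // This value is what would be passed to olderThan(...).
        System.out.println(sevenDaysAgo);
    }
}
```

Taking the current time as a parameter keeps the calculation easy to test.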
Defaults to 3 days ago if not specified. This safety measure prevents deleting files from concurrent operations.
deleteWith
Provide a custom delete function.
Parameter: deleteFunc - A function that accepts file paths to delete
Returns: this for method chaining
Example:
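A custom delete function does not have to delete anything. The sketch below assumes the function has the shape of `java.util.function.Consumer<String>` and records each path instead of removing the file, which is one way to inspect what would be deleted. The class name and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

class RecordingDeleteSketch {

    // Synchronized because a delete function may be invoked from several threads.
    private final List<String> wouldDelete =
        Collections.synchronizedList(new ArrayList<>());

    /** Returns a delete function that records each path and leaves the file in place. */
    Consumer<String> deleteFunc() {
        return wouldDelete::add;
    }

    /** Returns a snapshot copy of the paths recorded so far. */
    List<String> recordedPaths() {
        return new ArrayList<>(wouldDelete);
    }
}
```

The consumer returned by `deleteFunc()` is the kind of value that would be passed to deleteWith(...).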
executeDeleteWith
Provide an executor service for parallel deletion.
Parameter: executorService - The executor service for parallel deletes
Returns: this for method chaining
Only used if a custom delete function is provided or the FileIO doesn’t support bulk deletes.
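To illustrate what an executor service contributes when files are deleted one at a time, here is a standard-library sketch that fans per-file deletes out over a thread pool. It models the idea only and is not how the action schedules work internally. `ParallelDeleteSketch` and `deleteAll` are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ParallelDeleteSketch {

    /**
     * Applies deleteFunc to every path on the given pool, shuts the pool down,
     * and waits up to one minute. Returns true if all tasks finished in time.
     */
    static boolean deleteAll(List<String> paths,
                             Consumer<String> deleteFunc,
                             ExecutorService pool) {
        for (String path : paths) {
            pool.execute(() -> deleteFunc.accept(path));
        }
        pool.shutdown();
        try {
            return pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can react to it.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Because the delete function runs on several threads at once, it needs to be thread-safe.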
prefixMismatchMode
Control how to handle files with mismatched authority/scheme.
Parameter: newPrefixMismatchMode - Mode for handling prefix mismatches
Returns: this for method chaining
Modes:
- ERROR (default) - Throw an exception on mismatch
- IGNORE - Skip files with mismatches
- DELETE - Consider mismatched files as orphans
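The three modes can be read as a decision applied when a listed file and a referenced file share the same path but differ in scheme or authority. The following sketch models that decision with the standard library. It is an interpretation of the mode descriptions above, not Iceberg's code, and the `Mode` enum here is a local stand-in for the real type.

```java
import java.net.URI;
import java.util.Objects;

class PrefixMismatchSketch {

    // Local stand-in mirroring the three documented modes.
    enum Mode { ERROR, IGNORE, DELETE }

    /**
     * Decides whether a listed file is an orphan, given a referenced file
     * that is assumed to have the same path component.
     */
    static boolean isOrphan(URI listed, URI referenced, Mode mode) {
        boolean prefixMatches =
            Objects.equals(listed.getScheme(), referenced.getScheme())
                && Objects.equals(listed.getAuthority(), referenced.getAuthority());
        if (prefixMatches) {
            return false; // same file, still referenced
        }
        switch (mode) {
            case IGNORE:
                return false; // skip the file
            case DELETE:
                return true;  // treat the mismatched file as an orphan
            default:
                throw new IllegalStateException(
                    "Prefix mismatch: " + listed + " vs " + referenced);
        }
    }
}
```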
equalSchemes
Define schemes that should be considered equivalent.
Parameter: newEqualSchemes - Map of equivalent scheme groups
Returns: this for method chaining
Example:
equalAuthorities
Define authorities that should be considered equivalent.
Parameter: newEqualAuthorities - Map of equivalent authority groups
Returns: this for method chaining
Example:
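Both equalSchemes and equalAuthorities serve the same purpose: two locations that differ only in an equivalent scheme or authority should compare as the same file. The sketch below shows that normalization with `java.net.URI`. The alias-to-canonical map layout, the class name, and the sample host names are assumptions for illustration; this page does not show the exact map format the action expects.

```java
import java.net.URI;
import java.util.Map;

class EquivalenceSketch {

    /**
     * Rewrites the scheme and authority of a location to their canonical
     * forms. Names without a mapping are left unchanged. The location is
     * assumed to carry both a scheme and an authority.
     */
    static String normalize(String location,
                            Map<String, String> equalSchemes,
                            Map<String, String> equalAuthorities) {
        URI uri = URI.create(location);
        String scheme = equalSchemes.getOrDefault(uri.getScheme(), uri.getScheme());
        String authority =
            equalAuthorities.getOrDefault(uri.getAuthority(), uri.getAuthority());
        return scheme + "://" + authority + uri.getPath();
    }
}
```

With `s3a` and `s3n` both mapped to `s3`, a file listed as `s3a://bucket/...` and referenced as `s3://bucket/...` normalizes to the same string, so it no longer counts as a prefix mismatch.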
Result
The Result interface provides information about deleted files.
Methods
Usage Examples
Basic Orphan File Deletion
Custom Time Threshold
Preview Mode
Specific Location
Handle Scheme Mismatches
With Progress Tracking
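One possible approach to progress tracking is to wrap the delete function so that every call is counted. The sketch below assumes the delete function is a `java.util.function.Consumer<String>`. The wrapper class, its method names, and the reporting interval are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

class ProgressTrackingSketch {

    // Atomic so the count stays correct when deletes run on several threads.
    private final AtomicLong processed = new AtomicLong();

    /**
     * Wraps a delete function so each call is counted, printing a line
     * every `reportEvery` files. `reportEvery` must be greater than zero.
     */
    Consumer<String> track(Consumer<String> delegate, long reportEvery) {
        return path -> {
            delegate.accept(path);
            long count = processed.incrementAndGet();
            if (count % reportEvery == 0) {
                System.out.println("Deleted " + count + " files so far");
            }
        };
    }

    /** Returns the number of paths handled so far. */
    long processedCount() {
        return processed.get();
    }
}
```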
Safety Considerations
Best Practices
- Run during maintenance windows: Minimize concurrent activity
- Use conservative time thresholds: 7+ days for production tables
- Preview before deleting: Always run in preview mode first
- Schedule regular cleanup: Run periodically to prevent accumulation
- Monitor storage savings: Track the result to measure impact
- Document scheme equivalences: Maintain a record of equal schemes/authorities
Performance Considerations
Costs
- Lists all files in the specified location (expensive for large tables)
- Requires reading table metadata
- May require multiple API calls to cloud storage
Optimization Tips
- Use location() to limit scope to specific directories
- Run during off-peak hours
- Consider parallel execution for very large tables
- Use bulk delete APIs when available
When to Run
Consider running DeleteOrphanFiles in these situations:
- After failed operations: Jobs that crashed or were cancelled
- Storage costs are high: Significant orphan file accumulation
- After major migrations: Moving or restructuring tables
- During maintenance: Regular cleanup schedules
- Before decommissioning: Final cleanup before table removal
Related
- ExpireSnapshots - Remove old snapshots and their files
- RewriteDataFiles - Optimize data file layout
- File Management - Understanding Iceberg file organization