ConvertEqualityDeleteFiles
The ConvertEqualityDeleteFiles action converts equality delete files to position delete files. This optimization can improve query performance because position-based deletes are cheaper to apply at read time.
Interface
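The shape of the interface, as far as this page documents it, can be sketched as follows. This is a simplified stand-in for illustration, not the Iceberg source: `ExpressionPlaceholder` and `ConvertEqualityDeleteFilesShape` are hypothetical names, and the `execute()` method is an assumption about how the action is run.

```java
// Simplified stand-in for illustration only; not the Iceberg source.

// Hypothetical placeholder for the Iceberg expression type.
interface ExpressionPlaceholder {}

interface ConvertEqualityDeleteFilesShape {

    // Restricts which equality delete files are converted.
    // Returns this so that calls can be chained.
    ConvertEqualityDeleteFilesShape filter(ExpressionPlaceholder expression);

    // Assumed entry point that runs the conversion and reports statistics.
    Result execute();

    // Statistics about one conversion run.
    interface Result {
        // Number of equality delete files that were converted.
        int convertedEqualityDeleteFilesCount();

        // Number of position delete files that were created.
        int addedPositionDeleteFilesCount();
    }
}
```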
Overview
Equality delete files identify deleted rows by column values, while position delete files identify them by data file and row position. Position deletes are typically more efficient for queries because:
- They require less computation to apply
- They have a smaller storage footprint
- They can be applied more quickly during scans
The conversion process involves:
- Reading the equality delete conditions
- Identifying matching row positions in data files
- Writing new position delete files
- Removing the original equality delete files
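The core of the conversion steps listed above, matching equality conditions against rows and recording their positions, can be sketched with a small in-memory model. This is a minimal illustration under stated assumptions, not how the action is implemented: `EqualityToPositionSketch` and its methods are hypothetical, data files are modeled as plain maps, and the sketch ignores sequence numbers, partition scope, and file I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical in-memory model; none of these names are Iceberg classes.
final class EqualityToPositionSketch {

    // True when the row carries every column value named by the equality delete.
    static boolean matches(Map<String, Object> row, Map<String, Object> equalityDelete) {
        return equalityDelete.entrySet().stream()
                .allMatch(e -> Objects.equals(row.get(e.getKey()), e.getValue()));
    }

    // dataFiles: file path -> rows in file order.
    // Returns:   file path -> 0-based positions of the rows that are deleted.
    static Map<String, List<Long>> toPositionDeletes(
            Map<String, List<Map<String, Object>>> dataFiles,
            List<Map<String, Object>> equalityDeletes) {
        Map<String, List<Long>> positions = new TreeMap<>();
        for (Map.Entry<String, List<Map<String, Object>>> file : dataFiles.entrySet()) {
            List<Map<String, Object>> rows = file.getValue();
            for (int pos = 0; pos < rows.size(); pos++) {
                Map<String, Object> row = rows.get(pos);
                // A row is deleted if any equality delete matches it.
                if (equalityDeletes.stream().anyMatch(d -> matches(row, d))) {
                    positions.computeIfAbsent(file.getKey(), k -> new ArrayList<>())
                            .add((long) pos);
                }
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        Map<String, List<Map<String, Object>>> dataFiles = Map.of(
                "data/a.parquet", List.of(
                        Map.of("id", 1, "region", "us"),
                        Map.of("id", 2, "region", "eu")),
                "data/b.parquet", List.of(
                        Map.of("id", 3, "region", "us"),
                        Map.of("id", 2, "region", "us")));
        // One equality delete: remove every row where id = 2.
        List<Map<String, Object>> equalityDeletes = List.of(Map.of("id", 2));

        // prints {data/a.parquet=[1], data/b.parquet=[1]}
        System.out.println(toPositionDeletes(dataFiles, equalityDeletes));
    }
}
```

A single equality delete produces entries for two data files here, which is the same effect described under write amplification: one equality delete file can fan out into several position delete files.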
Methods
filter
Filters which equality delete files to convert, based on partition values.
Parameters: expression - An Iceberg expression used to find deletes. The filter will be converted to a partition filter with an inclusive projection.
Returns: this for method chaining
Example:
Any file that may contain rows matching this filter will be included in the conversion. The filter uses inclusive projection for partition-level matching.
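To make the inclusive projection concrete, here is a minimal sketch that does not use the Iceberg API. It assumes a hypothetical table partitioned by the day of an `event_time` column and a row filter of the form `event_time >= cutoff`; the class and method names are invented for illustration.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration of inclusive projection; not Iceberg code.
final class InclusiveProjectionSketch {

    // Row filter assumed for the example: event_time >= cutoff.
    // Projected onto a day(event_time) partition this becomes day >= day(cutoff),
    // which keeps every partition that MAY hold a matching row.
    static List<LocalDate> selectPartitions(List<LocalDate> partitionDays, LocalDateTime cutoff) {
        LocalDate cutoffDay = cutoff.toLocalDate();
        return partitionDays.stream()
                .filter(day -> !day.isBefore(cutoffDay))
                .collect(Collectors.toList());
    }
}
```

With partitions for 2024-01-14, 2024-01-15, and 2024-01-16 and a cutoff of 2024-01-15 at 12:00, the sketch selects the 15th and the 16th. The partition for the 15th is included even though its morning rows do not match the row filter, which is why delete files covering those rows can still be picked up for conversion.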
Result
The Result interface provides statistics about the conversion operation.
Methods
convertedEqualityDeleteFilesCount
Returns the count of equality delete files that were converted.
Returns: int - Number of converted files
addedPositionDeleteFilesCount
Returns the count of position delete files that were created.
Returns: int - Number of new position delete files
Usage Examples
Convert All Equality Deletes
Convert Deletes in Specific Partitions
Convert Deletes with Multiple Filters
Monitor Conversion Progress
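One way to monitor a run is to log the two counts that Result exposes together with their ratio. The helper below is a hypothetical sketch that takes the counts as plain integers, so it makes no assumption about how the action itself is obtained or executed.

```java
import java.util.Locale;

// Hypothetical reporting helper; not part of Iceberg.
final class ConversionStatsSketch {

    // Position delete files added per equality delete file converted.
    // Returns 0.0 when nothing was converted, to avoid dividing by zero.
    static double conversionRatio(int convertedEqualityDeleteFiles, int addedPositionDeleteFiles) {
        if (convertedEqualityDeleteFiles == 0) {
            return 0.0;
        }
        return (double) addedPositionDeleteFiles / convertedEqualityDeleteFiles;
    }

    // One-line summary suitable for a log message.
    static String summarize(int converted, int added) {
        return String.format(Locale.ROOT, "converted=%d added=%d ratio=%.2f",
                converted, added, conversionRatio(converted, added));
    }
}
```

In practice the arguments would come from convertedEqualityDeleteFilesCount() and addedPositionDeleteFilesCount() on the returned Result. A ratio well above 1 suggests that each equality delete file touches many data files.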
Best Practices
- Run after delete operations: Convert equality deletes after batch delete operations complete
- Use partition filters: For large tables, convert deletes partition by partition to manage resource usage
- Monitor file counts: Track the conversion ratio to understand delete file characteristics
- Combine with compaction: Consider running alongside data file compaction for comprehensive optimization
- Test impact: Measure query performance improvements after conversion
Performance Considerations
- Read overhead: The action must read data files to determine row positions
- Write amplification: Multiple position delete files may be created from a single equality delete file
- Query improvement: Queries typically benefit from faster delete application
Related
- RewritePositionDeleteFiles - Optimize position delete files
- RemoveDanglingDeleteFiles - Remove obsolete delete files
- RewriteDataFiles - Optimize data file layout