AppendFiles

The AppendFiles interface provides an API for appending new data files to an Iceberg table.

Overview

AppendFiles accumulates file additions, produces a new snapshot of the table, and commits that snapshot as the table's current snapshot. It is the primary interface for adding new data to a table.

Interface

public interface AppendFiles extends SnapshotUpdate<AppendFiles>
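
The self-referential type parameter in SnapshotUpdate<AppendFiles> is what lets inherited methods such as set() return AppendFiles rather than a base type, so calls can be chained. A minimal sketch of that pattern in plain Java follows; FluentUpdate, FluentAppend, and RecordingAppend are hypothetical stand-ins, not Iceberg types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a base update interface: T is the concrete subtype,
// so shared methods can return it and keep a call chain on that subtype.
interface FluentUpdate<T> {
    T set(String key, String value);
}

// Hypothetical stand-in for an append interface that binds T to itself.
interface FluentAppend extends FluentUpdate<FluentAppend> {
    FluentAppend appendFile(String path);
}

// Toy implementation that only records what was requested.
final class RecordingAppend implements FluentAppend {
    final List<String> files = new ArrayList<>();
    final Map<String, String> properties = new LinkedHashMap<>();

    @Override
    public FluentAppend set(String key, String value) {
        properties.put(key, value);
        return this;
    }

    @Override
    public FluentAppend appendFile(String path) {
        files.add(path);
        return this;
    }
}
```

Because set() is typed to return FluentAppend, a chain such as append.set("k", "v").appendFile("a.parquet") compiles without a cast.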

Core Methods

appendFile()

Appends a data file to the table.

AppendFiles appendFile(DataFile file)

Parameters:

file - A data file to append

Returns: this, for method chaining

Example:

DataFile dataFile = DataFiles.builder(spec)
    .withPath("/path/to/data.parquet")
    .withFileSizeInBytes(1024)
    .withRecordCount(100)
    .build();

table.newAppend()
    .appendFile(dataFile)
    .commit();

appendManifest()

Appends a manifest file to the table.

AppendFiles appendManifest(ManifestFile file)

Parameters:

file - A manifest file containing only appended files

Returns: this, for method chaining

Description: The manifest must contain only appended files. All files in the manifest will be appended to the table in the snapshot created by this update. The manifest will be used directly if snapshot ID inheritance is enabled (format version > 1 or explicitly enabled). Otherwise, it will be rewritten to assign all entries this update’s snapshot ID.

Lifecycle Management:

If the manifest is rewritten, the caller must manage the lifecycle of the original manifest
If the manifest is used directly and the commit succeeds, it becomes part of table metadata
If the manifest gets merged with others, it will be deleted automatically on success
If the commit fails, the manifest is never deleted

Example:

ManifestFile manifest = ...; // Pre-created manifest

table.newAppend()
    .appendManifest(manifest)
    .commit();
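
The lifecycle rules for appendManifest() can be restated as a small decision function. This is only a sketch of the rules as described in this section, written in plain Java. ManifestFate and ManifestLifecycle are hypothetical names, and the three boolean inputs are simplifications of state that Iceberg tracks internally.

```java
// Who ends up responsible for a manifest passed to appendManifest(),
// following the lifecycle rules described in the text. Names are illustrative.
enum ManifestFate {
    CALLER_MANAGES,         // caller must clean up or reuse the original manifest
    PART_OF_TABLE_METADATA, // manifest is referenced by the committed snapshot
    DELETED_AUTOMATICALLY   // manifest was merged and removed after a successful commit
}

final class ManifestLifecycle {
    private ManifestLifecycle() {}

    static ManifestFate fateOf(boolean inheritanceEnabled,
                               boolean mergedWithOthers,
                               boolean commitSucceeded) {
        if (!commitSucceeded) {
            // A failed commit never deletes the manifest.
            return ManifestFate.CALLER_MANAGES;
        }
        if (!inheritanceEnabled) {
            // The manifest was rewritten; the original stays with the caller.
            return ManifestFate.CALLER_MANAGES;
        }
        if (mergedWithOthers) {
            // Used directly, then merged: removed on success.
            return ManifestFate.DELETED_AUTOMATICALLY;
        }
        // Used directly and kept as-is.
        return ManifestFate.PART_OF_TABLE_METADATA;
    }
}
```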

Examples

Basic Append Operation

import org.apache.iceberg.Table;
import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.PartitionSpec;

// Create data file
PartitionSpec spec = table.spec();
DataFile file = DataFiles.builder(spec)
    .withPath("/data/2024/01/data-001.parquet")
    .withFileSizeInBytes(10485760)
    .withRecordCount(50000)
    .withPartitionPath("date=2024-01-15")
    .build();

// Append to table
AppendFiles append = table.newAppend();
append.appendFile(file)
      .commit();

System.out.println("Appended file to snapshot: " + 
    table.currentSnapshot().snapshotId());

Appending Multiple Files

import java.util.List;
import java.util.ArrayList;

// Collect multiple data files
List<DataFile> dataFiles = new ArrayList<>();

for (String path : filePaths) {
    DataFile file = DataFiles.builder(spec)
        .withPath(path)
        .withFileSizeInBytes(getFileSize(path))
        .withRecordCount(getRecordCount(path))
        .build();
    dataFiles.add(file);
}

// Append all files in single transaction
AppendFiles append = table.newAppend();
for (DataFile file : dataFiles) {
    append.appendFile(file);
}
append.commit();

System.out.println("Appended " + dataFiles.size() + " files");
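
getFileSize and getRecordCount in the loop above are placeholders that the caller has to supply. A minimal sketch of getFileSize follows, assuming the paths refer to files on the local filesystem; LocalFileSizes is a hypothetical holder class. getRecordCount depends on the file format's footer and is not sketched here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class LocalFileSizes {
    private LocalFileSizes() {}

    // Hypothetical helper: size in bytes of a file on the local filesystem.
    // The checked IOException is wrapped so the helper can be called inline
    // from a builder chain.
    static long getFileSize(String path) {
        try {
            return Files.size(Paths.get(path));
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read size of " + path, e);
        }
    }
}
```

For files reached through the table's FileIO (object stores, HDFS), table.io().newInputFile(path).getLength() is the usual way to get the same number.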

Append with Metrics

import org.apache.iceberg.Metrics;
import org.apache.iceberg.types.Types;
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.HashMap;

// Create data file with metrics
Map<Integer, Long> valueCounts = new HashMap<>();
Map<Integer, Long> nullValueCounts = new HashMap<>();
Map<Integer, ByteBuffer> lowerBounds = new HashMap<>();
Map<Integer, ByteBuffer> upperBounds = new HashMap<>();

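valueCounts.put(1, 50000L);      // keys are field IDs; 1 is the id column
nullValueCounts.put(1, 0L);
// longToBuffer is a caller-supplied helper, not an Iceberg API. The library
// equivalent is Conversions.toByteBuffer(Types.LongType.get(), value) from
// org.apache.iceberg.types.
lowerBounds.put(1, longToBuffer(1L));
upperBounds.put(1, longToBuffer(50000L));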

Metrics metrics = new Metrics(
    50000L,                    // row count
    null,                      // column sizes
    valueCounts,               // value counts
    nullValueCounts,           // null value counts
    null,                      // NaN value counts
    lowerBounds,               // lower bounds
    upperBounds                // upper bounds
);

DataFile file = DataFiles.builder(spec)
    .withPath("/data/file.parquet")
    .withFileSizeInBytes(10485760)
    .withMetrics(metrics)
    .build();

table.newAppend()
    .appendFile(file)
    .commit();
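
longToBuffer in the example above is not an Iceberg API. A minimal sketch of such a helper follows, assuming the bounds use Iceberg's single-value binary encoding, in which a long is stored as 8 little-endian bytes. BoundBuffers is a hypothetical holder class.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class BoundBuffers {
    private BoundBuffers() {}

    // Hypothetical helper: encode a long as 8 little-endian bytes.
    static ByteBuffer longToBuffer(long value) {
        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        // The absolute put leaves the position at 0, so the buffer is ready to read.
        buffer.putLong(0, value);
        return buffer;
    }
}
```

In code that already depends on Iceberg, Conversions.toByteBuffer(Types.LongType.get(), value) from org.apache.iceberg.types produces this encoding and also handles the other primitive types.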

Partitioned Append

import org.apache.iceberg.PartitionData;

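// Use the table's current partition spec
PartitionSpec spec = table.spec();

// Create partition data; each value must match the type of its partition field
PartitionData partition = new PartitionData(spec.partitionType());
partition.put(0, "2024-01-15");  // position 0: date partition, assumed to be string-typed here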

// Create data file with partition
DataFile file = DataFiles.builder(spec)
    .withPath("/data/date=2024-01-15/data-001.parquet")
    .withFileSizeInBytes(10485760)
    .withRecordCount(50000)
    .withPartition(partition)
    .build();

table.newAppend()
    .appendFile(file)
    .commit();
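
The example above stores a string in the partition tuple, which fits a string-typed source column. If the partition field is an Iceberg date instead (an assumption about the table schema, which this page does not show), the value is an Integer counting days since 1970-01-01. A minimal sketch of that conversion follows; DatePartitionValues is a hypothetical helper.

```java
import java.time.LocalDate;

final class DatePartitionValues {
    private DatePartitionValues() {}

    // Converts an ISO-8601 date string (yyyy-MM-dd) to days since 1970-01-01.
    // Math.toIntExact fails loudly rather than silently truncating.
    static int daysFromEpoch(String isoDate) {
        return Math.toIntExact(LocalDate.parse(isoDate).toEpochDay());
    }
}
```

With such a helper, the call would read partition.put(0, DatePartitionValues.daysFromEpoch("2024-01-15")).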

Append with Snapshot Properties

import org.apache.iceberg.Snapshot;
import java.util.Map;

// Append with custom snapshot properties
AppendFiles append = table.newAppend();

for (DataFile file : dataFiles) {
    append.appendFile(file);
}

append.set("spark.app.id", "application_123")
      .set("written-by", "ETL Pipeline v2.0")
      .commit();

// Check snapshot properties
Snapshot snapshot = table.currentSnapshot();
Map<String, String> summary = snapshot.summary();
System.out.println("Written by: " + summary.get("written-by"));

Appending Manifest Files

import org.apache.iceberg.ManifestFile;
import org.apache.iceberg.ManifestFiles;
import org.apache.iceberg.ManifestWriter;
import org.apache.iceberg.io.OutputFile;

// Create manifest with multiple data files
OutputFile manifestOutput = table.io()
    .newOutputFile("/metadata/manifest-001.avro");

ManifestWriter<DataFile> writer = ManifestFiles.write(
    table.spec(),
    manifestOutput
);

try {
    for (DataFile file : dataFiles) {
        writer.add(file);
    }
} finally {
    writer.close();
}

ManifestFile manifest = writer.toManifestFile();

// Append the manifest
table.newAppend()
    .appendManifest(manifest)
    .commit();

Atomic Multi-File Append

import org.apache.iceberg.exceptions.CommitFailedException;

try {
    AppendFiles append = table.newAppend();
    
    // Add all files
    for (DataFile file : newDataFiles) {
        append.appendFile(file);
    }
    
    // Commit atomically
    append.commit();
    
    System.out.println("Successfully appended " + newDataFiles.size() + " files");
    
} catch (CommitFailedException e) {
    // Handle commit failure - no files were added
    System.err.println("Append failed: " + e.getMessage());
    // Retry or handle error
}
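
The "Retry or handle error" step above is left open. A minimal sketch of a caller-side retry wrapper in plain Java follows, assuming each attempt builds a fresh append (for example a lambda that calls table.newAppend(), re-adds the files, and commits). CommitRetry and its parameters are hypothetical. Iceberg already retries conflicting commits internally, controlled by table properties such as commit.retry.num-retries, so an outer loop like this only matters once those retries are exhausted.

```java
import java.util.function.Supplier;

final class CommitRetry {
    private CommitRetry() {}

    // Runs the attempt up to maxAttempts times, doubling the delay after each failure.
    // This sketch retries on any RuntimeException; real code would likely catch
    // only CommitFailedException.
    static <T> T withRetry(Supplier<T> attempt, int maxAttempts, long initialDelayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        RuntimeException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                last = e;
                if (i == maxAttempts) {
                    break; // out of attempts; rethrow below
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
                delay *= 2;
            }
        }
        throw last;
    }
}
```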

Append with Validation

import org.apache.iceberg.FileFormat;

// Validate and append files
AppendFiles append = table.newAppend();

for (DataFile file : dataFiles) {
    // Validate file
    if (file.recordCount() == 0) {
        System.err.println("Skipping empty file: " + file.path());
        continue;
    }
    
    if (file.format() != FileFormat.PARQUET) {
        System.err.println("Skipping non-Parquet file: " + file.path());
        continue;
    }
    
    append.appendFile(file);
}

append.commit();

Incremental Append Pattern

class IncrementalAppender {
    private final Table table;
    private final List<DataFile> buffer = new ArrayList<>();
    private static final int BATCH_SIZE = 100;
    
    public IncrementalAppender(Table table) {
        this.table = table;
    }
    
    public void addFile(DataFile file) {
        buffer.add(file);
        
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }
    
    public void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        
        AppendFiles append = table.newAppend();
        for (DataFile file : buffer) {
            append.appendFile(file);
        }
        
        append.commit();
        
        System.out.println("Flushed " + buffer.size() + " files");
        buffer.clear();
    }
    
    public void close() {
        flush();
    }
}

// Usage
IncrementalAppender appender = new IncrementalAppender(table);
try {
    for (DataFile file : streamOfFiles) {
        appender.addFile(file);
    }
} finally {
    appender.close();
}
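
The buffering logic above does not depend on Iceberg types, so it can be sketched generically and tested in isolation. This is one possible variation in plain Java, not a replacement for the class above. BatchBuffer is a hypothetical name, and the commit function is supplied by the caller. Implementing AutoCloseable lets try-with-resources perform the final flush.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Collects items and hands them to a commit function in batches.
final class BatchBuffer<T> implements AutoCloseable {
    private final List<T> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<T>> commitBatch;

    BatchBuffer(int batchSize, Consumer<List<T>> commitBatch) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
        this.commitBatch = commitBatch;
    }

    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Pass a copy; the buffer is cleared only if the commit returns normally,
        // so a failed batch stays buffered for a later flush.
        commitBatch.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    @Override
    public void close() {
        flush();
    }
}
```

With Iceberg, the commit function could be a lambda such as files -> { AppendFiles a = table.newAppend(); files.forEach(a::appendFile); a.commit(); }.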

Commit Behavior

When committing, changes are applied to the latest table snapshot:

Conflict Resolution: Commit conflicts are resolved by applying changes to the new latest snapshot and reattempting the commit
Atomicity: All files are added atomically - either all succeed or none are added
Snapshot Creation: A new snapshot is created containing all existing data plus the appended files
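
The three properties above can be modeled with a compare-and-swap loop. The following is a minimal sketch in plain Java and not Iceberg's implementation, which swaps table metadata through the catalog and bounds the number of retries. ToyTable is hypothetical, and a "snapshot" here is just an immutable list of file paths.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class ToyTable {
    private final AtomicReference<List<String>> current =
        new AtomicReference<>(Collections.<String>emptyList());

    List<String> currentSnapshot() {
        return current.get();
    }

    // Applies the pending files to the latest snapshot and swaps the result in.
    // Returns the number of attempts that were needed.
    int append(List<String> newFiles) {
        int attempts = 0;
        while (true) {
            attempts++;
            List<String> base = current.get();
            // Snapshot creation: existing files plus the appended ones.
            List<String> next = new ArrayList<>(base);
            next.addAll(newFiles);
            // Atomicity: the whole list becomes visible in one step, or not at all.
            if (current.compareAndSet(base, Collections.unmodifiableList(next))) {
                return attempts;
            }
            // Conflict resolution: another writer committed first, so loop,
            // re-read the new latest snapshot, and reapply the same files.
        }
    }
}
```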

Inherited Methods

From SnapshotUpdate:

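commit() - Applies the pending changes and commits a new snapshot (declared on PendingUpdate, which SnapshotUpdate extends)
set(String key, String value) - Sets a property in the snapshot summary
deleteWith(Consumer<String> deleteFunc) - Sets a callback used to delete files instead of the table's default deletion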
