DataFile

DataFile represents a physical data file tracked by an Iceberg table. It carries the file's path, format, partition data, record count, file size, and the column-level metrics that scan planning uses to skip files that cannot match a query.

What it contains

  • File path and format
  • Partition values for the file
  • Record count and file size
  • Column-level metrics such as value counts, null counts, and lower and upper bounds
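
To illustrate how these metrics help during planning, here is a minimal sketch in plain Java (no Iceberg dependency) of bounds-based file pruning for an equality predicate on a single `long` column. `FileStats` and `BoundsPruning` are hypothetical names introduced for this example. Real DataFile metrics are keyed by field ID and store bounds as serialized `ByteBuffer` values, and Iceberg's own evaluators (such as `InclusiveMetricsEvaluator`) handle many more predicate types and edge cases.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for the metrics a DataFile carries for one column.
// Real Iceberg metrics are keyed by field ID, with bounds stored as serialized ByteBuffers.
record FileStats(String path, long recordCount, long nullCount, long lower, long upper) {}

class BoundsPruning {
    // Returns true if the file might contain a row where column == value.
    // A file is skipped only when its metrics prove that no such row exists.
    static boolean mightMatchEquals(FileStats f, long value) {
        if (f.nullCount() == f.recordCount()) {
            return false; // every value is null, so no row can equal a non-null value
        }
        return f.lower() <= value && value <= f.upper();
    }

    // Keeps only files whose metrics admit the predicate; the rest are pruned at planning time.
    static List<String> plan(List<FileStats> files, long value) {
        return files.stream()
            .filter(f -> mightMatchEquals(f, value))
            .map(FileStats::path)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<FileStats> files = List.of(
            new FileStats("a.parquet", 1_000, 0, 1, 100),
            new FileStats("b.parquet", 1_000, 0, 101, 200),
            new FileStats("c.parquet", 1_000, 1_000, 0, 0)); // column is entirely null here
        System.out.println(plan(files, 150)); // prints [b.parquet]
    }
}
```

Only `b.parquet` survives for `column == 150`: its bounds contain the value, while `a.parquet`'s upper bound is too low and `c.parquet` holds only nulls.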

Common usage

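You most often work with DataFile when you append new files, inspect scan tasks, or rewrite existing files during maintenance operations. The snippet below describes an existing Parquet file, assuming `table` is an already-loaded, unpartitioned table; for a partitioned table, also set the file's partition values (for example, with `withPartitionPath`). Because it does not call `withMetrics(...)`, the resulting DataFile carries no column-level metrics, so planning cannot use them to skip this file.
```java
import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.FileFormat;

DataFile dataFile = DataFiles.builder(table.spec())  // assumes an unpartitioned spec
    .withPath("s3://warehouse/db/table/data.parquet")
    .withFormat(FileFormat.PARQUET)                  // set the file format explicitly
    .withFileSizeInBytes(10_485_760L)                // 10 MiB
    .withRecordCount(100_000L)
    .build();
```
To add the file to the table, commit it with `table.newAppend().appendFile(dataFile).commit()`.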
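
Maintenance rewrites also rely on DataFile fields. The following plain-Java sketch picks small files as rewrite candidates by their `fileSizeInBytes`. `FileSummary` and `CompactionCandidates` are hypothetical names, and the 128 MiB threshold is an illustrative assumption, not an Iceberg default. Iceberg's own rewrite actions apply more elaborate selection and grouping rules.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical summary of the DataFile field that matters for this selection step.
record FileSummary(String path, long fileSizeInBytes) {}

class CompactionCandidates {
    // Illustrative threshold (not an Iceberg default): files under 128 MiB count as "small".
    static final long SMALL_FILE_BYTES = 128L * 1024 * 1024;

    // Selects files small enough that rewriting them into larger files would
    // reduce the number of files the table has to track and plan over.
    static List<String> smallFiles(List<FileSummary> files, long thresholdBytes) {
        return files.stream()
            .filter(f -> f.fileSizeInBytes() < thresholdBytes)
            .map(FileSummary::path)
            .collect(Collectors.toList());
    }

    // Total bytes a rewrite of the selected files would have to read.
    static long bytesToRewrite(List<FileSummary> files, long thresholdBytes) {
        return files.stream()
            .filter(f -> f.fileSizeInBytes() < thresholdBytes)
            .mapToLong(FileSummary::fileSizeInBytes)
            .sum();
    }

    public static void main(String[] args) {
        List<FileSummary> files = List.of(
            new FileSummary("s3://warehouse/db/table/data.parquet", 10_485_760L),
            new FileSummary("s3://warehouse/db/table/big.parquet", 536_870_912L)); // hypothetical
        System.out.println(smallFiles(files, SMALL_FILE_BYTES));
        System.out.println(bytesToRewrite(files, SMALL_FILE_BYTES)); // prints 10485760
    }
}
```

Only the 10 MiB file falls under the threshold, so it is the sole candidate and the rewrite would read 10,485,760 bytes.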