Schema Class

The Schema class represents the structure of a data table in Apache Iceberg. It defines the columns, their types, and the identifier fields.

Package: org.apache.iceberg

Overview

The Schema class provides:

Column definitions with field IDs and types
Nested field support through struct types
Schema evolution tracking with schema IDs
Identifier field management (similar to primary keys)
Field lookup by name or ID
Schema projection and selection

Schema IDs are populated only when a schema is read from or written to table metadata. Otherwise, the schema ID defaults to 0.

Constructors

Schema(List<NestedField> columns)

public Schema(List<NestedField> columns)

Creates a new schema with the given columns.

columns

List<NestedField>

required

List of top-level columns

Schema(List<NestedField> columns, Set<Integer> identifierFieldIds)

public Schema(List<NestedField> columns, Set<Integer> identifierFieldIds)

Creates a new schema with columns and identifier fields.

columns

List<NestedField>

required

List of top-level columns

identifierFieldIds

Set<Integer>

required

Set of field IDs that form the identifier

Schema(int schemaId, List<NestedField> columns, Set<Integer> identifierFieldIds)

public Schema(int schemaId, List<NestedField> columns, Set<Integer> identifierFieldIds)

Creates a new schema with a specific schema ID.

schemaId

int

required

The schema ID

columns

List<NestedField>

required

List of top-level columns

identifierFieldIds

Set<Integer>

required

Set of field IDs that form the identifier

Basic Methods

schemaId()

public int schemaId()

Returns the schema ID for this schema.

Returns: The schema ID (defaults to 0 if not set)

columns()

public List<NestedField> columns()

Returns a list of the columns (top-level fields) in this schema.

Returns: List of NestedField objects

asStruct()

public StructType asStruct()

Returns the underlying struct type for this schema.

Returns: The StructType representation of this schema

highestFieldId()

public int highestFieldId()

Returns the highest field ID in this schema, including nested fields.

Returns: The highest field ID

Field Lookup

findField(int id)

public NestedField findField(int id)

Returns the field identified by the given field ID.

id

int

required

The field ID to look up

Returns: The field with the given ID, or null if not found

Example:

NestedField field = schema.findField(1);
if (field != null) {
    System.out.println("Field name: " + field.name());
}

findField(String name)

public NestedField findField(String name)

Returns a field by name. The name may be a top-level or nested field.

name

String

required

The field name (e.g., “id” or “user.email”)

Returns: The field with the given name, or null if not found

Example:

NestedField userEmail = schema.findField("user.email");

caseInsensitiveFindField(String name)

public NestedField caseInsensitiveFindField(String name)

Returns a field by name using case-insensitive matching.

name

String

required

The field name (case-insensitive)

Returns: The field with the given name, or null if not found

findType(String name)

public Type findType(String name)

Returns the type of a field identified by name.

name

String

required

The field name

Returns: The field’s Type, or null if not found

findType(int id)

public Type findType(int id)

Returns the type of a field identified by field ID.

id

int

required

The field ID

Returns: The field’s Type, or null if not found

findColumnName(int id)

public String findColumnName(int id)

Returns the full column name for the given field ID.

id

int

required

The field ID

Returns: The full column name (e.g., “user.email”), or null if not found

idToName()

public Map<Integer, String> idToName()

Returns a map of field IDs to qualified field names.

Returns: Map of field ID to qualified field name
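The qualified names used by findField(String name), findColumnName(int id), and idToName() join a nested field's name to its parent's name with a dot, as in "user.email". The sketch below models that indexing with plain Java collections so it runs without any Iceberg dependency. The Field class and the indexNames helper are hypothetical stand-ins, not Iceberg's NestedField or its internal indexing code, and the model covers struct nesting only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QualifiedNameSketch {
    // Hypothetical stand-in for a schema field; NOT Iceberg's NestedField.
    static final class Field {
        final int id;
        final String name;
        final List<Field> children; // empty for primitive fields

        Field(int id, String name, Field... children) {
            this.id = id;
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Walks the field tree and records "parent.child" style names by field ID.
    static Map<Integer, String> indexNames(List<Field> fields) {
        Map<Integer, String> idToName = new LinkedHashMap<>();
        collect("", fields, idToName);
        return idToName;
    }

    private static void collect(String prefix, List<Field> fields, Map<Integer, String> out) {
        for (Field field : fields) {
            String qualified = prefix.isEmpty() ? field.name : prefix + "." + field.name;
            out.put(field.id, qualified);
            collect(qualified, field.children, out);
        }
    }

    public static void main(String[] args) {
        List<Field> columns = Arrays.asList(
            new Field(1, "id"),
            new Field(4, "user", new Field(5, "name"), new Field(6, "email")));
        // prints {1=id, 4=user, 5=user.name, 6=user.email}
        System.out.println(indexNames(columns));
    }
}
```

In this model, looking up ID 6 gives "user.email", which is the same shape of name that findColumnName(int id) is documented to return.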

Identifier Fields

Identifier fields in Iceberg are similar to primary keys in relational databases. An identifier is a set of primitive fields whose combined values should identify a single row. However, Iceberg does not enforce uniqueness.

identifierFieldIds()

public Set<Integer> identifierFieldIds()

Returns the set of identifier field IDs.

Returns: Set of field IDs that form the identifier

Identifier Field Rules:

Must be primitive types (not structs, lists, or maps)
Must be required fields (not optional)
Cannot be float or double types
Must be at root level or nested in a chain of required structs
Can include nested fields (e.g., “user.last_name”)
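The rules above can be read as a single predicate over a candidate field. The following is a minimal, dependency-free sketch of that predicate, not Iceberg's own validation code. The FieldInfo class and the canBeIdentifier helper are hypothetical, type names are plain strings chosen for illustration, and the model only tracks enclosing structs.

```java
import java.util.Arrays;
import java.util.List;

class IdentifierRuleSketch {
    // Hypothetical, simplified description of a candidate field;
    // NOT Iceberg's NestedField.
    static final class FieldInfo {
        final String typeName;               // e.g. "long", "double", "struct"
        final boolean required;              // false means the field is optional
        final List<Boolean> parentsRequired; // one entry per enclosing struct, empty at root

        FieldInfo(String typeName, boolean required, Boolean... parentsRequired) {
            this.typeName = typeName;
            this.required = required;
            this.parentsRequired = Arrays.asList(parentsRequired);
        }
    }

    // Applies the four documented rules: primitive, required,
    // not float/double, and every enclosing struct is required.
    static boolean canBeIdentifier(FieldInfo field) {
        boolean primitive = !Arrays.asList("struct", "list", "map").contains(field.typeName);
        boolean floatingPoint = field.typeName.equals("float") || field.typeName.equals("double");
        boolean allParentsRequired = !field.parentsRequired.contains(Boolean.FALSE);
        return primitive && field.required && !floatingPoint && allParentsRequired;
    }

    public static void main(String[] args) {
        System.out.println(canBeIdentifier(new FieldInfo("long", true)));          // true
        System.out.println(canBeIdentifier(new FieldInfo("double", true)));        // false
        System.out.println(canBeIdentifier(new FieldInfo("string", true, false))); // false
    }
}
```

The third call models a required string inside an optional struct, which fails the "chain of required structs" rule even though the field itself is required.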

identifierFieldNames()

public Set<String> identifierFieldNames()

Returns the set of identifier field names.

Returns: Set of qualified field names that form the identifier

Example:

Set<String> identifierNames = schema.identifierFieldNames();
// Might return: ["user_id", "timestamp"]

Schema Projection

select(String... names)

public Schema select(String... names)

Creates a projection schema for a subset of columns.

names

String...

required

Column names to select

Returns: A projection schema containing only the selected columns

Example:

Schema projection = schema.select("id", "name", "user.email");

select(Collection<String> names)

public Schema select(Collection<String> names)

Creates a projection schema for a subset of columns.

names

Collection<String>

required

Column names to select

Returns: A projection schema containing only the selected columns
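To make the effect of selecting a nested name such as "user.email" concrete, here is a small model of projection over a field tree. It assumes that selecting a nested field keeps the enclosing struct with only the selected children, and that selecting a struct by name keeps it whole. The Field class and the select helper are hypothetical and do not reproduce Iceberg's projection code; field IDs and types are left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ProjectionSketch {
    // Hypothetical field tree; NOT Iceberg's NestedField.
    static final class Field {
        final String name;
        final List<Field> children; // empty for primitive fields

        Field(String name, Field... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Keeps fields whose qualified name is selected. When only a nested name
    // matches, the parent struct is kept with just its selected children.
    static List<Field> select(List<Field> fields, String prefix, Set<String> names) {
        List<Field> kept = new ArrayList<>();
        for (Field field : fields) {
            String qualified = prefix.isEmpty() ? field.name : prefix + "." + field.name;
            if (names.contains(qualified)) {
                kept.add(field); // whole field, including any children
            } else if (!field.children.isEmpty()) {
                List<Field> children = select(field.children, qualified, names);
                if (!children.isEmpty()) {
                    kept.add(new Field(field.name, children.toArray(new Field[0])));
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Field> columns = Arrays.asList(
            new Field("id"),
            new Field("data"),
            new Field("user", new Field("name"), new Field("email")));
        Set<String> names = new HashSet<>(Arrays.asList("id", "user.email"));
        for (Field field : select(columns, "", names)) {
            System.out.println(field.name + " with " + field.children.size() + " child field(s)");
        }
    }
}
```

In this model the projection contains "id" and a "user" struct that holds only "email"; the "data" column and "user.name" are dropped.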

caseInsensitiveSelect(String... names)

public Schema caseInsensitiveSelect(String... names)

Creates a projection schema using case-insensitive column matching.

names

String...

required

Column names to select (case-insensitive)

Returns: A projection schema containing only the selected columns

caseInsensitiveSelect(Collection<String> names)

public Schema caseInsensitiveSelect(Collection<String> names)

Creates a projection schema using case-insensitive column matching.

names

Collection<String>

required

Column names to select (case-insensitive)

Returns: A projection schema containing only the selected columns

Schema Comparison

sameSchema(Schema anotherSchema)

public boolean sameSchema(Schema anotherSchema)

Checks whether this schema is equivalent to another schema while ignoring the schema ID.

anotherSchema

Schema

required

The schema to compare with

Returns: True if the schemas are equivalent (same structure and identifier fields)

Aliases

getAliases()

public Map<String, Integer> getAliases()

Returns the alias map for this schema, if set. Alias maps are created when translating external schemas (like Avro) to Iceberg format.

Returns: Map of column aliases to field IDs, or null if no aliases are set

aliasToId(String alias)

public Integer aliasToId(String alias)

Returns the column ID for the given column alias.

alias

String

required

A full column name from the unconverted schema

Returns: The column ID, or null if the alias doesn’t exist

idToAlias(Integer fieldId)

public String idToAlias(Integer fieldId)

Returns the full column name in the unconverted schema for the given column ID.

fieldId

Integer

required

A column ID in this schema

Returns: The column alias, or null if not found

Data Access

accessorForField(int id)

public Accessor<StructLike> accessorForField(int id)

Returns an accessor for retrieving data from StructLike rows.

id

int

required

The field ID

Returns: An Accessor to retrieve values from a StructLike row

Accessors do not retrieve data contained in lists or maps.
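One way to picture an accessor is as a fixed path of positions through nested structs. The sketch below is a simplified model under that assumption, with rows and nested structs represented as Object[] arrays. The PositionPath class is hypothetical and is not Iceberg's Accessor interface. Under this model, a list or map has no single fixed position for its contents, which is one reading of the restriction above.

```java
class AccessorSketch {
    // Hypothetical accessor: a fixed path of positions through nested structs.
    static final class PositionPath {
        final int[] positions;

        PositionPath(int... positions) {
            this.positions = positions;
        }

        // Follows each position in turn. Every step except the last must land
        // on a nested struct, modeled here as an Object[].
        Object get(Object[] row) {
            Object current = row;
            for (int position : positions) {
                if (current == null) {
                    return null; // an optional parent struct is missing
                }
                current = ((Object[]) current)[position];
            }
            return current;
        }
    }

    public static void main(String[] args) {
        // Row layout: [id, data, user[name, email]]
        Object[] row = {1L, "payload", new Object[] {"Ada", "ada@example.com"}};
        PositionPath email = new PositionPath(2, 1);
        System.out.println(email.get(row)); // prints ada@example.com
    }
}
```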

Static Methods

checkCompatibility(Schema schema, int formatVersion)

public static void checkCompatibility(Schema schema, int formatVersion)

Checks the compatibility of the schema with a format version. This validates that the schema does not contain types released in later format versions.

schema

Schema

required

The schema to check

formatVersion

int

required

The table format version

Throws: IllegalStateException if the schema is incompatible
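The check described above amounts to comparing each field's type against the first format version that supports it. The sketch below models that idea without any Iceberg dependency. The CompatibilitySketch class, its string type names, and the contents of MIN_FORMAT_VERSION are assumptions made for illustration (the two entries follow the Iceberg table spec's v3 additions); it is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CompatibilitySketch {
    // Assumed minimum format versions, for illustration only. Types missing
    // from this map are treated as available in every format version.
    static final Map<String, Integer> MIN_FORMAT_VERSION = new LinkedHashMap<>();

    static {
        MIN_FORMAT_VERSION.put("timestamp_ns", 3);
        MIN_FORMAT_VERSION.put("variant", 3);
    }

    // fieldTypes maps a column name to its type name.
    static void check(Map<String, String> fieldTypes, int formatVersion) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> entry : fieldTypes.entrySet()) {
            int minVersion = MIN_FORMAT_VERSION.getOrDefault(entry.getValue(), 1);
            if (minVersion > formatVersion) {
                problems.add(entry.getKey() + ": " + entry.getValue() + " requires v" + minVersion);
            }
        }
        if (!problems.isEmpty()) {
            throw new IllegalStateException("Invalid schema for v" + formatVersion + ": " + problems);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fieldTypes = new LinkedHashMap<>();
        fieldTypes.put("id", "long");
        fieldTypes.put("event_time", "timestamp_ns");
        check(fieldTypes, 3); // passes
        try {
            check(fieldTypes, 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Collecting every problem before throwing means a caller sees all incompatible columns in one message rather than fixing them one at a time.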

indexFields(Collection<Schema> schemas)

public static Map<Integer, NestedField> indexFields(Collection<Schema> schemas)

Indexes all fields from multiple schemas. This method favors field definitions from higher schema IDs to handle type promotions.

schemas

Collection<Schema>

required

The collection of schemas to index

Returns: Map of field IDs to fields
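The preference for higher schema IDs matters when a field's type was promoted between schema versions, for example from int to long. The sketch below models that merge with plain Java maps: schemas are visited in ascending ID order so that later definitions overwrite earlier ones. SchemaModel and this indexFields helper are hypothetical, and field definitions are reduced to type-name strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class IndexFieldsSketch {
    // Hypothetical stand-in for a schema: an ID plus field ID -> type name.
    static final class SchemaModel {
        final int schemaId;
        final Map<Integer, String> fieldTypesById;

        SchemaModel(int schemaId, Map<Integer, String> fieldTypesById) {
            this.schemaId = schemaId;
            this.fieldTypesById = fieldTypesById;
        }
    }

    // Merges all fields, letting the schema with the highest ID win
    // whenever the same field ID appears more than once.
    static Map<Integer, String> indexFields(List<SchemaModel> schemas) {
        List<SchemaModel> sorted = new ArrayList<>(schemas);
        sorted.sort(Comparator.comparingInt((SchemaModel s) -> s.schemaId));
        Map<Integer, String> index = new TreeMap<>();
        for (SchemaModel schema : sorted) {
            index.putAll(schema.fieldTypesById); // later (higher ID) overwrites
        }
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, String> newer = new TreeMap<>();
        newer.put(1, "long");   // field 1 promoted from int to long
        newer.put(2, "string"); // field 2 added later
        List<SchemaModel> schemas = Arrays.asList(
            new SchemaModel(1, newer),
            new SchemaModel(0, Collections.singletonMap(1, "int")));
        System.out.println(indexFields(schemas)); // prints {1=long, 2=string}
    }
}
```

The input order does not affect the result in this model, because the schemas are sorted by ID before merging.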

Usage Examples

Creating a Schema

import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;
import static org.apache.iceberg.types.Types.NestedField.*;

Schema schema = new Schema(
    required(1, "id", Types.LongType.get()),
    optional(2, "data", Types.StringType.get()),
    required(3, "timestamp", Types.TimestampType.withZone()),
    optional(4, "user", Types.StructType.of(
        required(5, "name", Types.StringType.get()),
        optional(6, "email", Types.StringType.get())
    ))
);

Creating a Schema with Identifier Fields

import java.util.Arrays;
import com.google.common.collect.ImmutableSet;

Schema schema = new Schema(
    Arrays.asList(
        required(1, "user_id", Types.LongType.get()),
        required(2, "event_id", Types.LongType.get()),
        optional(3, "event_type", Types.StringType.get()),
        required(4, "timestamp", Types.TimestampType.withZone())
    ),
    ImmutableSet.of(1, 2)  // user_id and event_id form the identifier
);

Looking Up Fields

Schema schema = ...;

// By name
NestedField field = schema.findField("user.email");
if (field != null) {
    System.out.println("Type: " + field.type());
}

// By ID
NestedField fieldById = schema.findField(1);
String fieldName = schema.findColumnName(1);

// Get type directly
Type type = schema.findType("timestamp");

Schema Projection

Schema schema = ...;

// Select specific columns
Schema projection = schema.select("id", "timestamp", "user.name");

// Select all columns using wildcard
Schema allColumns = schema.select("*");

// Case-insensitive selection
Schema caseInsensitiveProj = schema.caseInsensitiveSelect("ID", "TIMESTAMP");

Working with Identifier Fields

Schema schema = ...;

// Get identifier field IDs
Set<Integer> identifierIds = schema.identifierFieldIds();
System.out.println("Identifier field IDs: " + identifierIds);

// Get identifier field names
Set<String> identifierNames = schema.identifierFieldNames();
System.out.println("Identifier fields: " + identifierNames);

Comparing Schemas

Schema schema1 = ...;
Schema schema2 = ...;

// Check if schemas are equivalent (ignoring schema ID)
if (schema1.sameSchema(schema2)) {
    System.out.println("Schemas are equivalent");
}

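// To also require matching schema IDs, compare schemaId() explicitly
// (Schema does not define its own equals(), so equals() compares references)
if (schema1.sameSchema(schema2) && schema1.schemaId() == schema2.schemaId()) {
    System.out.println("Schemas are identical");
}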

Source Code Reference

Source: org/apache/iceberg/Schema.java:56
