Schema Class

The Schema class represents the structure of a data table in Apache Iceberg. It defines the columns, their types, and identifier fields.
Package: org.apache.iceberg

Overview

The Schema class provides:
  • Column definitions with field IDs and types
  • Nested field support through struct types
  • Schema evolution tracking with schema IDs
  • Identifier field management (similar to primary keys)
  • Field lookup by name or ID
  • Schema projection and selection
Note: Schema IDs are populated only when a schema is read from or written to table metadata. Otherwise, the schema ID defaults to 0.

Constructors

Schema(List<NestedField> columns)

public Schema(List<NestedField> columns)
Creates a new schema with the given columns.
columns
List<NestedField>
required
List of top-level columns

Schema(List<NestedField> columns, Set<Integer> identifierFieldIds)

public Schema(List<NestedField> columns, Set<Integer> identifierFieldIds)
Creates a new schema with columns and identifier fields.
columns
List<NestedField>
required
List of top-level columns
identifierFieldIds
Set<Integer>
required
Set of field IDs that form the identifier

Schema(int schemaId, List<NestedField> columns, Set<Integer> identifierFieldIds)

public Schema(int schemaId, List<NestedField> columns, Set<Integer> identifierFieldIds)
Creates a new schema with a specific schema ID.
schemaId
int
required
The schema ID
columns
List<NestedField>
required
List of top-level columns
identifierFieldIds
Set<Integer>
required
Set of field IDs that form the identifier

Basic Methods

schemaId()

public int schemaId()
Returns the schema ID for this schema.
Returns: The schema ID (defaults to 0 if not set)

columns()

public List<NestedField> columns()
Returns a list of the columns (top-level fields) in this schema.
Returns: List of NestedField objects

asStruct()

public StructType asStruct()
Returns the underlying struct type for this schema.
Returns: The StructType representation of this schema

highestFieldId()

public int highestFieldId()
Returns the highest field ID in this schema, including nested fields.
Returns: The highest field ID

Field Lookup

findField(int id)

public NestedField findField(int id)
Returns the field identified by the given field ID.
id
int
required
The field ID to look up
Returns: The field with the given ID, or null if not found
Example:
NestedField field = schema.findField(1);
if (field != null) {
    System.out.println("Field name: " + field.name());
}

findField(String name)

public NestedField findField(String name)
Returns a field by name. The name may be a top-level or nested field.
name
String
required
The field name (e.g., “id” or “user.email”)
Returns: The field with the given name, or null if not found
Example:
NestedField userEmail = schema.findField("user.email");

caseInsensitiveFindField(String name)

public NestedField caseInsensitiveFindField(String name)
Returns a field by name using case-insensitive matching.
name
String
required
The field name (case-insensitive)
Returns: The field with the given name, or null if not found

findType(String name)

public Type findType(String name)
Returns the type of a field identified by name.
name
String
required
The field name
Returns: The field’s Type, or null if not found

findType(int id)

public Type findType(int id)
Returns the type of a field identified by field ID.
id
int
required
The field ID
Returns: The field’s Type, or null if not found

findColumnName(int id)

public String findColumnName(int id)
Returns the full column name for the given field ID.
id
int
required
The field ID
Returns: The full column name (e.g., “user.email”), or null if not found

idToName()

public Map<Integer, String> idToName()
Returns a map of field IDs to qualified field names.
Returns: Map of field ID to qualified field name
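
To make "qualified field name" concrete, here is a minimal sketch in plain Java of how such a map could be built by walking nested fields. It is a simplified model, not Iceberg's implementation: the `QualifiedNames` class and its `Node` type are hypothetical stand-ins for a schema and its fields.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of qualified-name indexing; not Iceberg's implementation. */
final class QualifiedNames {
  /** Hypothetical stand-in for a field: an ID, a name, and any nested struct fields. */
  static final class Node {
    final int id;
    final String name;
    final List<Node> children;

    Node(int id, String name, Node... children) {
      this.id = id;
      this.name = name;
      this.children = Arrays.asList(children);
    }
  }

  /** Maps every field ID, including nested ones, to its dot-separated name. */
  static Map<Integer, String> idToName(List<Node> columns) {
    Map<Integer, String> names = new LinkedHashMap<>();
    for (Node column : columns) {
      index(column, "", names);
    }
    return names;
  }

  private static void index(Node node, String parentName, Map<Integer, String> names) {
    // A nested field's qualified name is its parent's qualified name plus its own name.
    String fullName = parentName.isEmpty() ? node.name : parentName + "." + node.name;
    names.put(node.id, fullName);
    for (Node child : node.children) {
      index(child, fullName, names);
    }
  }

  public static void main(String[] args) {
    List<Node> columns = Arrays.asList(
        new Node(1, "id"),
        new Node(4, "user", new Node(5, "name"), new Node(6, "email")));
    System.out.println(idToName(columns)); // prints {1=id, 4=user, 5=user.name, 6=user.email}
  }
}
```

In this model, looking up ID 6 yields "user.email", which mirrors the kind of value findColumnName(int id) is documented to return.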

Identifier Fields

Identifier fields in Iceberg are similar to primary keys in relational databases. An identifier is a set of primitive fields whose values, taken together, should uniquely identify a row. However, Iceberg does not enforce uniqueness.

identifierFieldIds()

public Set<Integer> identifierFieldIds()
Returns the set of identifier field IDs.
Returns: Set of field IDs that form the identifier
Identifier Field Rules:
  • Must be primitive types (not structs, lists, or maps)
  • Must be required fields (not optional)
  • Cannot be float or double types
  • Must be at root level or nested in a chain of required structs
  • Can include nested fields (e.g., “user.last_name”)
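
The rules above can be expressed as a small check. The following is a sketch in plain Java under stated assumptions, not Iceberg's validation code: `IdentifierRules`, `Field`, and the string type names are hypothetical, and nesting is reduced to a single flag that says whether the field is at root level or reachable only through required structs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Simplified model of the identifier field rules; not Iceberg's own validation code. */
final class IdentifierRules {
  // Hypothetical type names used by this model.
  private static final Set<String> NESTED_TYPES =
      new HashSet<>(Arrays.asList("struct", "list", "map"));
  private static final Set<String> FLOATING_TYPES =
      new HashSet<>(Arrays.asList("float", "double"));

  /** Hypothetical description of one candidate identifier field. */
  static final class Field {
    final String name;
    final String type;
    final boolean required;
    // True for root-level fields and for fields whose ancestors are all required structs.
    final boolean reachableThroughRequiredStructs;

    Field(String name, String type, boolean required, boolean reachableThroughRequiredStructs) {
      this.name = name;
      this.type = type;
      this.required = required;
      this.reachableThroughRequiredStructs = reachableThroughRequiredStructs;
    }
  }

  /** Returns null if the field satisfies every rule, otherwise the first rule it breaks. */
  static String violation(Field field) {
    if (NESTED_TYPES.contains(field.type)) {
      return field.name + " must be a primitive type";
    }
    if (!field.required) {
      return field.name + " must be required";
    }
    if (FLOATING_TYPES.contains(field.type)) {
      return field.name + " must not be float or double";
    }
    if (!field.reachableThroughRequiredStructs) {
      return field.name + " must be at root level or nested only in required structs";
    }
    return null;
  }

  public static void main(String[] args) {
    System.out.println(violation(new Field("user_id", "long", true, true)));
    System.out.println(violation(new Field("score", "double", true, true)));
    System.out.println(violation(new Field("user.last_name", "string", true, false)));
  }
}
```

Returning a message instead of a boolean is a choice made for this sketch so that a caller can report which rule failed.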

identifierFieldNames()

public Set<String> identifierFieldNames()
Returns the set of identifier field names.
Returns: Set of qualified field names that form the identifier
Example:
Set<String> identifierNames = schema.identifierFieldNames();
// Might return: ["user_id", "timestamp"]

Schema Projection

select(String... names)

public Schema select(String... names)
Creates a projection schema for a subset of columns.
names
String...
required
Column names to select
Returns: A projection schema containing only the selected columns
Example:
Schema projection = schema.select("id", "name", "user.email");

select(Collection<String> names)

public Schema select(Collection<String> names)
Creates a projection schema for a subset of columns.
names
Collection<String>
required
Column names to select
Returns: A projection schema containing only the selected columns
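
As a rough illustration of projection by name, the sketch below prunes a tree of fields down to the selected names while keeping the original field IDs. It is a simplified model in plain Java, not Iceberg's implementation: `Projection`, `Node`, and `leaf` are hypothetical, and the behavior shown (selecting a struct keeps all of its children, selecting a nested field keeps its parent around it, and "*" keeps everything) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified model of name-based projection; not Iceberg's implementation. */
final class Projection {
  /** Hypothetical stand-in for a field: an ID, a name, and any nested struct fields. */
  static final class Node {
    final int id;
    final String name;
    final List<Node> children;

    Node(int id, String name, List<Node> children) {
      this.id = id;
      this.name = name;
      this.children = children;
    }
  }

  static Node leaf(int id, String name) {
    return new Node(id, name, Collections.<Node>emptyList());
  }

  /** Keeps only the named fields; "*" keeps everything. */
  static List<Node> select(List<Node> columns, Set<String> names) {
    if (names.contains("*")) {
      return columns;
    }
    return prune(columns, "", names);
  }

  private static List<Node> prune(List<Node> fields, String parentName, Set<String> names) {
    List<Node> kept = new ArrayList<>();
    for (Node field : fields) {
      String fullName = parentName.isEmpty() ? field.name : parentName + "." + field.name;
      if (names.contains(fullName)) {
        kept.add(field); // selected by name: keep the field and everything under it
      } else {
        List<Node> children = prune(field.children, fullName, names);
        if (!children.isEmpty()) {
          // a nested field was selected: keep the parent, with its original ID, around it
          kept.add(new Node(field.id, field.name, children));
        }
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<Node> columns = Arrays.asList(
        leaf(1, "id"),
        leaf(2, "data"),
        new Node(4, "user", Arrays.asList(leaf(5, "name"), leaf(6, "email"))));
    List<Node> projected = select(columns, new HashSet<>(Arrays.asList("id", "user.email")));
    for (Node column : projected) {
      System.out.println(column.id + " " + column.name + " children=" + column.children.size());
    }
  }
}
```

The point of the model is that a projection is still addressed by the same field IDs as the full schema, so readers can match projected columns to stored data by ID.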

caseInsensitiveSelect(String... names)

public Schema caseInsensitiveSelect(String... names)
Creates a projection schema using case-insensitive column matching.
names
String...
required
Column names to select (case-insensitive)
Returns: A projection schema containing only the selected columns

caseInsensitiveSelect(Collection<String> names)

public Schema caseInsensitiveSelect(Collection<String> names)
Creates a projection schema using case-insensitive column matching.
names
Collection<String>
required
Column names to select (case-insensitive)
Returns: A projection schema containing only the selected columns

Schema Comparison

sameSchema(Schema anotherSchema)

public boolean sameSchema(Schema anotherSchema)
Checks whether this schema is equivalent to another schema while ignoring the schema ID.
anotherSchema
Schema
required
The schema to compare with
Returns: True if the schemas are equivalent (same structure and identifier fields)

Aliases

getAliases()

public Map<String, Integer> getAliases()
Returns the alias map for this schema, if set. Alias maps are created when translating external schemas (like Avro) to Iceberg format.
Returns: Map of column aliases to field IDs, or null if no aliases

aliasToId(String alias)

public Integer aliasToId(String alias)
Returns the column ID for the given column alias.
alias
String
required
A full column name from the unconverted schema
Returns: The column ID, or null if the alias doesn’t exist

idToAlias(Integer fieldId)

public String idToAlias(Integer fieldId)
Returns the full column name in the unconverted schema for the given column ID.
fieldId
Integer
required
A column ID in this schema
Returns: The column alias, or null if not found

Data Access

accessorForField(int id)

public Accessor<StructLike> accessorForField(int id)
Returns an accessor for retrieving data from StructLike rows.
id
int
required
The field ID
Returns: An Accessor to retrieve values from a StructLike row
Accessors do not retrieve data contained in lists or maps.

Static Methods

checkCompatibility(Schema schema, int formatVersion)

public static void checkCompatibility(Schema schema, int formatVersion)
Checks the compatibility of the schema with a format version. This validates that the schema does not contain types released in later format versions.
schema
Schema
required
The schema to check
formatVersion
int
required
The table format version
Throws: IllegalStateException if the schema is incompatible
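
The idea behind this check can be sketched in plain Java as a lookup of each column type's minimum format version. This is a simplified model, not Iceberg's implementation: `FormatVersionCheck`, the column-name-to-type-name map, and the entries in the minimum-version table are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of a format-version check; not Iceberg's implementation. */
final class FormatVersionCheck {
  // Assumed minimum format versions, for illustration only.
  // Types not listed here are treated as available since v1.
  private static final Map<String, Integer> MIN_VERSION = new HashMap<>();

  static {
    MIN_VERSION.put("timestamp_ns", 3);
    MIN_VERSION.put("variant", 3);
  }

  /**
   * Throws IllegalStateException if any column uses a type newer than formatVersion.
   * columnTypes maps a column name to a (hypothetical) type name.
   */
  static void checkCompatibility(Map<String, String> columnTypes, int formatVersion) {
    List<String> problems = new ArrayList<>();
    for (Map.Entry<String, String> column : columnTypes.entrySet()) {
      int minVersion = MIN_VERSION.getOrDefault(column.getValue(), 1);
      if (formatVersion < minVersion) {
        problems.add(column.getKey() + ": " + column.getValue() + " requires v" + minVersion);
      }
    }
    // Collect every problem first so the caller sees all of them in one exception.
    if (!problems.isEmpty()) {
      throw new IllegalStateException("Invalid schema for v" + formatVersion + ": " + problems);
    }
  }

  public static void main(String[] args) {
    Map<String, String> columns = new HashMap<>();
    columns.put("id", "long");
    columns.put("payload", "variant");
    checkCompatibility(columns, 3); // passes in this model
    try {
      checkCompatibility(columns, 2);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```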

indexFields(Collection<Schema> schemas)

public static Map<Integer, NestedField> indexFields(Collection<Schema> schemas)
Indexes all fields from multiple schemas. This method favors field definitions from higher schema IDs to handle type promotions.
schemas
Collection<Schema>
required
The collection of schemas to index
Returns: Map of field IDs to fields
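
One way to picture "favoring higher schema IDs" is to apply schemas in ascending ID order so that later definitions overwrite earlier ones. The sketch below models that in plain Java. It is not Iceberg's implementation: `FieldIndex` and `VersionedSchema` are hypothetical, and each field is reduced to a descriptive string.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of indexing fields across schema versions; not Iceberg's implementation. */
final class FieldIndex {
  /** Hypothetical stand-in for a schema: its ID plus a map of field ID to "name: type". */
  static final class VersionedSchema {
    final int schemaId;
    final Map<Integer, String> fields;

    VersionedSchema(int schemaId, Map<Integer, String> fields) {
      this.schemaId = schemaId;
      this.fields = fields;
    }
  }

  /** Indexes all fields by ID; when an ID appears in several schemas, the highest schema ID wins. */
  static Map<Integer, String> indexFields(Collection<VersionedSchema> schemas) {
    List<VersionedSchema> ordered = new ArrayList<>(schemas);
    ordered.sort((a, b) -> Integer.compare(a.schemaId, b.schemaId));
    Map<Integer, String> index = new HashMap<>();
    for (VersionedSchema schema : ordered) {
      index.putAll(schema.fields); // later (higher-ID) schemas overwrite earlier definitions
    }
    return index;
  }

  public static void main(String[] args) {
    Map<Integer, String> v1 = new HashMap<>();
    v1.put(1, "id: int");
    Map<Integer, String> v2 = new HashMap<>();
    v2.put(1, "id: long"); // field 1 was promoted from int to long
    v2.put(2, "data: string");

    List<VersionedSchema> schemas = new ArrayList<>();
    schemas.add(new VersionedSchema(2, v2));
    schemas.add(new VersionedSchema(1, v1));
    System.out.println(indexFields(schemas).get(1)); // prints id: long
  }
}
```

Sorting first makes the result independent of the order in which the schemas are passed in.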

Usage Examples

Creating a Schema

import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;
import static org.apache.iceberg.types.Types.NestedField.*;

Schema schema = new Schema(
    required(1, "id", Types.LongType.get()),
    optional(2, "data", Types.StringType.get()),
    required(3, "timestamp", Types.TimestampType.withZone()),
    optional(4, "user", Types.StructType.of(
        required(5, "name", Types.StringType.get()),
        optional(6, "email", Types.StringType.get())
    ))
);

Creating a Schema with Identifier Fields

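import java.util.Arrays;
import com.google.common.collect.ImmutableSet;
// Schema, Types, and the static NestedField imports are the same as in the previous example.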

Schema schema = new Schema(
    Arrays.asList(
        required(1, "user_id", Types.LongType.get()),
        required(2, "event_id", Types.LongType.get()),
        optional(3, "event_type", Types.StringType.get()),
        required(4, "timestamp", Types.TimestampType.withZone())
    ),
    ImmutableSet.of(1, 2)  // user_id and event_id form the identifier
);

Looking Up Fields

Schema schema = ...;

// By name
NestedField field = schema.findField("user.email");
if (field != null) {
    System.out.println("Type: " + field.type());
}

// By ID
NestedField fieldById = schema.findField(1);
String fieldName = schema.findColumnName(1);

// Get type directly
Type type = schema.findType("timestamp");

Schema Projection

Schema schema = ...;

// Select specific columns
Schema projection = schema.select("id", "timestamp", "user.name");

// Select all columns using wildcard
Schema allColumns = schema.select("*");

// Case-insensitive selection
Schema caseInsensitiveProj = schema.caseInsensitiveSelect("ID", "TIMESTAMP");

Working with Identifier Fields

Schema schema = ...;

// Get identifier field IDs
Set<Integer> identifierIds = schema.identifierFieldIds();
System.out.println("Identifier field IDs: " + identifierIds);

// Get identifier field names
Set<String> identifierNames = schema.identifierFieldNames();
System.out.println("Identifier fields: " + identifierNames);

Comparing Schemas

Schema schema1 = ...;
Schema schema2 = ...;

// Check if schemas are equivalent (ignoring schema ID)
if (schema1.sameSchema(schema2)) {
    System.out.println("Schemas are equivalent");
}

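// To also require matching schema IDs, compare them explicitly.
// Schema does not appear to override equals(), so equals() would only compare references.
if (schema1.sameSchema(schema2) && schema1.schemaId() == schema2.schemaId()) {
    System.out.println("Schemas are identical");
}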

Source Code Reference

Source: org/apache/iceberg/Schema.java:56