
rewrite_position_delete_files fails with ValidationException for tables with array columns #15080

@bk-mz

Summary

The rewrite_position_delete_files procedure fails with a ValidationException on tables that have array columns whose elements contain primitive fields. This is a regression introduced in Iceberg 1.8.0.

Error

org.apache.iceberg.exceptions.ValidationException: Invalid partition field parent: list<struct<5: value: optional long, 6: count: optional int>>
	at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:674)
	at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:658)
	at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:514)
	at org.apache.iceberg.PartitionSpec$Builder.identity(PartitionSpec.java:542)
	at org.apache.iceberg.expressions.ExpressionUtil.lambda$identitySpec$5(ExpressionUtil.java:745)
	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
	at org.apache.iceberg.expressions.ExpressionUtil.identitySpec(ExpressionUtil.java:744)
	at org.apache.iceberg.expressions.ExpressionUtil.extractByIdInclusive(ExpressionUtil.java:275)
	at org.apache.iceberg.spark.source.PositionDeletesRowReader.open(PositionDeletesRowReader.java:95)

Root Cause

Commit 9fb80b7 added a check to PartitionSpec.checkCompatibility() requiring that the parents of a partition source field be StructType.

When reading position deletes, ExpressionUtil.nonConstantFieldIds() collects all primitive field IDs from the table schema, including those nested inside arrays (in the error above, value and count, IDs 5 and 6). ExpressionUtil.identitySpec() then tries to create an identity partition field for each of these IDs. For the array-nested fields this fails the new validation, because one of their parent types is a list, not a struct.
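To make the mismatch concrete, here is a minimal, self-contained Java sketch. It does not use Iceberg's API: Node, Primitive, Struct, ListOf, testTableSchema and both helper methods are hypothetical stand-ins. allPrimitiveIds mirrors the traversal described for nonConstantFieldIds() (it descends into lists), while structNestedPrimitiveIds keeps only leaves reached through structs alone, i.e. the fields that would pass a "parents must be StructType" check. IDs 5 and 6 match the error message; IDs 1 and 2 for id and data are assumed.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedFieldIds {

  // Hypothetical mini type model (NOT Iceberg's Types API): a primitive leaf,
  // a struct of child nodes, or a list with a single element node.
  interface Node {}
  record Primitive(int id, String name) implements Node {}
  record Struct(List<Node> fields) implements Node {}
  record ListOf(Node element) implements Node {}

  // Shape of test_table: id BIGINT, data STRING, items ARRAY<STRUCT<value, count>>.
  static Node testTableSchema() {
    return new Struct(List.of(
        new Primitive(1, "id"),
        new Primitive(2, "data"),
        new ListOf(new Struct(List.of(
            new Primitive(5, "value"),
            new Primitive(6, "count"))))));
  }

  // Collects every primitive ID, descending into lists as well as structs
  // (the behavior described for nonConstantFieldIds()).
  static void allPrimitiveIds(Node node, List<Integer> out) {
    if (node instanceof Primitive p) {
      out.add(p.id());
    } else if (node instanceof Struct s) {
      for (Node child : s.fields()) {
        allPrimitiveIds(child, out);
      }
    } else if (node instanceof ListOf l) {
      allPrimitiveIds(l.element(), out);
    }
  }

  // Collects only primitive IDs whose every ancestor is a struct.
  static void structNestedPrimitiveIds(Node node, List<Integer> out) {
    if (node instanceof Primitive p) {
      out.add(p.id());
    } else if (node instanceof Struct s) {
      for (Node child : s.fields()) {
        structNestedPrimitiveIds(child, out);
      }
    }
    // ListOf is skipped on purpose: every leaf under it has a list ancestor.
  }

  public static void main(String[] args) {
    Node schema = testTableSchema();
    List<Integer> all = new ArrayList<>();
    allPrimitiveIds(schema, all);
    List<Integer> valid = new ArrayList<>();
    structNestedPrimitiveIds(schema, valid);
    System.out.println("all primitive ids: " + all);
    System.out.println("struct-only ids: " + valid);
  }
}
```

Running main prints [1, 2, 5, 6] for the first set and [1, 2] for the second. The difference, IDs 5 and 6, is exactly the set of fields whose identity partitions trip the validation.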

Reproduction

Any table with an array column whose elements contain primitive fields triggers this bug, for example:

CREATE TABLE test_table (
  id BIGINT, 
  data STRING, 
  items ARRAY<STRUCT<value:BIGINT, count:INT>>
) USING iceberg 
TBLPROPERTIES('format-version'='2', 'write.delete.mode'='merge-on-read');

INSERT INTO test_table VALUES 
  (1, 'a', array(named_struct('value', cast(10 as bigint), 'count', 1))),
  (2, 'b', array(named_struct('value', cast(20 as bigint), 'count', 2)));

DELETE FROM test_table WHERE id = 1;
DELETE FROM test_table WHERE id = 2;

-- This fails with ValidationException
CALL system.rewrite_position_delete_files(table => 'test_table', options => map('rewrite-all','true'));

Reproducer PR

See PR #15079 for a test case that reproduces this issue.

Environment

  • Iceberg version: 1.8.0+ (regression from 1.7.1)
  • Spark version: 3.5.x
  • Table format version: 2

Workaround

Use Iceberg 1.7.1 or earlier until this is fixed.
