Description
Summary
The rewrite_position_delete_files procedure fails with a ValidationException when run on tables that have array columns containing primitive fields. This is a regression introduced in Iceberg 1.8.0.
Error
org.apache.iceberg.exceptions.ValidationException: Invalid partition field parent: list<struct<5: value: optional long, 6: count: optional int>>
at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:674)
at org.apache.iceberg.PartitionSpec.checkCompatibility(PartitionSpec.java:658)
at org.apache.iceberg.PartitionSpec$Builder.add(PartitionSpec.java:514)
at org.apache.iceberg.PartitionSpec$Builder.identity(PartitionSpec.java:542)
at org.apache.iceberg.expressions.ExpressionUtil.lambda$identitySpec$5(ExpressionUtil.java:745)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.apache.iceberg.expressions.ExpressionUtil.identitySpec(ExpressionUtil.java:744)
at org.apache.iceberg.expressions.ExpressionUtil.extractByIdInclusive(ExpressionUtil.java:275)
at org.apache.iceberg.spark.source.PositionDeletesRowReader.open(PositionDeletesRowReader.java:95)
Root Cause
Commit 9fb80b7 added validation in PartitionSpec.checkCompatibility() that partition field parents must be StructType.
When reading position deletes, ExpressionUtil.nonConstantFieldIds() collects ALL primitive field IDs from the table schema, including those nested inside arrays. Then ExpressionUtil.identitySpec() attempts to create identity partitions for these fields, which fails validation because the parent type is a list, not a struct.
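The mismatch described above can be illustrated with a minimal, self-contained sketch. It does not use Iceberg's classes: `NestedFieldIdSketch`, `Field`, `Kind`, and both collector methods are hypothetical names invented for illustration. Field IDs 5 and 6 come from the error message; IDs 1 to 4 and the root ID 0 are assumed. The second collector shows one possible way to skip fields that sit under a list. It is not necessarily the fix the project will adopt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified schema model. These are NOT Iceberg's classes.
final class NestedFieldIdSketch {
    enum Kind { PRIMITIVE, STRUCT, LIST }

    static final class Field {
        final int id;
        final String name;
        final Kind kind;
        final List<Field> children;

        Field(int id, String name, Kind kind, Field... children) {
            this.id = id;
            this.name = name;
            this.kind = kind;
            this.children = Arrays.asList(children);
        }
    }

    // Mirrors the behavior described in the issue: every primitive ID is
    // collected, including primitives nested inside a list.
    static List<Integer> allPrimitiveIds(Field root) {
        List<Integer> out = new ArrayList<>();
        collect(root, false, false, out);
        return out;
    }

    // Assumed alternative: keep only primitives whose ancestors are all structs,
    // so each one could pass a "parent must be a struct" check.
    static List<Integer> structOnlyPrimitiveIds(Field root) {
        List<Integer> out = new ArrayList<>();
        collect(root, false, true, out);
        return out;
    }

    private static void collect(Field field, boolean insideList,
                                boolean skipListDescendants, List<Integer> out) {
        if (field.kind == Kind.PRIMITIVE) {
            if (!(insideList && skipListDescendants)) {
                out.add(field.id);
            }
            return;
        }
        boolean nowInsideList = insideList || field.kind == Kind.LIST;
        for (Field child : field.children) {
            collect(child, nowInsideList, skipListDescendants, out);
        }
    }

    // Models the table from the reproduction. IDs 1 to 4 and root ID 0 are assumed.
    static Field sampleSchema() {
        Field element = new Field(4, "element", Kind.STRUCT,
                new Field(5, "value", Kind.PRIMITIVE),
                new Field(6, "count", Kind.PRIMITIVE));
        return new Field(0, "root", Kind.STRUCT,
                new Field(1, "id", Kind.PRIMITIVE),
                new Field(2, "data", Kind.PRIMITIVE),
                new Field(3, "items", Kind.LIST, element));
    }

    public static void main(String[] args) {
        System.out.println(allPrimitiveIds(sampleSchema()));        // prints [1, 2, 5, 6]
        System.out.println(structOnlyPrimitiveIds(sampleSchema())); // prints [1, 2]
    }
}
```

In this model, IDs 5 and 6 are the ones that would reach the identity-partition step with a list among their ancestors, which matches the `list<struct<5: value ..., 6: count ...>>` parent named in the exception.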
Reproduction
Tables with array columns containing primitive fields trigger this bug:
CREATE TABLE test_table (
id BIGINT,
data STRING,
items ARRAY<STRUCT<value:BIGINT, count:INT>>
) USING iceberg
TBLPROPERTIES('format-version'='2', 'write.delete.mode'='merge-on-read');
INSERT INTO test_table VALUES
(1, 'a', array(named_struct('value', cast(10 as bigint), 'count', 1))),
(2, 'b', array(named_struct('value', cast(20 as bigint), 'count', 2)));
DELETE FROM test_table WHERE id = 1;
DELETE FROM test_table WHERE id = 2;
-- This fails with ValidationException
CALL system.rewrite_position_delete_files(table => 'test_table', options => map('rewrite-all','true'));
Reproducer PR
See PR #15079 for a test case that reproduces this issue.
Environment
- Iceberg version: 1.8.0+ (regression from 1.7.1)
- Spark version: 3.5.x
- Table format version: 2
Workaround
Use Iceberg 1.7.1 or earlier until this is fixed.