@fbertsch

  • Extract the latest instance of that column name from historical schemas
  • Required columns will become non-required
  • The parent field must already be present in the schema
  • Add Spark procedure call to run undelete
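The lookup rules listed above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in this PR: `Field`, `findLatest`, `parentPresent`, and `undelete` are hypothetical names, schemas are modeled as flat lists keyed by dotted full name, and the history is assumed to be ordered oldest to newest.

```java
import java.util.List;
import java.util.Optional;

public class UndeleteSketch {
    /** Hypothetical stand-in for a schema field, keyed by its dotted full name. */
    public record Field(int id, String fullName, String type, boolean required) {}

    /** Returns the newest historical field with this name, relaxed to optional. */
    public static Optional<Field> findLatest(List<List<Field>> history, String fullName) {
        // history is assumed to be ordered oldest to newest, so scan it backwards.
        for (int i = history.size() - 1; i >= 0; i--) {
            for (Field f : history.get(i)) {
                if (f.fullName().equals(fullName)) {
                    // Keep the original field ID, but drop the required flag.
                    return Optional.of(new Field(f.id(), f.fullName(), f.type(), false));
                }
            }
        }
        return Optional.empty();
    }

    /** True if the column is top-level or its parent is in the current schema. */
    public static boolean parentPresent(List<Field> current, String fullName) {
        int dot = fullName.lastIndexOf('.');
        if (dot < 0) {
            return true;
        }
        String parent = fullName.substring(0, dot);
        return current.stream().anyMatch(f -> f.fullName().equals(parent));
    }

    /** Applies both checks and returns the field that would be restored. */
    public static Field undelete(List<List<Field>> history, List<Field> current, String fullName) {
        if (!parentPresent(current, fullName)) {
            throw new IllegalArgumentException("Parent of " + fullName + " is not in the schema");
        }
        return findLatest(history, fullName)
            .orElseThrow(() -> new IllegalArgumentException("No historical field " + fullName));
    }

    public static void main(String[] args) {
        List<Field> v1 = List.of(
            new Field(1, "id", "long", true),
            new Field(2, "x", "string", true));
        List<Field> v2 = List.of(new Field(1, "id", "long", true)); // x was dropped
        System.out.println(undelete(List.of(v1, v2), v2, "x"));
    }
}
```

In this sketch the restored `x` keeps field ID 2 and comes back as optional, since rows written while the column was dropped have no value for it.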

fbertsch commented Jan 20, 2026

@bryanck - could you take a look?

This API is imperfect, since it doesn't allow an undelete of a field that has been recreated, i.e.:

  1. ALTER TABLE ADD COLUMN x string
  2. ALTER TABLE DROP COLUMN x
  3. ALTER TABLE ADD COLUMN x int
  4. ALTER TABLE DROP COLUMN x
  5. CALL system.undelete_column('my.iceberg_table', 'x')

The string version of x is unrecoverable. I doubt this is a common use case. Supporting it would require an API that exposes a table's historical fields and then undeletes by ID, which is probably too much complexity for a good API.
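To make the limitation concrete, here is a small self-contained Java model of the five steps. It is a sketch under assumptions, not Iceberg code: `RecreatedColumnSketch`, `Column`, `undeleteByName`, and `undeleteById` are hypothetical names, and field IDs are assumed to be handed out sequentially. `undeleteById` stands in for the by-ID alternative mentioned above, which this PR does not propose.

```java
import java.util.ArrayList;
import java.util.List;

public class RecreatedColumnSketch {
    /** Hypothetical model of a column; real Iceberg fields carry more state. */
    public record Column(int id, String name, String type) {}

    private final List<Column> current = new ArrayList<>();
    // Every column ever added, in the order it was added.
    private final List<Column> everAdded = new ArrayList<>();
    private int lastId = 0;

    public void addColumn(String name, String type) {
        Column c = new Column(++lastId, name, type);
        current.add(c);
        everAdded.add(c);
    }

    public void dropColumn(String name) {
        current.removeIf(c -> c.name().equals(name));
    }

    /** Undelete by name: the most recently added column with that name wins. */
    public Column undeleteByName(String name) {
        for (int i = everAdded.size() - 1; i >= 0; i--) {
            Column c = everAdded.get(i);
            if (c.name().equals(name)) {
                return restore(c);
            }
        }
        throw new IllegalArgumentException("No historical column named " + name);
    }

    /** Hypothetical alternative: undelete by field ID, which can reach older instances. */
    public Column undeleteById(int id) {
        for (Column c : everAdded) {
            if (c.id() == id) {
                return restore(c);
            }
        }
        throw new IllegalArgumentException("No historical column with ID " + id);
    }

    private Column restore(Column c) {
        if (current.stream().anyMatch(e -> e.name().equals(c.name()))) {
            throw new IllegalStateException("Column already exists: " + c.name());
        }
        current.add(c);
        return c;
    }

    public static void main(String[] args) {
        RecreatedColumnSketch table = new RecreatedColumnSketch();
        table.addColumn("x", "string"); // step 1, gets ID 1
        table.dropColumn("x");          // step 2
        table.addColumn("x", "int");    // step 3, gets ID 2
        table.dropColumn("x");          // step 4
        // Step 5: the by-name lookup can only ever return the int version.
        System.out.println(table.undeleteByName("x"));
    }
}
```

In this model the string version stays reachable only through `undeleteById(1)`, which is the extra API surface the comment argues against.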

Fixes #14488

github-actions bot added the docs label Jan 20, 2026
old: "method org.apache.iceberg.orc.ORC.WriteBuilder org.apache.iceberg.orc.ORC.WriteBuilder::config(java.lang.String,\
\ java.lang.String)"
justification: "Removing deprecations for 1.2.0"
"1.10.0":
Author

Regenerating this YAML changed its structure. I can try to move entries back around to minimize this diff.

import org.apache.iceberg.types.Types;
import org.junit.jupiter.api.Test;

public class TestSchemaUndelete extends HadoopTableTestBase {
Contributor

Test coverage looks good. The only corner case I would add:

  • add and delete a column named count twice (so the two instances have different IDs)
  • verify the undeleted column is the most recent one.

This would be a simple addition to testUndeletePreservesFieldId.

Author

Added, thank you!

Contributor

Thanks. The new test looks good; it ensures that the scan happens in the correct order. You'll need actual Iceberg committers to review and approve the rest of the patch now...
