@smaheshwar-pltr (Contributor) commented Jan 20, 2026

Motivation

There are a couple of issues related to table replaces. BaseTransaction.commitReplaceTransaction() does not re-apply replacement and transaction updates onto refreshed metadata. When concurrent changes occur, the transaction therefore commits stale metadata.

When a REPLACE transaction commits after concurrent changes (appends, snapshot expiration, other replaces), it overwrites those changes with stale metadata. This can lead to snapshot history loss, and concurrent snapshot expiration can even cause table corruption. (#15090)

V3 tables require that snapshot.first-row-id >= table.next-row-id when adding a snapshot. The snapshot's first-row-id is set from base.nextRowId() when the snapshot is produced.

With REST catalogs, updates are sent to the server and applied to the server's current metadata. If a concurrent commit advanced the server's next-row-id, the snapshot's first-row-id (based on stale metadata) will be behind:

  Cannot add a snapshot, first-row-id is behind table next-row-id: 100 < 150

This fails the entire replace operation, potentially requiring data files to be rewritten. In other words, V3 concurrent replaces generally fail. (#15091)
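
To make the failure concrete, here is a minimal sketch of the check described above. It is a toy model, not Iceberg's or any REST server's actual code: `RowLineageCheck` and `validateAddSnapshot` are hypothetical names, and only the rule and the error message are taken from the description.

```java
// Toy model of the V3 row-lineage check. All names are hypothetical, not Iceberg's API.
final class RowLineageCheck {
  private RowLineageCheck() {}

  /** Throws if the snapshot's first-row-id is behind the table's next-row-id. */
  static void validateAddSnapshot(long snapshotFirstRowId, long tableNextRowId) {
    if (snapshotFirstRowId < tableNextRowId) {
      throw new IllegalArgumentException(
          String.format(
              "Cannot add a snapshot, first-row-id is behind table next-row-id: %d < %d",
              snapshotFirstRowId, tableNextRowId));
    }
  }

  public static void main(String[] args) {
    long staleNextRowId = 100L;  // next-row-id seen when the replace transaction started
    long serverNextRowId = 150L; // advanced by a concurrent commit

    // The snapshot was produced against stale metadata, so its first-row-id is 100.
    try {
      validateAddSnapshot(staleNextRowId, serverNextRowId);
    } catch (IllegalArgumentException e) {
      // prints: Cannot add a snapshot, first-row-id is behind table next-row-id: 100 < 150
      System.out.println(e.getMessage());
    }

    // A snapshot re-produced against refreshed metadata passes the check.
    validateAddSnapshot(serverNextRowId, serverNextRowId);
  }
}
```

The point of the sketch is that the snapshot's first-row-id is fixed when the snapshot is produced, so retrying the same update against newer server metadata cannot succeed unless the snapshot is produced again.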

This PR

This PR resolves both issues by reworking replace (and createOrReplace) transactions to "rebase" their changes onto a replacement re-built from refreshed table metadata. This does "break" a lot of existing tests and behaviour, though.
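
As a rough illustration of the difference between committing stale metadata and rebasing, here is a toy sketch. It assumes table metadata can be modelled as a plain list of snapshot ids and pending changes as replayable functions; `ReplaceRebaseSketch`, `applyAll` and `addSnapshot` are hypothetical names and do not reflect how BaseTransaction is actually structured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model: "metadata" is just a list of snapshot ids. All names are hypothetical.
final class ReplaceRebaseSketch {
  private ReplaceRebaseSketch() {}

  /** Replays the pending updates, in order, on top of the given base metadata. */
  static List<Long> applyAll(List<Long> base, List<UnaryOperator<List<Long>>> updates) {
    List<Long> result = new ArrayList<>(base);
    for (UnaryOperator<List<Long>> update : updates) {
      result = update.apply(result);
    }
    return result;
  }

  /** An update that records one new snapshot without mutating its input. */
  static UnaryOperator<List<Long>> addSnapshot(long snapshotId) {
    return metadata -> {
      List<Long> copy = new ArrayList<>(metadata);
      copy.add(snapshotId);
      return copy;
    };
  }

  public static void main(String[] args) {
    List<Long> base = List.of(1L);          // metadata when the replace began
    List<Long> refreshed = List.of(1L, 2L); // a concurrent commit added snapshot 2
    List<UnaryOperator<List<Long>>> pending = List.of(addSnapshot(3L));

    // Committing against the stale base drops the concurrent snapshot.
    System.out.println(applyAll(base, pending));      // prints [1, 3]

    // Rebasing replays the same updates on refreshed metadata and keeps it.
    System.out.println(applyAll(refreshed, pending)); // prints [1, 2, 3]
  }
}
```

Keeping the pending changes as functions rather than as a pre-computed result is what allows them to be re-applied after a refresh; the real change also has to re-build the replacement metadata itself, which this sketch leaves out.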

@github-actions bot added the core label Jan 20, 2026
@Test
public void testConcurrentReplaceTransactions() {
@ParameterizedTest
@ValueSource(ints = {2, 3})
Contributor Author

#15091 shows the failure of this V3 test, prior to this PR

Comment on lines +2699 to +2700
// All three successfully committed snapshots should be present
assertThat(afterSecondReplace.snapshots()).hasSize(3);
Contributor Author

#15090 shows the failure of this added line, prior to this PR

Contributor

can you just add a new test please to show where exactly stuff fails with V3?

Contributor Author

@smaheshwar-pltr Jan 20, 2026

Makes sense - I've updated the PR description in the meantime to cover this.
(Any concurrent change to a table's snapshots causes the replace transaction to fail entirely for the REST catalog, due to server-side row-lineage validation. I'll put up an issue to track, actually)

@smaheshwar-pltr changed the title from "Fix: Replace transactions rebase onto refreshed metadata" to "[WIP] Fix: Replace transactions rebase onto refreshed metadata" Jan 20, 2026
private boolean hasLastOpCommitted;
private final MetricsReporter reporter;

private Schema replaceSchema;
Contributor Author

TODO: This change turned out to be more breaking than I expected. If we want to proceed, see if this can be cleaned up

Contributor Author

(Realise this is the sort of change that'd require a dev list discussion - I wanted to experiment with this approach first)

if (base != underlyingOps.refresh()) {
// use the refreshed metadata
try {
underlyingOps.refresh();
Contributor Author

@smaheshwar-pltr Jan 20, 2026

TODO: TestHiveCreateReplaceTable#testCreateOrReplaceTableTxnTableDeletedConcurrently shows an NPE where this refresh actually returns null instead of throwing NoSuchTableException. Consider handling that here, fixing it if it's a bug, or leaving it for now (as that may be how concurrent appends with a dropped table failed prior to this PR)

@smaheshwar-pltr changed the title from "[WIP] Fix: Replace transactions rebase onto refreshed metadata" to "[WIP] Replace transactions rebase onto refreshed metadata" Jan 20, 2026