[WIP] Replace transactions rebase onto refreshed metadata #15092
| @Test | ||
| public void testConcurrentReplaceTransactions() { | ||
| @ParameterizedTest | ||
| @ValueSource(ints = {2, 3}) |
#15091 shows the failure of this V3 test, prior to this PR
| // All three successfully committed snapshots should be present | ||
| assertThat(afterSecondReplace.snapshots()).hasSize(3); |
#15090 shows the failure of this added line, prior to this PR
can you just add a new test please to show where exactly stuff fails with V3?
Makes sense - I've updated the PR description in the meantime to cover this.
(Any concurrent change to a table's snapshots causes the replace transaction to fail entirely for the REST catalog, due to server-side row-lineage validation. I'll put up an issue to track, actually)
| private boolean hasLastOpCommitted; | ||
| private final MetricsReporter reporter; | ||
|
|
||
| private Schema replaceSchema; |
TODO: This change turned out to be more breaking than I expected. If we want to proceed, see if this can be cleaned up
(Realise this is the sort of change that'd require a dev list discussion - I wanted to experiment with this approach first)
| if (base != underlyingOps.refresh()) { | ||
| // use refreshed the metadata | ||
| try { | ||
| underlyingOps.refresh(); |
TODO: TestHiveCreateReplaceTable#testCreateOrReplaceTableTxnTableDeletedConcurrently shows an NPE where this refresh actually returns null instead of throwing NoSuchTableException. Consider handling that here, fixing it if it's a bug, or leaving it for now (as that may be how concurrent appends with a dropped table failed prior to this PR)
Motivation
There are a couple of issues related to table replaces.
`BaseTransaction.commitReplaceTransaction()` does not re-apply replacement and transaction updates onto refreshed metadata. When concurrent changes occur, the transaction therefore commits stale metadata.

When a `REPLACE` transaction commits after concurrent changes (appends, snapshot expiration, other replaces), it overwrites those changes with stale metadata. This can lead to snapshot history loss, and concurrent snapshot expiration can even cause table corruption. (#15090)

V3 tables require that `snapshot.first-row-id` >= `table.next-row-id` when adding a snapshot. The snapshot's `first-row-id` is set from `base.nextRowId()` when the snapshot is produced.

With REST catalogs, updates are sent to the server and applied to the server's current metadata. If a concurrent commit advanced the server's `next-row-id`, the snapshot's `first-row-id` (based on stale metadata) will be behind.

This fails the entire replace operation, potentially requiring data files to be rewritten. In other words, V3 concurrent replaces generally fail. (#15091)
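To make the V3 failure mode concrete, here is a minimal sketch of the row-lineage check described above. It is not the REST catalog's actual validation code: `RowLineageCheck`, `validateFirstRowId`, the row counts, and the error message are all hypothetical and only illustrate the `first-row-id` >= `next-row-id` rule.

```java
class RowLineageCheck {
  // Hypothetical stand-in for the server-side rule: a new snapshot's
  // first-row-id must not be behind the table's current next-row-id.
  static void validateFirstRowId(long snapshotFirstRowId, long tableNextRowId) {
    if (snapshotFirstRowId < tableNextRowId) {
      throw new IllegalStateException(
          "first-row-id " + snapshotFirstRowId + " is behind next-row-id " + tableNextRowId);
    }
  }

  public static void main(String[] args) {
    // The replace transaction produces its snapshot from the base metadata it started with.
    long baseNextRowId = 100L;
    long snapshotFirstRowId = baseNextRowId;

    // Assumed scenario: a concurrent append of 50 rows advances the server's next-row-id.
    long serverNextRowId = baseNextRowId + 50L;

    try {
      validateFirstRowId(snapshotFirstRowId, serverNextRowId);
      System.out.println("replace accepted");
    } catch (IllegalStateException e) {
      // The stale first-row-id is rejected, so the whole replace fails.
      System.out.println("replace rejected: " + e.getMessage());
    }
  }
}
```

In this sketch, a snapshot produced from refreshed metadata would pick up the advanced `next-row-id` and pass the check.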
This PR
This PR resolves these issues by re-imagining replace (and createOrReplace) transactions so that they "rebase" their changes onto a replacement re-built from refreshed table metadata. This does "break" a lot of existing tests / behaviour, though.
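As a rough illustration of the rebase idea only (not the actual `BaseTransaction` change), the sketch below models table metadata as a list of snapshot IDs and re-applies the staged updates onto refreshed metadata before each commit attempt. `RebaseSketch`, `Metadata`, and `commitWithRebase` are hypothetical names, and the `AtomicReference` stands in for the catalog's atomic metadata swap. A real replace also re-builds the replacement metadata (schema, spec, and so on), which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

class RebaseSketch {
  // Hypothetical, heavily simplified table metadata: an immutable list of snapshot IDs.
  static final class Metadata {
    final List<Long> snapshotIds;

    Metadata(List<Long> snapshotIds) {
      this.snapshotIds = Collections.unmodifiableList(new ArrayList<>(snapshotIds));
    }

    Metadata withSnapshot(long id) {
      List<Long> ids = new ArrayList<>(snapshotIds);
      ids.add(id);
      return new Metadata(ids);
    }
  }

  // Re-applies every staged update onto freshly read metadata, then attempts an
  // atomic swap. If another writer commits in between, the loop refreshes and retries.
  static Metadata commitWithRebase(
      AtomicReference<Metadata> table, List<UnaryOperator<Metadata>> stagedUpdates) {
    while (true) {
      Metadata refreshed = table.get();
      Metadata rebased = refreshed;
      for (UnaryOperator<Metadata> update : stagedUpdates) {
        rebased = update.apply(rebased);
      }
      if (table.compareAndSet(refreshed, rebased)) {
        return rebased;
      }
    }
  }

  public static void main(String[] args) {
    AtomicReference<Metadata> table =
        new AtomicReference<>(new Metadata(Arrays.asList(1L)));

    // The replace transaction stages its change while snapshot 1 is current.
    List<UnaryOperator<Metadata>> staged = new ArrayList<>();
    staged.add(m -> m.withSnapshot(3L));

    // A concurrent append commits snapshot 2 before the replace commits.
    table.set(table.get().withSnapshot(2L));

    Metadata result = commitWithRebase(table, staged);
    System.out.println(result.snapshotIds); // prints [1, 2, 3]
  }
}
```

Committing the metadata computed from the original base would instead drop snapshot 2, which is the history loss described in the motivation.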