-
Notifications
You must be signed in to change notification settings - Fork 3k
Core: Skip unnecessary metadata refresh after merge append #14709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Core: Skip unnecessary metadata refresh after merge append #14709
Conversation
68bb3bd to
94827aa
Compare
|
Hey @flyrain @amogh-jahagirdar @nastra , Note, I also see an unconditional table refresh in |
7a98e4f to
7e72af1
Compare
|
@flyrain @amogh-jahagirdar @nastra Would you mind taking a look at this improvement? |
| long snapshotId = snapshotId(); | ||
| Snapshot justSaved = ops().refresh().snapshot(snapshotId); | ||
|
|
||
| Snapshot justSaved = ops().current().snapshot(snapshotId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the changes, @gaborkaszab ! Should we check if ops is an instance of RESTTableOperations to ensure this is REST operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the other approach I was considering. I didn't want (Merging)SnapshotProducer to know about what is the underlying TableOps implementation and I preferred this more general approach. Basically, instead of asking "is this REST ops?" I ask "do we have this snapshot" that seemed more robust and general.
WDYT @flyrain ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. To be clear, I'm all for a generic way instead of dedicating to one type of table operation. My confusion and concern came from the PR description as the following:
With RESTTableOperations it's not required to perform a refresh after committing a merge append because the commit already refreshes table metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out, @flyrain ! I updated the description to be a bit more generic, and use RESTTableOperations as an example.
|
Hey @flyrain , |
7e72af1 to
cb1a4e4
Compare
|
Hey @flyrain |
With RESTTableOperations it's not required to perform a refresh after committing a merge append because the commit already refreshes table metadata. Initially an appendFiles().commit() required the following messages sent to REST catalog: 1) Load table at the beginning of SnapshotProducer.apply() 2) Update table to send the updates to the catalog 3) Load table again in MergingSnapshotProducer.updateEvent() The last step isn't needed when ops is RESTTableOperations.
cb1a4e4 to
bac8711
Compare
|
Resolved merge conflict. |
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for missing this one to review @gaborkaszab, I think it makes a lot of sense to first just check the current metadata after a commit for the snapshot instead of always eagerly issuing a refresh. Just had some comments on testing. Would be good to get @flyrain @nastra input as well.
| table.newAppend().appendFile(FILE_A).commit(); | ||
| assertThat(CustomMetricsReporter.COMMIT_COUNTER.get()).isEqualTo(1); | ||
| CustomMetricsReporter.COMMIT_COUNTER.set(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry not entirely following, what additional behavior related to this change are we trying test here? This just looks like it verifies the commit is performed and the metrics after that.
|
|
||
| BaseTable table = (BaseTable) catalog.createTable(TABLE, SCHEMA); | ||
|
|
||
| table.newAppend().appendFile(FILE_A).commit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the commit, we may want to extract out the CreateSnapshotEvent and assert it's what we expect, since that's the part we're changing?
When committing a merge append there is a step to refresh table metadata to load the snapshot that was just created by the commit operation. However, this is not always necessary, because depending on the implementation of
TableOperations.commit(), the metadata could already be updated to contain that snapshot.For instance with RESTTableOperations it's not required to perform this refresh step, because
commit()already refreshes table metadata. With RESTTableOperations, initially an appendFiles().commit() required the following messages sent to REST catalog:The last step isn't needed when ops is RESTTableOperations.