@gaborkaszab
Collaborator

@gaborkaszab gaborkaszab commented Nov 28, 2025

When committing a merge append, there is a step that refreshes table metadata to load the snapshot just created by the commit operation. However, this is not always necessary: depending on the implementation of TableOperations.commit(), the metadata may already have been updated to contain that snapshot.

For instance, with RESTTableOperations this refresh step is not required, because commit() already refreshes table metadata. Before this change, an appendFiles().commit() with RESTTableOperations sent the following requests to the REST catalog:

  1. Load table at the beginning of SnapshotProducer.apply()
  2. Update table to send the updates to the catalog
  3. Load table again in MergingSnapshotProducer.updateEvent()

The last step isn't needed when ops is RESTTableOperations.
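
As a rough illustration of the request sequence above, the sketch below models one append as a list of catalog requests. `RequestTrace` and the request labels are hypothetical names made up for this example; they are not part of Iceberg or the REST catalog API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recorder for the requests one append sends to a REST catalog.
final class RequestTrace {
  final List<String> requests = new ArrayList<>();

  // Models a single appendFiles().commit(); refreshAfterCommit toggles step 3.
  static RequestTrace append(boolean refreshAfterCommit) {
    RequestTrace trace = new RequestTrace();
    trace.requests.add("LOAD_TABLE");   // 1. SnapshotProducer.apply()
    trace.requests.add("UPDATE_TABLE"); // 2. commit sends the updates to the catalog
    if (refreshAfterCommit) {
      trace.requests.add("LOAD_TABLE"); // 3. MergingSnapshotProducer.updateEvent()
    }
    return trace;
  }
}
```

In this model, `RequestTrace.append(true)` records three requests and `RequestTrace.append(false)` records two, which is the saving this PR is after when the commit has already refreshed the metadata.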

@github-actions github-actions bot added the core label Nov 28, 2025
@gaborkaszab gaborkaszab requested a review from nastra November 28, 2025 15:32
@gaborkaszab gaborkaszab force-pushed the main_skip_loadtable_for_mergeappend branch from 68bb3bd to 94827aa Compare November 28, 2025 15:38
@gaborkaszab
Collaborator Author

Hey @flyrain @amogh-jahagirdar @nastra,
You might be interested in this change (I saw that you are active in this area). I found that table.newAppend().appendFile(...).commit() sends an extra load table request to the REST catalog. Unless I'm missing something, I think we can skip this step because RESTTableOperations.commit() already refreshes table metadata.

Note that I also see an unconditional table refresh in CherryPickOperation.updateEvent(), but I didn't have time to investigate that one further. If there is support for the current enhancement, I can take a look at the cherry-pick one in a separate PR.

@gaborkaszab gaborkaszab force-pushed the main_skip_loadtable_for_mergeappend branch 2 times, most recently from 7a98e4f to 7e72af1 Compare December 9, 2025 10:40
@gaborkaszab
Collaborator Author

@flyrain @amogh-jahagirdar @nastra Would you mind taking a look at this improvement?

long snapshotId = snapshotId();
Snapshot justSaved = ops().refresh().snapshot(snapshotId);

Snapshot justSaved = ops().current().snapshot(snapshotId);
Contributor

Thanks a lot for the changes, @gaborkaszab! Should we check whether ops is an instance of RESTTableOperations to ensure this is a REST operation?

Collaborator Author

That is the other approach I was considering. I didn't want (Merging)SnapshotProducer to know which TableOps implementation is underneath, so I preferred this more general approach. Basically, instead of asking "is this REST ops?" I ask "do we have this snapshot?", which seemed more robust and general.
WDYT @flyrain ?
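
To make the "do we have this snapshot?" idea concrete, here is a minimal, self-contained sketch. It is not the PR's actual code: `Metadata`, `Ops`, `FakeOps`, and `SnapshotLookup` are hypothetical stand-ins for TableMetadata and TableOperations, and only the `current()`, `refresh()`, and `snapshot(id)` names are taken from the diff above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TableMetadata: maps snapshot IDs to snapshots.
final class Metadata {
  private final Map<Long, String> snapshots;

  Metadata(Map<Long, String> snapshots) {
    this.snapshots = new HashMap<>(snapshots);
  }

  // Returns the snapshot with the given ID, or null if it is not in this metadata.
  String snapshot(long snapshotId) {
    return snapshots.get(snapshotId);
  }
}

// Hypothetical stand-in for TableOperations.
interface Ops {
  Metadata current(); // cached metadata, no catalog round trip

  Metadata refresh(); // reloads metadata from the catalog
}

// Fake ops whose commit() may or may not refresh the cached metadata.
final class FakeOps implements Ops {
  private final Map<Long, String> catalogState = new HashMap<>();
  private final boolean commitRefreshes;
  private Metadata cached = new Metadata(new HashMap<>());
  int refreshCalls = 0;

  FakeOps(boolean commitRefreshes) {
    this.commitRefreshes = commitRefreshes;
  }

  void commit(long snapshotId, String snapshot) {
    catalogState.put(snapshotId, snapshot);
    if (commitRefreshes) {
      cached = new Metadata(catalogState); // REST-like: commit updates the cache
    }
  }

  @Override
  public Metadata current() {
    return cached;
  }

  @Override
  public Metadata refresh() {
    refreshCalls++;
    cached = new Metadata(catalogState);
    return cached;
  }
}

final class SnapshotLookup {
  // Check the current metadata first; refresh only if the snapshot is missing.
  static String justSaved(Ops ops, long snapshotId) {
    String snapshot = ops.current().snapshot(snapshotId);
    if (snapshot == null) {
      snapshot = ops.refresh().snapshot(snapshotId);
    }
    return snapshot;
  }
}
```

With `new FakeOps(true)`, the lookup after a commit finds the snapshot without calling `refresh()`; with `new FakeOps(false)`, it falls back to exactly one `refresh()`. The lookup never needs to know which kind of ops it was given.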

Contributor

Thanks for the explanation. To be clear, I'm all for a generic approach rather than one tied to a single type of table operations. My confusion and concern came from the PR description, which reads:

With RESTTableOperations it's not required to perform a refresh after committing a merge append because the commit already refreshes table metadata.

Collaborator Author

Thanks for pointing that out, @flyrain! I updated the description to be a bit more generic and to use RESTTableOperations only as an example.

@gaborkaszab
Collaborator Author

Hey @flyrain ,
Do you think you can take a look at this and maybe merge if there are no further comments? Thanks!

@gaborkaszab gaborkaszab force-pushed the main_skip_loadtable_for_mergeappend branch from 7e72af1 to cb1a4e4 Compare January 13, 2026 14:10
@gaborkaszab
Collaborator Author

Hey @flyrain
Do you think we can wrap this up and merge, unless there are further comments?

With RESTTableOperations it's not required to perform a refresh
after committing a merge append because the commit already
refreshes table metadata. Initially an appendFiles().commit()
required the following messages sent to REST catalog:
  1) Load table at the beginning of SnapshotProducer.apply()
  2) Update table to send the updates to the catalog
  3) Load table again in MergingSnapshotProducer.updateEvent()
The last step isn't needed when ops is RESTTableOperations.
@gaborkaszab gaborkaszab force-pushed the main_skip_loadtable_for_mergeappend branch from cb1a4e4 to bac8711 Compare January 20, 2026 11:39
@gaborkaszab
Collaborator Author

Resolved merge conflict.
@flyrain do you think you can take a quick look at this?

Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Sorry for missing this review, @gaborkaszab. I think it makes a lot of sense to first check the current metadata for the snapshot after a commit instead of always eagerly issuing a refresh. I just had some comments on testing. It would be good to get input from @flyrain and @nastra as well.

Comment on lines +3276 to +3278
table.newAppend().appendFile(FILE_A).commit();
assertThat(CustomMetricsReporter.COMMIT_COUNTER.get()).isEqualTo(1);
CustomMetricsReporter.COMMIT_COUNTER.set(0);
Contributor

Sorry, I'm not entirely following. What additional behavior related to this change are we trying to test here? This just looks like it verifies that the commit is performed and checks the metrics after that.


BaseTable table = (BaseTable) catalog.createTable(TABLE, SCHEMA);

table.newAppend().appendFile(FILE_A).commit();
Contributor

Before the commit, we may want to extract out the CreateSnapshotEvent and assert it's what we expect, since that's the part we're changing?
