-
Notifications
You must be signed in to change notification settings - Fork 3k
Kafka Connect: Add delta writer support #12070
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@bryanck copied over the code as is. Im planning to refactor upsert mode (delta writer) code, planning to add few improvements to it, potentially changing existing behavior. |
kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/IcebergSinkConfig.java
Outdated
Show resolved
Hide resolved
kafka-connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/RecordProjection.java
Show resolved
Hide resolved
...connect/kafka-connect/src/main/java/org/apache/iceberg/connect/data/BaseDeltaTaskWriter.java
Show resolved
Hide resolved
There are a couple of discussions on why we didn't originally add the delta writer functionality. I think we will need to resolve those discussions before we add this. |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
|
I consider this functionality as a MUST, what's preventing it from been merged? At the end people are force to move to tabular version if they want support for upserts |
|
@ismailsimsek @bryanck can we please revive this PR? |
|
It would be great to have this functionality, even if it needs to be used with some attention points. |
5333248 to
9ad8d26
Compare
df1418f to
cbad542
Compare
|
the failure doesn't seem related. @ajantha-bhat @bryanck @jbonofre could you please review? |
|
@danielcweeks Do you feel our stance on this evolved or should we hold off on adding this until there is more clarity on the future of equality deletes? |
|
I feel we should close this PR until we discuss this with the community. If the community feels we can move forward, I can handling porting over my code. |
|
@bryanck while I can fully relate to your concerns, I strongly advocate for moving forward with this PR, performance considerations notwithstanding. I agree with @ajantha-bhat and their comment here that the performance limitations of equality deletes should be addressed separately. For our team, having an append-only connector is virtually useless and we would rather deal with the related performance issues. |
|
I'm following up on this topic—are there any updates or decisions so far? Thanks in advance for your help! |
| @Override | ||
| public void write(Record row) throws IOException { | ||
| Operation op; | ||
| if (row instanceof RecordWrapper) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If CDC enabled and upsertMode enabled the CDC insert will not be changed to an UPDATE. Which I think might be wrong.
I raised a similar PR in the old tabular code last week: databricks/iceberg-kafka-connect#332
BadCandy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iceberg.tables.cdc-field property is only used to determine whether to use DeltaWriter, and setting this property does not configure the op field in the Record.
It seems necessary to modify the code to set the op field in RecordWrapper, referring to the existing implementation in IcebergWriter.
cbad542 to
e06fe78
Compare
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
Still relevant |
|
Can this be merged? |
There are conflicts with the main branch, so we can't merge this. For the record: There is an ongoing effort to move away from the FileAppenderFactory and start using the FileWriterFactory (#14328), which will conflict with the current PR. |
Will the FileWriterFactory allow for upserts? |
|
@cbuckle1: It only changes the interface which uses for the files to write. We still need a PR like this. Mostly only a change where |
|
Hey @pvary I just saw this PR you were mentioning is already merged. Are we good to go now? |
|
We should resolve concerns around relying on equality deletes before going down that road, or open a new PR for a solution that does not rely on equality deletes. Here is one thread on the mailing list from a few months ago: https://lists.apache.org/thread/96dhf3sj5pc4ql0l8yk8sxgtr78bchrd. |
|
@bryanck - for us, we are using this for streaming data, so equality deletes are needed. Based on another comment from April: #12070 (comment) it looks like others are in the same boat. |
|
@bryanck - in the linked thread, you mentioned that the flink sink is looking at a similar issue, do you have the Github issue for us to track? |
|
I can give it a shot, ill base it on your PR and the flink sink implementation. |
resolves #10842