
Conversation

@superhx
Collaborator

@superhx superhx commented Jan 14, 2026

No description provided.

cmccabe and others added 30 commits November 6, 2024 13:17
…#17686)

Kafka Streams actively purges records from repartition topics. Prior to this PR, Kafka Streams would retrieve the offset from the consumedOffsets map, but there are a couple of edge cases where the consumedOffsets map can get ahead of the committedOffsets map. In these cases, Kafka Streams could purge a repartition record before it is committed.

Updated the current StreamTask test to cover this case.
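A minimal sketch of the safe rule (illustrative names, not the actual StreamTask code): an offset is purgeable only up to what has been committed, even when consumption has run ahead.

```
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

// Sketch: only offsets that are both consumed AND committed are safe to
// purge from a repartition topic.
final class PurgeableOffsets {
    static Map<TopicPartition, Long> compute(final Map<TopicPartition, Long> consumed,
                                             final Map<TopicPartition, Long> committed) {
        final Map<TopicPartition, Long> purgeable = new HashMap<>();
        for (final Map.Entry<TopicPartition, Long> entry : consumed.entrySet()) {
            final Long committedOffset = committed.get(entry.getKey());
            if (committedOffset != null) {
                // never purge past the committed offset, even if consumption is ahead
                purgeable.put(entry.getKey(), Math.min(entry.getValue(), committedOffset));
            }
        }
        return purgeable;
    }
}
```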

Reviewers: Matthias Sax <mjsax@apache.org>
In docs/ops.html, add a section discussing the difference between static and dynamic quorums. This section also explains how to find out which quorum type you have, and discusses the current limitations, such as the inability to transition from static quorums to dynamic ones.

Add a brief section to docs/upgrade.html discussing controller membership change.

Co-authored-by: Federico Valeri <fedevaleri@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
Reviewers: Justine Olshan <jolshan@confluent.io>
* MINOR: Fix upgrade instructions for 3.8 and 3.9

Signed-off-by: Josep Prat <josep.prat@aiven.io>

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Colin McCabe <colin@cmccabe.xyz>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…p (#17710)

TimestampExtractor allows dropping records by returning a timestamp of -1. For this case, we still need to update consumed offsets to allow us to commit progress.
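For illustration, a custom extractor that drops records lacking a valid embedded timestamp by returning -1 (TimestampExtractor is the real Streams interface; the class name here is made up):

```
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Returning a negative timestamp tells Streams to drop the record. Even for
// dropped records, the consumed offset must still advance so that progress
// can be committed.
public class DropOnInvalidTimestamp implements TimestampExtractor {
    @Override
    public long extract(final ConsumerRecord<Object, Object> record, final long partitionTime) {
        final long timestamp = record.timestamp();
        return timestamp >= 0 ? timestamp : -1L; // -1 => record is skipped
    }
}
```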

Reviewers: Bill Bejeck <bill@confluent.io>
Fixes broken commit for KAFKA-17872
… for aborted txns (#17676) (#17733)

Reviewers: Jun Rao <junrao@gmail.com>
Clarify the functionality of split() matching on the first predicate
Reviewers: Matthias Sax <mjsax@apache.org>
…s large (#17794)

If a user has configured `retention.ms` to a value greater than the current Unix epoch timestamp, then cleanup of a remote log segment fails with an error. This change fixes the bug by correctly handling this case of a large `retention.ms`.
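A hypothetical sketch of the arithmetic (not the actual RemoteLogManager code): when `retention.ms` exceeds the current epoch millis, the naive cutoff `now - retentionMs` goes negative, so no segment can be old enough to delete by time.

```
// Sketch: compute a time-based cleanup cutoff that tolerates a retention.ms
// value larger than the current epoch millis.
final class RetentionCutoff {
    // Returns -1 when time-based cleanup should be skipped entirely: either
    // retention is infinite (-1) or the retention window covers all history.
    static long cutoffMs(final long nowMs, final long retentionMs) {
        if (retentionMs < 0 || retentionMs > nowMs) {
            return -1L;
        }
        return nowMs - retentionMs;
    }
}
```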

Reviewers: Divij Vaidya <diviv@amazon.com>
Reviewers: Bill Bejeck <bill@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…7713)

The thread that evaluates the gauge for oldest-iterator-open-since-ms runs concurrently
with threads that open and close iterators (stream threads and interactive query threads). This PR
fixes a race condition between `openIterators.isEmpty()` and `openIterators.first()` by catching
a potential exception. Because we expect the race condition to be rare, we prefer catching the
exception over introducing a guard via locking.
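A minimal sketch of the pattern (assuming a concurrent sorted set of open-iterator timestamps; names are illustrative):

```
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentSkipListSet;

// The gauge thread may observe isEmpty() == false, then lose a race with a
// closing iterator before first() runs. Catching NoSuchElementException is
// cheaper than locking for a race expected to be rare.
final class OldestOpenIteratorGauge {
    private final ConcurrentSkipListSet<Long> openIteratorTimestamps = new ConcurrentSkipListSet<>();

    Long oldestOpenSinceMs() {
        while (!openIteratorTimestamps.isEmpty()) {
            try {
                return openIteratorTimestamps.first();
            } catch (final NoSuchElementException swallowed) {
                // set emptied between isEmpty() and first(); re-check
            }
        }
        return null; // no iterators currently open
    }
}
```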

Reviewers: Matthias J. Sax <matthias@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Reviewers: Josep Prat <josep.prat@aiven.io>
Signed-off-by: PoAn Yang <payang@apache.org>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Following KIP-1033, a FailedProcessingException is passed to the Streams-specific uncaught exception handler.

The goal of this PR is to unwrap a FailedProcessingException into a StreamsException when an exception occurs during the flushing or closing of a store.

Reviewer: Bruno Cadonna <cadonna@apache.org>
Add alive-stream-threads to Kafka Streams client metrics table
Reviewers: Matthias Sax <mjsax@apache.org>
SnapshotRegistry needs to have a reference to all snapshot data structures. However, this should
not be a strong reference, but a weak reference, so that these data structures can be garbage
collected as needed. This PR also adds a scrub mechanism so that we can eventually reclaim the
slots used by GC'ed Revertable objects in the SnapshotRegistry.revertables array.
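A compact sketch of the mechanism (illustrative types, not the actual SnapshotRegistry):

```
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hold Revertable-like objects weakly so the GC can reclaim them, and
// periodically scrub slots whose referents have been collected.
final class WeakRegistry<T> {
    private final List<WeakReference<T>> slots = new ArrayList<>();

    void register(final T revertable) {
        slots.add(new WeakReference<>(revertable));
    }

    // Reclaim slots whose referent was garbage collected.
    void scrub() {
        slots.removeIf(ref -> ref.get() == null);
    }
}
```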

Reviewers: David Jacot <david.jacot@gmail.com>
…in a colon (#17883)

Kafka Principals must contain a colon. We should enforce this in createAcls.
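The check itself is small; a hedged sketch of the validation (illustrative, not the exact createAcls code):

```
// A Kafka principal string has the form "<principalType>:<name>", e.g. "User:alice".
// Reject principals without a colon before creating the ACL.
final class PrincipalValidator {
    static void validate(final String principal) {
        if (principal == null || principal.indexOf(':') < 0) {
            throw new IllegalArgumentException(
                "Could not parse principal '" + principal + "': no colon is present");
        }
    }
}
```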

Reviewers: David Arthur <mumrah@gmail.com>
Reviewer: Lucas Brutschy <lbrutschy@confluent.io>
When Kafka Streams skips over corrupted messages, it might not resume previously paused partitions
if more than one record is skipped at once and the buffer drops below the max-buffer limit at the same time.

Reviewers: Matthias J. Sax <matthias@confluent.io>
apache/kafka#17899 fixed the issue, but did not
add any unit tests.

Reviewers: Bill Bejeck <bill@confluent.io>
Add missing fix for the last cherry-pick.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…m Properties section (#17920)


Reviewers: Mickael Maison <mickael.maison@gmail.com>
…KIP-853 (#17921)

Reviewers: Ziming Deng <dengziming1993@gmail.com>
Docker tests rely on Docker Compose. In recent runs, it has been observed that GitHub Actions does not provide Docker Compose out of the box, so we install it explicitly in the workflow.
…ows (#18042)

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
…352)

Because the set of voters is dynamic (KIP-853), it is possible for a replica to believe it is a voter while the current leader doesn't have that replica in the voter set. In this replicated state, the leader will not send BeginQuorumEpoch requests to such a replica. This means that such replicas will not be able to discover the leader.

This change helps Unattached replicas rediscover the leader by sending Fetch requests to the bootstrap servers.
Followers have a similar issue: if they are unable to communicate with the leader, they should try contacting the bootstrap servers.
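A minimal sketch of the fallback (hypothetical names; the real logic lives in the Raft client):

```
import java.util.List;
import java.util.Optional;
import java.util.Random;

// If the replica knows the current leader, fetch from it; otherwise
// (Unattached, or a Follower that cannot reach the leader) fall back to a
// random bootstrap server so the leader can be rediscovered from the
// leader id and epoch carried in the Fetch response.
final class FetchDestination {
    private static final Random RANDOM = new Random();

    static String choose(final Optional<String> knownLeader, final List<String> bootstrapServers) {
        return knownLeader.orElseGet(
            () -> bootstrapServers.get(RANDOM.nextInt(bootstrapServers.size())));
    }
}
```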

Reviewers: José Armando García Sancio <jsancio@apache.org>
When a struct field is tagged and nullable, it is serialized as
{ varint tag; varint dataLength; nullable data }, where
nullable is serialized as
{ varint isNotNull; if (isNotNull) struct s; }. The length field
includes the is-not-null varint.

This patch fixes a bug in serialization where the written value of
the length field and the value used to compute the size of the length
field differs by 1. In practice this has no impact unless the
serialized length of the struct is 127 bytes, since the varint encodings
of 127 and 128 have different lengths (0x7f vs. 0x80 0x01).
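The boundary is easy to see with a standard unsigned-varint size computation (a generic sketch, not Kafka's serialization internals):

```
final class Varint {
    // Unsigned varint size: 7 payload bits per byte.
    static int sizeOfUnsignedVarint(int value) {
        int bytes = 1;
        while ((value & 0xffffff80) != 0) {
            bytes += 1;
            value >>>= 7;
        }
        return bytes;
    }
}
// Varint.sizeOfUnsignedVarint(127) == 1   -> encodes as 0x7f
// Varint.sizeOfUnsignedVarint(128) == 2   -> encodes as 0x80 0x01
// So a written length of L sized as L + 1 (or vice versa) only corrupts the
// output when the struct length crosses this 127/128 boundary.
```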

Reviewers: David Jacot <djacot@confluent.io>
srdo and others added 27 commits April 7, 2025 10:25
…backport) (#19307)

Cherry-picked from commit 9cb49092f0865ae18487444be52aa1d03b915de3 in trunk

Co-authored-by: Ken Huang <s7133700@gmail.com>

Reviewers: Luke Chen <showuon@gmail.com>
…310)

Cherry-picked from commit 078760a008fadcf524b31d079f8b08c9966c1139

Co-authored-by: Chia-Chuan Yu <yujuan476@gmail.com>

Reviewers: Luke Chen <showuon@gmail.com>
Backport of apache/kafka#16522 for 3.9.

This is necessary for 3.9 to support Java 24.

Please see https://lists.apache.org/thread/6k942pphowd28dh9gn6xbnngk6nxs3n0 where it is being discussed whether to do this.

Co-authored-by: Greg Harris <greg.harris@aiven.io>

Reviewers: Luke Chen <showuon@gmail.com>, Greg Harris <greg.harris@aiven.io>
kafka-client-metrics.sh cannot reset the interval using `--interval=`.

Reviewers: Andrew Schofield <aschofield@confluent.io>
KAFKA-17639: Add Java 23 to CI.
Backported from commit 76a9df4 and updated the Jenkinsfile.

Reviewers: Mickael Maison <mickael.maison@gmail.com>
Call StateRestoreListener#onBatchRestored with numRestored, not
totalRestored, when reprocessing state.

See: https://issues.apache.org/jira/browse/KAFKA-18962
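The listener contract makes the distinction clear; the interface below is the real Streams API, while the logging body is just an illustration:

```
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class LoggingRestoreListener implements StateRestoreListener {
    @Override
    public void onRestoreStart(TopicPartition partition, String store, long startOffset, long endOffset) { }

    @Override
    public void onBatchRestored(TopicPartition partition, String store, long batchEndOffset, long numRestored) {
        // numRestored counts records restored in THIS batch only.
        System.out.printf("store %s restored a batch of %d records (up to offset %d)%n",
            store, numRestored, batchEndOffset);
    }

    @Override
    public void onRestoreEnd(TopicPartition partition, String store, long totalRestored) {
        // totalRestored is the cumulative count, reported once at the end.
        System.out.printf("store %s finished restoring %d records in total%n", store, totalRestored);
    }
}
```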

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias
Sax <mjsax@apache.org>
For the KRaft implementation, there is a race between the network thread,
which reads bytes from the log segments, and the KRaft driver thread, which
truncates the log and appends records to it. This race can cause
the network thread to send corrupted records or inconsistent records.
The corrupted-records case is handled by catching and logging the
CorruptRecordException. The inconsistent-records case is handled by only
appending record batches whose partition leader epoch is less than or
equal to the fetching replica's epoch, and only if the epoch didn't change
between the request and the response.

For the ISR implementation, there is also a race between the network
thread and the replica fetcher thread, which truncates the log and
appends records to it. This race can cause the network thread to send
corrupted records or inconsistent records. The replica fetcher thread
already handles the corrupted-records case. The inconsistent-records case
is handled by only appending record batches whose partition leader epoch
is less than or equal to the leader epoch in the FETCH request.
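A sketch of the follower-side guard (hypothetical names, condensing the rule above):

```
// Only append batches whose partition leader epoch does not exceed the epoch
// the fetch was issued with, and only if the epoch is unchanged between
// request and response; otherwise the read may be torn by concurrent truncation.
final class FetchAppendGuard {
    static boolean safeToAppend(final int batchPartitionLeaderEpoch,
                                final int fetchEpoch,
                                final int epochAtResponseTime) {
        return fetchEpoch == epochAtResponseTime
            && batchPartitionLeaderEpoch <= fetchEpoch;
    }
}
```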

Reviewers: Jun Rao <junrao@apache.org>, Alyssa Huang <ahuang@confluent.io>, Chia-Ping Tsai <chia7712@apache.org>
…equest and fetch state (#19223)

This PR fixes a potential issue where the `FetchResponse` returns
`divergingEndOffsets` with an older leader epoch. This can lead to
committed records being removed from the follower's log, potentially
causing data loss.

In detail:
`processFetchRequest` looks up the requested leader epoch of the partition
data by `topicPartition` and compares it with the leader epoch of the current
fetch state. If they don't match, the response is ignored.
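An illustrative condensation of that guard (names are hypothetical):

```
// Ignore a FetchResponse whose requested leader epoch no longer matches the
// current fetch state, so a stale divergingEndOffset cannot truncate records
// committed under a newer epoch.
final class FetchResponseGuard {
    static boolean shouldProcess(final int requestedLeaderEpoch, final int fetchStateLeaderEpoch) {
        return requestedLeaderEpoch == fetchStateLeaderEpoch;
    }
}
```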

Reviewers: Jun Rao <junrao@gmail.com>
…sponse (#19127)

The kafka controllers need to set kraft.version in their
ApiVersionsResponse messages according to the current kraft.version
reported by the Raft layer. Currently, however, they always set it to 0.

Also remove FeatureControlManager.latestFinalizedFeatures. It is not
needed and it does a lot of copying.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Fix the failing
`kafka.server.KRaftClusterTest.testDescribeKRaftVersion(boolean)` test.
It is failing on the [3.9
branch](https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.9/186/#showFailuresLink).

In the [patch for
4.0](https://github.com/apache/kafka/pull/19127/files#diff-a95d286b4e1eb166af89ea45bf1fe14cd8c944d7fc0483bfd4eff2245d1d2bbbR1014),
we created TestKitNodes like this:
```
new TestKitNodes.Builder().
        setNumBrokerNodes(1).
        setNumControllerNodes(1).
        setFeature(KRaftVersion.FEATURE_NAME, 1.toShort).build()
```
But in 3.9, because TestKitNodes doesn't have a `setFeature` method, we
removed it, which causes the test failure. Added it back to the 3.9
branch.

Also, 4.0 includes this
[patch](https://github.com/apache/kafka/pull/17582/files#r1823404214),
which contains a bug fix and a TestKitNodes improvement that allows
setting features. Added them to the 3.9 branch to fix the test and the
issue of the finalized version always being 0.

Reviewers: PoAn Yang <payang@apache.org>, TengYao Chi <kitingiao@gmail.com>
As of 3.9, Kafka allows disabling remote storage on a topic after it was
enabled. It allows subsequent enabling and disabling too.

However, the documentation says otherwise and needs to be corrected.

Doc:
https://kafka.apache.org/39/documentation/#topicconfigs_remote.storage.enable

Reviewers: Luke Chen <showuon@gmail.com>, PoAn Yang <payang@apache.org>, Ken Huang <s7133700@gmail.com>
CVE-2025-24970: Netty, an asynchronous, event-driven network application
framework, has a vulnerability starting in version 4.1.91.Final and
prior to version 4.1.118.Final.

Reviewers: TengYao Chi <kitingiao@gmail.com>
Ran the system tests based on the 3.9 branch, but got the error:
`Did expect to read 'Kafka version.*3.9.1.*SNAPSHOT' from
ducker@ducker04`. This is because the version should be `3.9.1` without
the SNAPSHOT suffix. Update the version.

Reviewers: TengYao Chi <frankvicky@apache.org>
Add a brief note pointing to the issue in the migration guide

Reviewers: Luke Chen <showuon@gmail.com>, TengYao Chi
<frankvicky@apache.org>
Bump version to 3.9.1

Reviewers: Luke Chen <showuon@gmail.com>
In PR apache/kafka#19484, only the dependency
was updated but not the `LICENSE-binary` file. This fixes the
misalignment.

Signed-off-by: Josep Prat <josep.prat@aiven.io>

Reviewers: Luke Chen <showuon@gmail.com>
When building the RC, the current version of Jetty has been flagged for a
CVE. Hence, we should upgrade the Jetty version to fix it.
Signed-off-by: Robin Han <hanxvdovehx@gmail.com>
@CLAassistant

CLAassistant commented Jan 14, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 6 committers have signed the CLA.

✅ superhx
❌ mumrah
❌ showuon
❌ edoardocomar
❌ frankvicky
❌ jlprat
You have signed the CLA already but the status is still pending? Let us recheck it.

