-
Notifications
You must be signed in to change notification settings - Fork 188
feat: add pipeline mode for streaming replication #1543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
drmingdrmer
wants to merge
13
commits into
databendlabs:main
Choose a base branch
from
drmingdrmer:285-pipeline
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+1,251
−798
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
RaftNetworkV2 as primary trait7feca32 to
829ee8a
Compare
Collection of small improvements including documentation, logging, configuration tuning, and test robustness fixes. Changes: - Reduce default network backoff from 500ms to 200ms for faster retries - Add doc comments for `IOProgress` fields (`accepted`, `submitted`, `flushed`) - Add debug logging to client-http example - Add test assertion for watch channel `changed()` pending behavior - Fix metrics test to handle missing heartbeat entries gracefully
Verify that a LogReader obtained before writing new entries can still read entries written after it was created. This ensures LogReader implementations don't cache or snapshot data in a way that makes newly written entries invisible.
Add infrastructure for open-ended streaming replication where the leader continuously sends logs after a given point without a fixed upper bound. This complements the existing fixed-range `Logs` replication mode.
Add `io_submitted_tx` watch channel to notify replication tasks when log entries have been submitted to storage and are safe to read. This enables replication tasks to coordinate with I/O progress without polling. Changes: - Add `io_submitted_tx` watch channel to `RaftCore` - Broadcast I/O submission progress after `AppendEntries`, `SaveVote`, and `UpdateIOProgress`
Add `io_accepted_tx` watch channel to notify observers before I/O operations are submitted to storage. This enables preparation for upcoming I/O events before they actually happen. Changes: - Add `io_accepted_tx` watch channel to `RaftCore` - Broadcast I/O acceptance before `UpdateIOProgress`, `AppendEntries`, and `SaveVote`
Enable replication tasks to synchronize with leader I/O progress using watch channels. The replication stream monitors io_accepted_rx and io_submitted_rx to detect leader changes and wait for log availability.
Replication to a follower has two phases: 1. **Binary search phase**: The leader runs a binary search to find the exact matching position of log entries on the follower. 2. **Pipeline mode**: After finding the match point, the leader calls the `stream_append` method on the network and continuously generates AppendEntries requests. The network implementation should pipeline all requests to the follower and yield responses. Note that responses and requests don't have to be 1-to-1 mapped - the number of responses can be smaller than the number of requests. `stream_append` provides a default implementation that calls the existing `append_entries` method to emulate streaming replication. A mature implementation should run in real pipeline mode instead of request-response manner. When a request is received by `stream_append`, it is responsible for sending all content of the request to the follower - partial success is not allowed. On the leader, the `Inflight` structure in Engine tracks the inflight replication session running by `ReplicationCore`. An `InflightId` identifies each inflight session, and the `Inflight` structure ignores any response that doesn't match the current `InflightId`. This change also reorganizes replication data structures for clarity: - Add `Payload` enum for log replication specifications (`LogIdRange`, `LogsSince`) - Add `Replicate` struct combining `inflight_id` with `Payload` - Add `ReplicationProgress` to track local committed and remote matched state - Simplify `drain_events()` to set fields directly instead of returning values - Remove obsolete `request.rs`, `replication_state.rs`, `log_state.rs` Changes: - Add pipeline mode entry in `ProgressEntry::next_send()` when fully caught up - Add `Inflight::LogsSince` variant for unbounded log streaming - Add `is_logs_since()` method to `Inflight` for type checking - Add `get_partial_success()` method to `AppendEntriesResponse` - Refactor `drain_events()` to set `next_action` and `inflight_id` directly - Add unit tests for pipeline mode in `progress/entry/tests.rs`
…ne replication Replace chunked append_entries with native gRPC bidirectional streaming via `stream_append`. This provides more efficient pipelined log replication. Changes: - Add `StreamAppend` RPC with bidirectional streaming to proto - Implement `stream_append` server handler in `raft_service.rs` - Implement `stream_append` client in `network/mod.rs` - Remove chunked `append_entries` fallback logic - Change `RaftNetworkV2::stream_append` to accept `'static` stream - Update README to reflect streaming pipeline approach - Delete obsolete `test_chunk.rs`
Replace `append_entries` with `stream_append` in HeartbeatWorker so applications only need to implement `stream_append` for replication.
Replace `append_entries` with `stream_append` in linearizable read confirmation so applications only need to implement `stream_append`.
…trait Modernize the getting-started documentation to focus on `RaftNetworkV2` as the recommended network trait. Add documentation for the optional `stream_append()` method for pipelined replication.
When rebuilding replication streams after a membership change, reuse existing streams instead of destroying all and recreating. This avoids unnecessary stream teardown and maintains in-flight replication state. Changes: - Reuse existing replication streams when targets remain in new membership - Only spawn new replication for newly added targets - Properly join and cleanup removed replication streams - Handle missing progress entries gracefully in `update_matching()` - Add debug logging for membership change operations
829ee8a to
24b736f
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changelog
docs: update getting-started guide to use
RaftNetworkV2as primary traitModernize the getting-started documentation to focus on
RaftNetworkV2as the recommended network trait. Add documentation for the optional
stream_append()method for pipelined replication.refactor: use
stream_appendfor linearizable read confirmationReplace
append_entrieswithstream_appendin linearizable readconfirmation so applications only need to implement
stream_append.refactor: use
stream_appendfor heartbeat instead ofappend_entriesReplace
append_entrieswithstream_appendin HeartbeatWorker soapplications only need to implement
stream_appendfor replication.docs: gRPC example: implement gRPC bidirectional streaming for pipeline replication
Replace chunked append_entries with native gRPC bidirectional streaming
via
stream_append. This provides more efficient pipelined log replication.Changes:
StreamAppendRPC with bidirectional streaming to protostream_appendserver handler inraft_service.rsstream_appendclient innetwork/mod.rsappend_entriesfallback logicRaftNetworkV2::stream_appendto accept'staticstreamtest_chunk.rsfeat: add pipeline mode for streaming replication
Replication to a follower has two phases:
Binary search phase: The leader runs a binary search to find the exact
matching position of log entries on the follower.
Pipeline mode: After finding the match point, the leader calls the
stream_appendmethod on the network and continuously generatesAppendEntries requests. The network implementation should pipeline all
requests to the follower and yield responses. Note that responses and
requests don't have to be 1-to-1 mapped - the number of responses can be
smaller than the number of requests.
stream_appendprovides a default implementation that calls the existingappend_entriesmethod to emulate streaming replication. A matureimplementation should run in real pipeline mode instead of request-response
manner. When a request is received by
stream_append, it is responsible forsending all content of the request to the follower - partial success is not
allowed.
On the leader, the
Inflightstructure in Engine tracks the inflightreplication session running by
ReplicationCore. AnInflightIdidentifieseach inflight session, and the
Inflightstructure ignores any response thatdoesn't match the current
InflightId.This change also reorganizes replication data structures for clarity:
Payloadenum for log replication specifications (LogIdRange,LogsSince)Replicatestruct combininginflight_idwithPayloadReplicationProgressto track local committed and remote matched statedrain_events()to set fields directly instead of returning valuesrequest.rs,replication_state.rs,log_state.rsChanges:
ProgressEntry::next_send()when fully caught upInflight::LogsSincevariant for unbounded log streamingis_logs_since()method toInflightfor type checkingget_partial_success()method toAppendEntriesResponsedrain_events()to setnext_actionandinflight_iddirectlyprogress/entry/tests.rsfeat: add streaming replication with I/O progress synchronization
Enable replication tasks to synchronize with leader I/O progress using
watch channels. The replication stream monitors io_accepted_rx and
io_submitted_rx to detect leader changes and wait for log availability.
feat: add watch channel for I/O acceptance notification
Add
io_accepted_txwatch channel to notify observers before I/O operationsare submitted to storage. This enables preparation for upcoming I/O events
before they actually happen.
Changes:
io_accepted_txwatch channel toRaftCoreUpdateIOProgress,AppendEntries, andSaveVotetest: add assertion for watch channel changed() behavior
Add test assertion to verify that after
changed()returns ready, asubsequent call returns pending because the value was already seen.
Changes:
changed()returning pending after value is marked as seenfeat: add watch channel for I/O submission progress broadcast
Add
io_submitted_txwatch channel to notify replication tasks when logentries have been submitted to storage and are safe to read. This enables
replication tasks to coordinate with I/O progress without polling.
Changes:
io_submitted_txwatch channel toRaftCoreAppendEntries,SaveVote, andUpdateIOProgresschore: add doc
feat: add streaming replication support with
LogsSincevariantAdd infrastructure for open-ended streaming replication where the leader
continuously sends logs after a given point without a fixed upper bound.
This complements the existing fixed-range
Logsreplication mode.chore: remove unused inflight-id
This change is