feat: add pipeline mode for streaming replication #1543

drmingdrmer · 2025-12-10T09:56:40Z

Changelog

docs: update getting-started guide to use `RaftNetworkV2` as primary trait

Modernize the getting-started documentation to focus on RaftNetworkV2
as the recommended network trait. Add documentation for the optional
stream_append() method for pipelined replication.

refactor: use `stream_append` for linearizable read confirmation

Replace append_entries with stream_append in linearizable read
confirmation so applications only need to implement stream_append.

refactor: use `stream_append` for heartbeat instead of `append_entries`

Replace append_entries with stream_append in HeartbeatWorker so
applications only need to implement stream_append for replication.

docs: gRPC example: implement gRPC bidirectional streaming for pipeline replication

Replace chunked append_entries with native gRPC bidirectional streaming
via stream_append. This provides more efficient pipelined log replication.

Changes:

- Add `StreamAppend` RPC with bidirectional streaming to proto
- Implement `stream_append` server handler in `raft_service.rs`
- Implement `stream_append` client in `network/mod.rs`
- Remove chunked `append_entries` fallback logic
- Change `RaftNetworkV2::stream_append` to accept `'static` stream
- Update README to reflect streaming pipeline approach
- Delete obsolete `test_chunk.rs`

feat: add pipeline mode for streaming replication

Replication to a follower has two phases:

1. **Binary search phase**: The leader runs a binary search to find the exact
   matching position of log entries on the follower.
2. **Pipeline mode**: After finding the match point, the leader calls the
   `stream_append` method on the network and continuously generates
   AppendEntries requests. The network implementation should pipeline all
   requests to the follower and yield responses. Responses and requests do
   not have to map one-to-one: the number of responses can be smaller than
   the number of requests.
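
The binary search phase can be sketched as follows. This is a minimal illustration rather than the project's actual code: logs are modelled as slices of terms indexed by log position, and `find_match` and `follower_has` are hypothetical names. Each call to `probe` stands in for one AppendEntries round trip that the follower accepts or rejects.

```rust
/// Returns the length of the log prefix on which leader and follower agree.
///
/// `probe(i)` is a hypothetical stand-in for one AppendEntries round trip:
/// it reports whether the follower's entry at index `i` matches the leader's.
/// By the Raft log matching property, if index `i` matches then every index
/// below `i` matches too, so the predicate is monotonic and can be binary
/// searched.
fn find_match(leader_len: usize, probe: impl Fn(usize) -> bool) -> usize {
    // Invariant: every index < lo matches; no index >= hi matches.
    let (mut lo, mut hi) = (0, leader_len);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if probe(mid) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    lo
}

/// Hypothetical follower-side check: the entry at `index` exists on the
/// follower and has the same term as the leader's entry.
fn follower_has(leader: &[u64], follower: &[u64], index: usize) -> bool {
    follower.get(index) == leader.get(index)
}
```

With a leader log of terms `[1, 1, 2, 3, 3]` and a follower log of `[1, 1, 2, 2]`, the search settles on a matching prefix of length 3 after three probes, and pipeline mode would start streaming from that point.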

`stream_append` provides a default implementation that calls the existing
`append_entries` method to emulate streaming replication. A mature
implementation should run in real pipeline mode rather than in a
request-response manner. When `stream_append` receives a request, it is
responsible for sending the entire content of that request to the follower;
partial success is not allowed.
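
The difference between the emulated and the pipelined behaviour can be sketched with a toy model. Everything here is an assumption made for illustration: `Request`, `Response`, `emulated`, and `pipelined` are hypothetical names, plain vectors stand in for the async streams, and the follower is assumed to acknowledge cumulatively once per `ack_every` requests. It is not the signature of `RaftNetworkV2::stream_append`.

```rust
/// Hypothetical request: carries entries up to and including `last_index`.
#[derive(Debug, Clone, PartialEq)]
struct Request {
    last_index: u64,
}

/// Hypothetical response: the follower has stored everything up to `matched`.
#[derive(Debug, Clone, PartialEq)]
struct Response {
    matched: u64,
}

/// Request-response emulation: each request waits for its own response,
/// so the output holds exactly one response per request.
fn emulated(
    requests: &[Request],
    mut append_entries: impl FnMut(&Request) -> Response,
) -> Vec<Response> {
    requests.iter().map(|r| append_entries(r)).collect()
}

/// Toy pipelined model: requests are sent back to back, and the follower
/// acknowledges cumulatively once per `ack_every` requests (plus once for a
/// trailing partial group), so there can be fewer responses than requests.
fn pipelined(requests: &[Request], ack_every: usize) -> Vec<Response> {
    assert!(ack_every > 0, "ack_every must be positive");
    requests
        .chunks(ack_every)
        // `chunks` never yields an empty slice, so `last()` is always `Some`.
        .filter_map(|chunk| chunk.last())
        .map(|last| Response { matched: last.last_index })
        .collect()
}
```

For five requests ending at indexes 1 through 5, `emulated` yields five responses, while `pipelined` with `ack_every = 2` yields three (matched 2, 4, and 5). Each acknowledged request is still covered in full, which is consistent with the rule that partial success is not allowed.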

On the leader, the `Inflight` structure in `Engine` tracks the inflight
replication session run by `ReplicationCore`. An `InflightId` identifies
each inflight session, and the `Inflight` structure ignores any response that
doesn't match the current `InflightId`.
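
The stale-response filtering can be sketched as below. This is a simplified model under stated assumptions, not the real `Inflight` type: the fields, the `start_session` and `ack` methods, and the rule that `matched` only moves forward are all hypothetical.

```rust
/// Identifies one replication session (hypothetical representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct InflightId(u64);

/// Simplified stand-in for the leader-side inflight tracker.
#[derive(Debug, Default)]
struct Inflight {
    /// The session whose responses are currently accepted.
    current: Option<InflightId>,
    /// Counter used to mint fresh session ids.
    next_id: u64,
    /// Highest log index acknowledged by the follower in this model.
    matched: Option<u64>,
}

impl Inflight {
    /// Starts a new session; responses from earlier sessions become stale.
    fn start_session(&mut self) -> InflightId {
        self.next_id += 1;
        let id = InflightId(self.next_id);
        self.current = Some(id);
        id
    }

    /// Applies a response if it belongs to the current session.
    /// Returns `false` when the response is ignored as stale.
    fn ack(&mut self, id: InflightId, matched: u64) -> bool {
        if self.current != Some(id) {
            return false;
        }
        // Assumption of this sketch: progress never moves backwards.
        self.matched = Some(self.matched.map_or(matched, |m| m.max(matched)));
        true
    }
}
```

If a session is restarted while a response from the previous session is still in flight, that late response carries the old id and is dropped without touching `matched`.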

This change also reorganizes replication data structures for clarity:

- Add `Payload` enum for log replication specifications (`LogIdRange`, `LogsSince`)
- Add `Replicate` struct combining `inflight_id` with `Payload`
- Add `ReplicationProgress` to track local committed and remote matched state
- Simplify `drain_events()` to set fields directly instead of returning values
- Remove obsolete `request.rs`, `replication_state.rs`, `log_state.rs`

Changes:

- Add pipeline mode entry in `ProgressEntry::next_send()` when fully caught up
- Add `Inflight::LogsSince` variant for unbounded log streaming
- Add `is_logs_since()` method to `Inflight` for type checking
- Add `get_partial_success()` method to `AppendEntriesResponse`
- Refactor `drain_events()` to set `next_action` and `inflight_id` directly
- Add unit tests for pipeline mode in `progress/entry/tests.rs`

feat: add streaming replication with I/O progress synchronization

Enable replication tasks to synchronize with leader I/O progress using
watch channels. The replication stream monitors io_accepted_rx and
io_submitted_rx to detect leader changes and wait for log availability.

feat: add watch channel for I/O acceptance notification

Add io_accepted_tx watch channel to notify observers before I/O operations
are submitted to storage. This enables preparation for upcoming I/O events
before they actually happen.

Changes:

- Add `io_accepted_tx` watch channel to `RaftCore`
- Broadcast I/O acceptance before `UpdateIOProgress`, `AppendEntries`, and `SaveVote`

test: add assertion for watch channel changed() behavior

Add test assertion to verify that after changed() returns ready, a
subsequent call returns pending because the value was already seen.

Changes:

- Add assertion for `changed()` returning pending after value is marked as seen
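
The "ready once, then pending" behaviour that this assertion checks can be modelled with a version counter. The sketch below is a std-only, non-async approximation written for illustration; it is not the watch channel type the project uses, and `channel`, `Sender`, `Receiver`, and `poll_changed` are hypothetical names. `Some(value)` plays the role of "ready" and `None` the role of "pending".

```rust
use std::sync::{Arc, Mutex};

/// Value plus a version that is bumped on every send.
struct Shared<T> {
    value: T,
    version: u64,
}

struct Sender<T> {
    shared: Arc<Mutex<Shared<T>>>,
}

struct Receiver<T> {
    shared: Arc<Mutex<Shared<T>>>,
    /// Version this receiver has already observed.
    seen: u64,
}

/// Creates a channel whose initial value counts as already seen.
fn channel<T>(init: T) -> (Sender<T>, Receiver<T>) {
    let shared = Arc::new(Mutex::new(Shared { value: init, version: 0 }));
    (Sender { shared: shared.clone() }, Receiver { shared, seen: 0 })
}

impl<T> Sender<T> {
    /// Replaces the value and marks it as unseen for every receiver.
    fn send(&self, value: T) {
        let mut s = self.shared.lock().unwrap();
        s.value = value;
        s.version += 1;
    }
}

impl<T: Clone> Receiver<T> {
    /// Returns the latest value if it has not been seen yet and marks it
    /// as seen; returns `None` when nothing new has been sent.
    fn poll_changed(&mut self) -> Option<T> {
        let s = self.shared.lock().unwrap();
        if s.version == self.seen {
            return None;
        }
        self.seen = s.version;
        Some(s.value.clone())
    }
}
```

After one `send`, the first `poll_changed` returns the value and the second returns `None`. Several sends between polls collapse into a single notification carrying the latest value.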

feat: add watch channel for I/O submission progress broadcast

Add io_submitted_tx watch channel to notify replication tasks when log
entries have been submitted to storage and are safe to read. This enables
replication tasks to coordinate with I/O progress without polling.

Changes:

- Add `io_submitted_tx` watch channel to `RaftCore`
- Broadcast I/O submission progress after `AppendEntries`, `SaveVote`, and `UpdateIOProgress`
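
How a replication task might wait on submission progress without polling can be sketched with a mutex and a condition variable. This is a blocking, std-only approximation and not the project's async watch channels: `IoProgress`, `accept`, `submit`, `pending`, and `wait_submitted` are hypothetical names, and progress is reduced to a single log index.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical progress record; both fields are log indexes.
#[derive(Debug, Default)]
struct Progress {
    /// Highest index the leader has accepted for I/O (announced first).
    accepted: u64,
    /// Highest index submitted to storage and therefore safe to read.
    submitted: u64,
}

#[derive(Debug, Default)]
struct IoProgress {
    state: Mutex<Progress>,
    changed: Condvar,
}

impl IoProgress {
    /// Announces upcoming I/O before it is handed to storage.
    fn accept(&self, index: u64) {
        let mut p = self.state.lock().unwrap();
        p.accepted = p.accepted.max(index);
        self.changed.notify_all();
    }

    /// Records that entries up to `index` were submitted to storage.
    fn submit(&self, index: u64) {
        let mut p = self.state.lock().unwrap();
        p.submitted = p.submitted.max(index);
        self.changed.notify_all();
    }

    /// Number of indexes accepted but not yet submitted.
    fn pending(&self) -> u64 {
        let p = self.state.lock().unwrap();
        p.accepted.saturating_sub(p.submitted)
    }

    /// Blocks until entries up to `index` are submitted, then returns the
    /// submitted index observed at wake-up.
    fn wait_submitted(&self, index: u64) -> u64 {
        let guard = self.state.lock().unwrap();
        let guard = self
            .changed
            .wait_while(guard, |p| p.submitted < index)
            .unwrap();
        let submitted = guard.submitted;
        submitted
    }
}
```

A replication thread calling `wait_submitted(3)` sleeps until the leader side calls `submit(3)` or higher. The gap reported by `pending` is the window in which a task can prepare for I/O that has been announced but is not yet readable.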

chore: add doc

feat: add streaming replication support with `LogsSince` variant

Add infrastructure for open-ended streaming replication where the leader
continuously sends logs after a given point without a fixed upper bound.
This complements the existing fixed-range Logs replication mode.

chore: remove unused inflight-id

Improvement
Build/Testing/CI

Collection of small improvements including documentation, logging, configuration tuning, and test robustness fixes.

Changes:

- Reduce default network backoff from 500ms to 200ms for faster retries
- Add doc comments for `IOProgress` fields (`accepted`, `submitted`, `flushed`)
- Add debug logging to client-http example
- Add test assertion for watch channel `changed()` pending behavior
- Fix metrics test to handle missing heartbeat entries gracefully


When rebuilding replication streams after a membership change, reuse existing streams instead of destroying all and recreating. This avoids unnecessary stream teardown and maintains in-flight replication state.

Changes:

- Reuse existing replication streams when targets remain in new membership
- Only spawn new replication for newly added targets
- Properly join and cleanup removed replication streams
- Handle missing progress entries gracefully in `update_matching()`
- Add debug logging for membership change operations
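
The reuse logic can be sketched as a diff between the current stream map and the new target set. This is an illustration under assumptions, not the actual implementation: `Stream` is a hypothetical handle that only records a target id and a matched index, and `rebuild_streams` is a hypothetical name.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical handle for one replication stream.
#[derive(Debug, PartialEq)]
struct Stream {
    target: u64,
    /// In-flight state that should survive a membership change.
    matched: Option<u64>,
}

/// Reconciles `streams` with `new_targets`.
///
/// Returns the ids that were newly spawned and the streams that were
/// removed, so the caller can join and clean them up.
fn rebuild_streams(
    streams: &mut BTreeMap<u64, Stream>,
    new_targets: &BTreeSet<u64>,
) -> (Vec<u64>, Vec<Stream>) {
    // Collect ids first so the map is not mutated while it is iterated.
    let removed_ids: Vec<u64> = streams
        .keys()
        .filter(|id| !new_targets.contains(*id))
        .copied()
        .collect();
    let removed: Vec<Stream> = removed_ids
        .iter()
        .filter_map(|id| streams.remove(id))
        .collect();

    // Spawn only for targets that have no stream yet; existing ones are kept.
    let mut spawned = Vec::new();
    for &id in new_targets {
        if !streams.contains_key(&id) {
            streams.insert(id, Stream { target: id, matched: None });
            spawned.push(id);
        }
    }
    (spawned, removed)
}
```

Starting with streams for targets 1 and 2 and changing membership to targets 2 and 3, only target 3 is spawned, target 1 is returned for cleanup, and the stream for target 2 keeps its `matched` state untouched.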

drmingdrmer requested a review from xp-trumpet December 10, 2025 09:56

drmingdrmer changed the title ~~docs: update getting-started guide to use RaftNetworkV2 as primary trait~~ feat: add pipeline mode for streaming replication Dec 10, 2025

drmingdrmer force-pushed the 285-pipeline branch 6 times, most recently from 7feca32 to 829ee8a Compare December 14, 2025 09:07

drmingdrmer added 13 commits December 14, 2025 17:16

test: add log_reader_reads_new_entries test

64fd801

Verify that a LogReader obtained before writing new entries can still read entries written after it was created. This ensures LogReader implementations don't cache or snapshot data in a way that makes newly written entries invisible.

chore: remove unused inflight-id

c9a68a8

feat: add streaming replication support with LogsSince variant

bb9e30f

Add infrastructure for open-ended streaming replication where the leader continuously sends logs after a given point without a fixed upper bound. This complements the existing fixed-range `Logs` replication mode.

feat: add streaming replication with I/O progress synchronization

255ad7a

Enable replication tasks to synchronize with leader I/O progress using watch channels. The replication stream monitors io_accepted_rx and io_submitted_rx to detect leader changes and wait for log availability.

refactor: use stream_append for heartbeat instead of append_entries

2ebc959

Replace `append_entries` with `stream_append` in HeartbeatWorker so applications only need to implement `stream_append` for replication.

refactor: use stream_append for linearizable read confirmation

b624a53

Replace `append_entries` with `stream_append` in linearizable read confirmation so applications only need to implement `stream_append`.

docs: update getting-started guide to use RaftNetworkV2 as primary trait

6b8ddf4

Modernize the getting-started documentation to focus on `RaftNetworkV2` as the recommended network trait. Add documentation for the optional `stream_append()` method for pipelined replication.

drmingdrmer force-pushed the 285-pipeline branch from 829ee8a to 24b736f Compare December 14, 2025 09:17
