@KyrinCode

Description

This PR adds a new flag --raft.no-shutdown-on-remove to op-conductor that prevents Raft from shutting down when the node is removed from the cluster.

Problem

When a node is removed from the cluster via conductor_removeServer, HashiCorp Raft's default behavior (ShutdownOnRemove=true) causes the Raft instance to completely shut down. This means:

  • The node cannot be re-added to the cluster without restarting the entire process
  • This creates operational complexity when managing cluster membership dynamically
  • Temporarily removing and re-adding a node (e.g., during maintenance or role changes) therefore requires a full process restart

Solution

Add a configurable flag that sets ShutdownOnRemove=false in Raft configuration. When enabled:

  • The node transitions to follower state instead of shutting down when removed
  • The node can be immediately re-added to the cluster via conductor_addServerAsVoter or conductor_addServerAsNonvoter
  • No process restart is required

Changes

  • Added --raft.no-shutdown-on-remove flag (default: false, backward compatible)
  • Added NoShutdownOnRemove field to RaftConsensusConfig
  • Configured Raft's ShutdownOnRemove based on the flag value
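
As a rough illustration of how these three changes fit together, the sketch below parses the flag, stores it in a config field, and inverts it into the Raft setting. It is not the code from this PR: raftConfig is a local stand-in for the relevant part of HashiCorp Raft's Config (whose ShutdownOnRemove field defaults to true), RaftConsensusConfig is trimmed to the single new field, and the standard library flag package is used only to keep the example dependency-free; the actual CLI wiring in op-conductor may differ.

```go
package main

import (
	"flag"
	"fmt"
)

// raftConfig is a local stand-in (assumption) for the part of HashiCorp
// Raft's Config that matters here: the ShutdownOnRemove field.
type raftConfig struct {
	ShutdownOnRemove bool
}

// defaultRaftConfig mimics the library default, which shuts Raft down
// when the node is removed from the cluster.
func defaultRaftConfig() *raftConfig {
	return &raftConfig{ShutdownOnRemove: true}
}

// RaftConsensusConfig is trimmed to the one field added by this PR;
// the real struct carries many more options.
type RaftConsensusConfig struct {
	NoShutdownOnRemove bool
}

// parseConfig reads --raft.no-shutdown-on-remove from args. The default
// is false, so existing deployments keep the shutdown-on-remove behavior.
func parseConfig(args []string) (RaftConsensusConfig, error) {
	fs := flag.NewFlagSet("op-conductor", flag.ContinueOnError)
	noShutdown := fs.Bool("raft.no-shutdown-on-remove", false,
		"keep Raft running when this node is removed from the cluster")
	if err := fs.Parse(args); err != nil {
		return RaftConsensusConfig{}, err
	}
	return RaftConsensusConfig{NoShutdownOnRemove: *noShutdown}, nil
}

// buildRaftConfig applies the conductor option to the Raft configuration.
// The flag is phrased negatively, so it is inverted here.
func buildRaftConfig(cfg RaftConsensusConfig) *raftConfig {
	rc := defaultRaftConfig()
	rc.ShutdownOnRemove = !cfg.NoShutdownOnRemove
	return rc
}

func main() {
	cfg, err := parseConfig([]string{"--raft.no-shutdown-on-remove"})
	if err != nil {
		panic(err)
	}
	fmt.Println("ShutdownOnRemove:", buildRaftConfig(cfg).ShutdownOnRemove)
}
```

Phrasing the flag negatively means the zero value of the new field preserves the existing behavior, which is what makes the change backward compatible.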

Tests

Manual testing was performed:

  1. Started a 3-node conductor cluster with --raft.no-shutdown-on-remove enabled
  2. Removed a node via conductor_removeServer
  3. Verified the node transitioned to follower state (instead of Raft shutting down)
  4. Re-added the node via conductor_addServerAsVoter
  5. Verified the node successfully rejoined the cluster without a restart, confirmed by calling conductor_clusterMembership on that node
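
For anyone repeating steps 2, 4, and 5, the sketch below builds the JSON-RPC 2.0 request bodies that could be sent to a conductor's RPC endpoint. Only the method names come from this PR. The parameter order (server ID, optional address, membership version) and the values "node-2", "127.0.0.1:50050", and version 5 are assumptions for illustration and should be checked against the conductor API before use.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcRequest is a plain JSON-RPC 2.0 request envelope.
type rpcRequest struct {
	JSONRPC string        `json:"jsonrpc"`
	ID      int           `json:"id"`
	Method  string        `json:"method"`
	Params  []interface{} `json:"params"`
}

// newRequest serializes a JSON-RPC call. An empty params list is encoded
// as [] rather than null.
func newRequest(id int, method string, params ...interface{}) (string, error) {
	if params == nil {
		params = []interface{}{}
	}
	b, err := json.Marshal(rpcRequest{JSONRPC: "2.0", ID: id, Method: method, Params: params})
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// manualTestRequests returns the request bodies for the remove, re-add,
// and verify steps. The server ID, address, and version are hypothetical
// placeholders, and the parameter layout is an assumption.
func manualTestRequests() ([]string, error) {
	calls := []struct {
		method string
		params []interface{}
	}{
		{"conductor_removeServer", []interface{}{"node-2", 5}},
		{"conductor_addServerAsVoter", []interface{}{"node-2", "127.0.0.1:50050", 6}},
		{"conductor_clusterMembership", nil},
	}
	out := make([]string, 0, len(calls))
	for i, c := range calls {
		body, err := newRequest(i+1, c.method, c.params...)
		if err != nil {
			return nil, err
		}
		out = append(out, body)
	}
	return out, nil
}

func main() {
	bodies, err := manualTestRequests()
	if err != nil {
		panic(err)
	}
	for _, b := range bodies {
		fmt.Println(b)
	}
}
```

Each printed line can be posted as the body of an HTTP request to the conductor RPC address of the node under test. The final conductor_clusterMembership call should be sent to the re-added node itself, as in step 5.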

Additional context

This feature is opt-in and disabled by default to maintain backward compatibility with existing deployments. The default Raft behavior (shutdown on remove) is preserved unless explicitly configured otherwise.

Use cases for this flag:

  • Dynamic cluster membership management without downtime
  • Graceful role transitions (voter ↔ nonvoter)
  • Simpler disaster recovery

Metadata

  • Related to cluster membership management and operational flexibility

@KyrinCode KyrinCode requested review from a team as code owners December 29, 2025 12:36
@KyrinCode KyrinCode requested a review from mslipper December 29, 2025 12:36