When a ping or balance fails because of a 4xx, a sensible retry strategy needs to be implemented #10

@nevali

Description

If a ping fails with a 4xx response (using the default ETCD_EXISTS), we should retry with ETCD_NONE to attempt to re-create the value. If that retry also fails, we should close and re-open the directory.

  • If re-opening the directory fails, we should invoke a (new) callback to inform the application that the cluster has been forcibly left.
  • This process should be completed while the write-lock is held, which would prevent the other thread from interfering with it.
  • Once the cluster has been re-acquired, we should set a (new) flag to inform the other thread that the cluster state has changed.
  • Regardless of whether re-creating the value succeeds, cluster_etcd_balance_() should be invoked as part of the re-acquisition process to ensure that member state data is up to date.
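
A rough sketch of the flow described above, in C, may help pin down the intended ordering. Everything here apart from the names taken from this issue is an assumption: `stub_cluster`, `stub_set()`, `ping_with_recovery()`, the `recover_result` values and the `SET_EXISTS`/`SET_NONE` enum are hypothetical stand-ins, scripted results replace real etcd requests, and the counters stand in for cluster_etcd_balance_() and the new forced-leave callback. It also assumes that both a successful re-create and a successful re-open count as "re-acquired", and it does not attempt another write after re-opening the directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ETCD_EXISTS / ETCD_NONE flags in the issue. */
typedef enum { SET_EXISTS, SET_NONE } set_flag;

typedef enum {
    RECOVER_OK,          /* first ping succeeded; nothing to do */
    RECOVER_REACQUIRED,  /* membership re-established; state flag set */
    RECOVER_LEFT,        /* re-open failed; application was notified */
    RECOVER_ERROR        /* non-4xx failure; outside this strategy */
} recover_result;

/* Stub cluster: scripted results replace real etcd requests. */
typedef struct {
    int exists_status;   /* HTTP status returned for a SET_EXISTS write */
    int none_status;     /* HTTP status returned for a SET_NONE write */
    int reopen_result;   /* 0 if closing and re-opening the directory works */
    int balance_calls;   /* how many times the balance step ran */
    int left_calls;      /* how many times the forced-leave callback fired */
    bool state_changed;  /* flag intended to be read by the other thread */
} stub_cluster;

static int stub_set(const stub_cluster *c, set_flag flag)
{
    return flag == SET_EXISTS ? c->exists_status : c->none_status;
}

static bool is_success(int status)
{
    return status >= 200 && status < 300;
}

static bool is_client_error(int status)
{
    return status >= 400 && status < 500;
}

/* The caller is assumed to hold the write-lock for the whole call. */
recover_result ping_with_recovery(stub_cluster *c)
{
    int status;

    assert(c != NULL);

    status = stub_set(c, SET_EXISTS);
    if (is_success(status))
        return RECOVER_OK;
    if (!is_client_error(status))
        return RECOVER_ERROR;

    /* 4xx: the value may have vanished, so try to re-create it. */
    status = stub_set(c, SET_NONE);
    if (!is_success(status) && c->reopen_result != 0) {
        /* Re-opening the directory failed: tell the application. */
        c->left_calls++;
        return RECOVER_LEFT;
    }

    /* Either the value was re-created or the directory was re-opened. */
    c->balance_calls++;       /* stands in for cluster_etcd_balance_() */
    c->state_changed = true;  /* signal the other thread */
    return RECOVER_REACQUIRED;
}
```

In this sketch a 5xx or network failure returns `RECOVER_ERROR` and is left to whatever handling already exists, since this issue only covers the 4xx case.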
