Remote comms: Error pattern analysis and permanent failure detection #688

@sirtimid

Problem: isRetryableNetworkError treats all network errors as retryable, so we cannot distinguish a peer that is "not running now" from one that "will never be running again" or one at the "wrong address". This wastes retries on permanently unreachable peers.

Expected Behavior:

  • Track error patterns per peer over time (error codes, frequency, success rate)
  • Classify persistent failures as permanently non-retryable after threshold
  • Stop retrying when pattern indicates permanent failure (wrong address, dead peer)
  • Continue retrying for transient failures (temporary network issues)

Implementation:

  • Add error tracking to ReconnectionManager (error history per peer)
  • Track consecutive identical errors, error frequency, and success rate
  • Implement heuristics: e.g., "same error code N times without success = permanent"
  • Add permanent failure state to ReconnectionManager
  • Modify isRetryableNetworkError or add isPermanentlyFailed(peerId) check
  • Integrate with attemptReconnection to stop on permanent failure
  • Consider "wrong address" patterns (persistent ECONNREFUSED/EHOSTUNREACH)
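As a rough illustration of the tracking and heuristic steps above, here is a minimal sketch. Everything in it is hypothetical: the PeerErrorTracker class, its method names, the default threshold of 5, and the choice to apply the "same code N times without success" rule only to ECONNREFUSED/EHOSTUNREACH are assumptions for discussion, not the existing ReconnectionManager API.

```typescript
// Hypothetical sketch: names, threshold, and the suspect-code set are assumptions.
type PeerErrorRecord = {
  lastCode: string | undefined; // code of the most recent failure, if any
  consecutiveSameCode: number; // identical failures in a row since last success
  failures: number;
  successes: number;
  permanentlyFailed: boolean;
};

// Codes that may indicate a wrong address or dead peer when they persist.
const SUSPECT_CODES: ReadonlySet<string> = new Set([
  "ECONNREFUSED",
  "EHOSTUNREACH",
]);

class PeerErrorTracker {
  private readonly records = new Map<string, PeerErrorRecord>();
  private readonly threshold: number;

  constructor(threshold: number = 5) {
    this.threshold = threshold;
  }

  private getRecord(peerId: string): PeerErrorRecord {
    let record = this.records.get(peerId);
    if (record === undefined) {
      record = {
        lastCode: undefined,
        consecutiveSameCode: 0,
        failures: 0,
        successes: 0,
        permanentlyFailed: false,
      };
      this.records.set(peerId, record);
    }
    return record;
  }

  // Record a failed attempt and apply the heuristic:
  // the same suspect code `threshold` times in a row without a success.
  recordFailure(peerId: string, code: string): void {
    const record = this.getRecord(peerId);
    record.failures += 1;
    record.consecutiveSameCode =
      record.lastCode === code ? record.consecutiveSameCode + 1 : 1;
    record.lastCode = code;
    if (
      SUSPECT_CODES.has(code) &&
      record.consecutiveSameCode >= this.threshold
    ) {
      record.permanentlyFailed = true;
    }
  }

  // A success proves the peer is reachable, so the streak and the
  // permanent-failure flag are both cleared.
  recordSuccess(peerId: string): void {
    const record = this.getRecord(peerId);
    record.successes += 1;
    record.lastCode = undefined;
    record.consecutiveSameCode = 0;
    record.permanentlyFailed = false;
  }

  isPermanentlyFailed(peerId: string): boolean {
    return this.records.get(peerId)?.permanentlyFailed ?? false;
  }

  // Fraction of recorded attempts that succeeded; undefined if none recorded.
  successRate(peerId: string): number | undefined {
    const record = this.records.get(peerId);
    if (record === undefined) {
      return undefined;
    }
    const total = record.successes + record.failures;
    return total === 0 ? undefined : record.successes / total;
  }
}
```

In this sketch, attemptReconnection would call recordFailure or recordSuccess after each attempt and stop looping once isPermanentlyFailed(peerId) returns true. Other codes such as ETIMEDOUT never trip the flag here, so transient failures keep retrying; whether that is the right split is one of the open design questions.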

Acceptance Criteria:

  • Error patterns are tracked per peer
  • Persistent failures are classified as permanent after threshold
  • Permanent failures stop retry attempts
  • Transient failures continue to retry
  • Tests verify pattern detection and permanent failure classification
