Skip to content

Conversation

@PeterGRutherford
Copy link
Contributor

  • Replace per-restart epoch increment with genesis-derived network epoch
  • Add persistent nonce fingerprint tracking across restarts
  • Implement TTL-based pruning for bounded storage
  • Add SeenResult enum for clearer replay detection API
  • Add compute_nonce_fingerprint() with context binding
  • Update all NonceCache call sites to pass network_epoch
  • Update tests to use new NetworkEpoch parameter
  • Add tests: network_epoch_is_stable, replay_rejected_across_restart, pruning_removes_old_entries
  • Add backward-compatible check_and_store() wrapper
  • Add legacy epoch migration support

Fixes epoch desync bug where client/server had different epochs due to NonceCache::open() being called per-handshake instead of per-process.

Network epoch is now stable, deterministic, and identical for all nodes on the same network. Replay protection uses persistent nonce fingerprints with TTL instead of relying on epoch increments.

Resolves: #440

…ion (PR #440)

- Replace per-restart epoch increment with genesis-derived network epoch
- Add persistent nonce fingerprint tracking across restarts
- Implement TTL-based pruning for bounded storage
- Add SeenResult enum for clearer replay detection API
- Add compute_nonce_fingerprint() with context binding
- Update all NonceCache call sites to pass network_epoch
- Update tests to use new NetworkEpoch parameter
- Add tests: network_epoch_is_stable, replay_rejected_across_restart, pruning_removes_old_entries
- Add backward-compatible check_and_store() wrapper
- Add legacy epoch migration support

Fixes epoch desync bug where client/server had different epochs due to
NonceCache::open() being called per-handshake instead of per-process.

Network epoch is now stable, deterministic, and identical for all nodes
on the same network. Replay protection uses persistent nonce fingerprints
with TTL instead of relying on epoch increments.

Resolves: #440
Copilot AI review requested due to automatic review settings January 8, 2026 22:10
@PeterGRutherford PeterGRutherford linked an issue Jan 8, 2026 that may be closed by this pull request
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the nonce cache replay protection system to use a stable, genesis-derived network epoch instead of per-restart epoch increments. The change fixes a critical bug where client and server had different epochs due to NonceCache::open() being called per-handshake instead of per-process.

Key changes:

  • Replaced incrementing epoch with deterministic NetworkEpoch derived from genesis hash or chain ID
  • Added persistent nonce fingerprinting with context binding (network epoch + nonce + protocol version + peer role)
  • Implemented TTL-based pruning with lazy cleanup every 10,000 insertions
  • Added SeenResult enum for clearer replay detection semantics
  • Provided backward-compatible check_and_store() wrapper for migration

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
lib-network/src/handshake/nonce_cache.rs Core implementation: NetworkEpoch struct, compute_nonce_fingerprint function, mark_nonce_seen API, persistent storage with TTL pruning, and comprehensive tests
lib-network/src/handshake/mod.rs Exports NetworkEpoch, SeenResult, and compute_nonce_fingerprint; updates tests to pass network_epoch parameter
lib-network/src/web4/client.rs Initializes NetworkEpoch from chain_id(0) with TODO for blockchain integration
lib-network/src/protocols/quic_mesh.rs Initializes NetworkEpoch from chain_id(0) with TODO for blockchain integration
lib-network/src/protocols/wifi_direct_handshake.rs Updates tests to create NetworkEpoch and pass to NonceCache::new_test
lib-network/src/protocols/quic_handshake.rs Updates test helper to create NetworkEpoch and pass to NonceCache::new_test
lib-network/src/client/zhtp_client.rs Initializes NetworkEpoch from chain_id(0) with TODO for blockchain integration
lib-network/src/bootstrap/peer_discovery.rs Initializes NetworkEpoch from chain_id(0) with TODO for blockchain integration
lib-network/src/bootstrap/handshake.rs Updates tests to create NetworkEpoch and pass to NonceCache::new_test

…sh derivation

- Replaced hardcoded chain_id(0) with actual genesis hash lookup via get_network_genesis()
- Network epoch now derived from blockchain genesis when available
- Added graceful fallback to chain_id(0) for dev/test environments with warning
- Implemented #[deprecated] attribute on check_and_store() for proper compiler warnings
- Updated all NetworkEpoch call sites: peer_discovery, zhtp_client, web4/client, quic_mesh
- Proper network isolation now works automatically when genesis is set via set_network_genesis()

This completes the ticket requirements by using actual blockchain context instead of TODOs.
All tests passing. Addresses all Copilot review comments.
…lay protection

Resolves PR #440 by implementing deterministic NetworkEpoch derivation
and cross-restart replay protection for handshake nonces.

## Changes

### Core Implementation (lib-network/src/handshake/nonce_cache.rs)
- Added NetworkEpoch struct with Blake3-based derivation from genesis hash
- Implemented compute_nonce_fingerprint() for context-bound nonce hashing
- Added SeenResult enum for clear replay detection API
- Implemented mark_nonce_seen() with RocksDB persistent storage
- Added prune_seen_nonces() with TTL-based cleanup
- Implemented verify_or_store_network_epoch() for DB migration
- Deprecated check_and_store() in favor of new SeenResult API
- All nonces now stored with 'seen:' prefix in RocksDB

### Updated Call Sites
- lib-network: peer_discovery.rs, zhtp_client.rs, web4/client.rs, quic_mesh.rs
- zhtp: quic_handler.rs (2 instances), protocols/wifi.rs (2 instances)
- All sites use get_network_genesis() with fallback to chain_id(0) for dev/test

### Benefits
- Network-specific replay protection (different networks = different epochs)
- Persistent nonce tracking survives node restarts
- Backward compatible with existing databases
- Clear deprecation path for legacy API

Reviewed-by: GitHub Copilot
Implements: Ticket #440 requirements R1, R2, R3
## Critical Fixes

- **Error Handling**: Consistent RocksDB error handling (no more silent failures on delete)
- **Lazy Pruning**: Add error logging and cache health monitoring
- **Genesis Fallback**: Implement production vs dev check with conditional compilation
- **Deserialization**: Improve error messages with detailed context
- **Cache Overflow**: Clarify warnings when memory cache fills during load
- **Cleanup Task**: Monitor cache health with size/utilization metrics
- **Atomic Counter**: Replace Arc<RwLock<u64>> with AtomicU64 for efficiency

## Updated Fallback Pattern

- New `NetworkEpoch::from_global_or_fail()` helper:
  - Dev builds: Fallback to chain_id(0) with warning (acceptable)
  - Release builds: Error to prevent production misconfiguration
- Applied to 6 locations across lib-network and zhtp

## New Test Coverage

Added 8 critical security tests:
- Corrupted database entry recovery
- LRU eviction replay protection
- Concurrent pruning race conditions
- TTL boundary edge cases
- Genesis hashing consistency
- Legacy epoch migration
- System time error handling
- Network isolation verification

All tests passing (21/21). No compilation errors.

Fixes: #440, #653
@sonarqubecloud
Copy link

@umwelt umwelt merged commit c699059 into development Jan 10, 2026
4 checks passed
@umwelt umwelt deleted the fix/network-derived-epoch-persistent-replay-cache branch January 10, 2026 08:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ALPHA-P1] Network-Derived Epoch + Persistent Replay Cache

3 participants