-
Notifications
You must be signed in to change notification settings - Fork 17
fix(handshake): Implement network epoch and persistent replay protection (PR #542) #653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(handshake): Implement network epoch and persistent replay protection (PR #542) #653
Conversation
…ion (PR #440) - Replace per-restart epoch increment with genesis-derived network epoch - Add persistent nonce fingerprint tracking across restarts - Implement TTL-based pruning for bounded storage - Add SeenResult enum for clearer replay detection API - Add compute_nonce_fingerprint() with context binding - Update all NonceCache call sites to pass network_epoch - Update tests to use new NetworkEpoch parameter - Add tests: network_epoch_is_stable, replay_rejected_across_restart, pruning_removes_old_entries - Add backward-compatible check_and_store() wrapper - Add legacy epoch migration support Fixes epoch desync bug where client/server had different epochs due to NonceCache::open() being called per-handshake instead of per-process. Network epoch is now stable, deterministic, and identical for all nodes on the same network. Replay protection uses persistent nonce fingerprints with TTL instead of relying on epoch increments. Resolves: #440
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the nonce cache replay protection system to use a stable, genesis-derived network epoch instead of per-restart epoch increments. The change fixes a critical bug where client and server had different epochs due to NonceCache::open() being called per-handshake instead of per-process.
Key changes:
- Replaced incrementing epoch with deterministic
NetworkEpochderived from genesis hash or chain ID - Added persistent nonce fingerprinting with context binding (network epoch + nonce + protocol version + peer role)
- Implemented TTL-based pruning with lazy cleanup every 10,000 insertions
- Added
SeenResultenum for clearer replay detection semantics - Provided backward-compatible
check_and_store()wrapper for migration
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| lib-network/src/handshake/nonce_cache.rs | Core implementation: NetworkEpoch struct, compute_nonce_fingerprint function, mark_nonce_seen API, persistent storage with TTL pruning, and comprehensive tests |
| lib-network/src/handshake/mod.rs | Exports NetworkEpoch, SeenResult, and compute_nonce_fingerprint; updates tests to pass network_epoch parameter |
| lib-network/src/web4/client.rs | Initializes NetworkEpoch from chain_id(0) with TODO for blockchain integration |
| lib-network/src/protocols/quic_mesh.rs | Initializes NetworkEpoch from chain_id(0) with TODO for blockchain integration |
| lib-network/src/protocols/wifi_direct_handshake.rs | Updates tests to create NetworkEpoch and pass to NonceCache::new_test |
| lib-network/src/protocols/quic_handshake.rs | Updates test helper to create NetworkEpoch and pass to NonceCache::new_test |
| lib-network/src/client/zhtp_client.rs | Initializes NetworkEpoch from chain_id(0) with TODO for blockchain integration |
| lib-network/src/bootstrap/peer_discovery.rs | Initializes NetworkEpoch from chain_id(0) with TODO for blockchain integration |
| lib-network/src/bootstrap/handshake.rs | Updates tests to create NetworkEpoch and pass to NonceCache::new_test |
…sh derivation - Replaced hardcoded chain_id(0) with actual genesis hash lookup via get_network_genesis() - Network epoch now derived from blockchain genesis when available - Added graceful fallback to chain_id(0) for dev/test environments with warning - Implemented #[deprecated] attribute on check_and_store() for proper compiler warnings - Updated all NetworkEpoch call sites: peer_discovery, zhtp_client, web4/client, quic_mesh - Proper network isolation now works automatically when genesis is set via set_network_genesis() This completes the ticket requirements by using actual blockchain context instead of TODOs. All tests passing. Addresses all Copilot review comments.
…lay protection Resolves PR #440 by implementing deterministic NetworkEpoch derivation and cross-restart replay protection for handshake nonces. ## Changes ### Core Implementation (lib-network/src/handshake/nonce_cache.rs) - Added NetworkEpoch struct with Blake3-based derivation from genesis hash - Implemented compute_nonce_fingerprint() for context-bound nonce hashing - Added SeenResult enum for clear replay detection API - Implemented mark_nonce_seen() with RocksDB persistent storage - Added prune_seen_nonces() with TTL-based cleanup - Implemented verify_or_store_network_epoch() for DB migration - Deprecated check_and_store() in favor of new SeenResult API - All nonces now stored with 'seen:' prefix in RocksDB ### Updated Call Sites - lib-network: peer_discovery.rs, zhtp_client.rs, web4/client.rs, quic_mesh.rs - zhtp: quic_handler.rs (2 instances), protocols/wifi.rs (2 instances) - All sites use get_network_genesis() with fallback to chain_id(0) for dev/test ### Benefits - Network-specific replay protection (different networks = different epochs) - Persistent nonce tracking survives node restarts - Backward compatible with existing databases - Clear deprecation path for legacy API Reviewed-by: GitHub Copilot Implements: Ticket #440 requirements R1, R2, R3
## Critical Fixes - **Error Handling**: Consistent RocksDB error handling (no more silent failures on delete) - **Lazy Pruning**: Add error logging and cache health monitoring - **Genesis Fallback**: Implement production vs dev check with conditional compilation - **Deserialization**: Improve error messages with detailed context - **Cache Overflow**: Clarify warnings when memory cache fills during load - **Cleanup Task**: Monitor cache health with size/utilization metrics - **Atomic Counter**: Replace Arc<RwLock<u64>> with AtomicU64 for efficiency ## Updated Fallback Pattern - New `NetworkEpoch::from_global_or_fail()` helper: - Dev builds: Fallback to chain_id(0) with warning (acceptable) - Release builds: Error to prevent production misconfiguration - Applied to 6 locations across lib-network and zhtp ## New Test Coverage Added 8 critical security tests: - Corrupted database entry recovery - LRU eviction replay protection - Concurrent pruning race conditions - TTL boundary edge cases - Genesis hashing consistency - Legacy epoch migration - System time error handling - Network isolation verification All tests passing (21/21). No compilation errors. Fixes: #440, #653
|



Fixes epoch desync bug where client/server had different epochs due to NonceCache::open() being called per-handshake instead of per-process.
Network epoch is now stable, deterministic, and identical for all nodes on the same network. Replay protection uses persistent nonce fingerprints with TTL instead of relying on epoch increments.
Resolves: #440