# Multithreaded DNS Server Implementation Plan

## Overview
This document outlines the plan to transform the current single-threaded DNS server into a truly multithreaded, high-performance DNS resolver capable of handling thousands of concurrent queries.

## Current Architecture Analysis

**Current Bottleneck**: The main loop in `main.rs:75-211` processes DNS queries sequentially:
```rust
loop {
    let (len, addr) = sock.recv_from(&mut buf).await?; // Blocks until packet received
    // Process packet synchronously
    // Send response
}
```

**Performance Impact**: Each query blocks the entire server until completion, limiting throughput to ~100 queries/sec.

## 1. Core Architecture Changes

### Main Loop Refactoring (`main.rs:75-211`)
**Objective**: Replace the sequential loop with a task-spawning architecture

**Current Implementation**:
- Sequential processing in main loop
- Each query waits for previous query completion
- Single point of failure

**New Implementation**:
```rust
loop {
    let (len, addr) = sock.recv_from(&mut buf).await?;
    let packet_data = buf[..len].to_vec();
    let query_handle = query_actor_handle.clone();
    let sock_clone = sock.clone(); // Arc<UdpSocket>

    tokio::spawn(async move {
        process_dns_query(packet_data, addr, query_handle, sock_clone).await;
    });
}
```

**Benefits**:
- Each incoming UDP packet spawns independent task
- Main loop becomes purely packet reception and task dispatch
- Concurrent processing of multiple queries

### Task Isolation Strategy
- Create a `process_dns_query()` function to handle individual queries (sketched after this list)
- Move packet decoding, resolution, and response logic into spawned tasks
- Ensure each task is self-contained and isolated
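
A minimal sketch of that handler, under the assumption that `QueryActorHandle` exposes an async resolution method (called `resolve` here for illustration) that returns encoded response bytes:

```rust
use std::net::SocketAddr;
use std::sync::Arc;
use tokio::net::UdpSocket;

// Hypothetical signature: the actual decode/resolve/encode steps depend on
// the existing actor and packet types in this codebase.
async fn process_dns_query(
    packet_data: Vec<u8>,
    addr: SocketAddr,
    query_handle: QueryActorHandle,
    sock: Arc<UdpSocket>,
) {
    // Decode, resolve via the actor, and respond entirely within this task.
    match query_handle.resolve(packet_data).await {
        Ok(response_bytes) => {
            if let Err(e) = sock.send_to(&response_bytes, addr).await {
                tracing::warn!("failed to send response to {}: {}", addr, e);
            }
        }
        Err(e) => tracing::warn!("resolution failed for {}: {}", addr, e),
    }
}
```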

## 2. Shared State Management

### UdpSocket Sharing
**Challenge**: Multiple tasks need to send responses via the same socket

**Solution**:
- Wrap `UdpSocket` in `Arc<UdpSocket>` for thread-safe sharing (see the sketch after this list)
- Each spawned task gets cloned reference for response sending
- Tokio's UdpSocket is already thread-safe for concurrent operations
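
A sketch of the wiring at startup (the bind address is illustrative):

```rust
use std::sync::Arc;
use tokio::net::UdpSocket;

// Bind once, then share the socket across all spawned tasks.
let sock = Arc::new(UdpSocket::bind("0.0.0.0:53").await?);

// Inside the receive loop, each task gets a cheap reference-counted clone:
let sock_clone = Arc::clone(&sock);
```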

### QueryActorHandle Management
**Current State**: Already implements the `Clone` trait
**Strategy**:
- Safe to clone across tasks
- Consider implementing connection pooling for better throughput
- Each task gets independent handle to actor system

### Thread Safety Considerations
- All shared components use tokio's async-safe primitives
- No additional synchronization required for basic implementation
- Actor pattern already provides message-passing concurrency

## 3. Connection Pooling Strategy

### Multiple QueryActor Instances
**Objective**: Eliminate single QueryActor bottleneck

**Current Limitation**: Single QueryActor processes all DNS resolutions
**Solution**: Create pool of QueryActor instances

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

struct QueryActorPool {
    actors: Vec<QueryActorHandle>,
    current: AtomicUsize,
    size: usize,
}

impl QueryActorPool {
    fn new(pool_size: usize, resolver: Resolver<TokioConnectionProvider>) -> Self {
        let actors = (0..pool_size)
            .map(|_| QueryActorHandle::new(resolver.clone()))
            .collect();

        Self {
            actors,
            current: AtomicUsize::new(0),
            size: pool_size,
        }
    }

    fn get_actor(&self) -> &QueryActorHandle {
        // Round-robin selection; a relaxed counter is sufficient here
        let index = self.current.fetch_add(1, Ordering::Relaxed) % self.size;
        &self.actors[index]
    }
}
```
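
Wiring the pool into the main loop might then look like this (a pool size of 20 is illustrative):

```rust
let pool = Arc::new(QueryActorPool::new(20, resolver));

loop {
    let (len, addr) = sock.recv_from(&mut buf).await?;
    let packet_data = buf[..len].to_vec();
    // Route each query to the next actor in round-robin order.
    let query_handle = pool.get_actor().clone();
    let sock_clone = Arc::clone(&sock);

    tokio::spawn(async move {
        process_dns_query(packet_data, addr, query_handle, sock_clone).await;
    });
}
```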

### Distribution Strategies
- **Round-robin**: Simple, even distribution
- **Least-busy**: More complex, better load balancing
- **Recommended**: Start with round-robin, optimize later

### Pool Sizing
- **Recommended**: 10-50 actors depending on expected load
- **Consideration**: Each actor maintains resolver connection
- **Tuning**: Monitor connection pool utilization

## 4. Resource Management

### Task Limits
**Problem**: Unlimited task spawning can exhaust system resources

**Solution**: Implement semaphore-based limiting
```rust
use tokio::sync::Semaphore;

static QUERY_SEMAPHORE: Semaphore = Semaphore::const_new(1000);

// In main loop:
let permit = QUERY_SEMAPHORE.acquire().await.unwrap();
tokio::spawn(async move {
    let _permit = permit; // Hold permit for task duration
    process_dns_query(packet_data, addr, query_handle, sock_clone).await;
});
```
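
An alternative, if the semaphore is created at startup rather than as a static: wrap it in an `Arc` and use `acquire_owned`, which yields a permit that can move into the spawned task without borrowing:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

let semaphore = Arc::new(Semaphore::new(1000));

// In the main loop, before spawning:
let permit = Arc::clone(&semaphore).acquire_owned().await.unwrap();
tokio::spawn(async move {
    let _permit = permit; // Released automatically when the task finishes
    process_dns_query(packet_data, addr, query_handle, sock_clone).await;
});
```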

**Recommended Limits**:
- Development: 100-500 concurrent queries
- Production: 1000-5000 concurrent queries
- Adjust based on available memory and CPU cores

### Memory Management
**Optimizations**:
- Use `BytesMut` efficiently to avoid excessive allocations
- Consider buffer pooling for high-throughput scenarios
- Monitor memory usage per task

**Buffer Pool Implementation** (Optional):
```rust
use tokio::sync::Mutex;

struct BufferPool {
    buffers: Mutex<Vec<Vec<u8>>>,
    buffer_size: usize,
}
```
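
A minimal pair of methods for that struct could look as follows (the names `get` and `put` are illustrative):

```rust
impl BufferPool {
    async fn get(&self) -> Vec<u8> {
        // Reuse a pooled buffer if one is available, otherwise allocate.
        self.buffers
            .lock()
            .await
            .pop()
            .unwrap_or_else(|| vec![0u8; self.buffer_size])
    }

    async fn put(&self, mut buf: Vec<u8>) {
        // Reset the buffer and return it to the pool for reuse.
        buf.clear();
        buf.resize(self.buffer_size, 0);
        self.buffers.lock().await.push(buf);
    }
}
```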

## 5. Error Handling & Resilience

### Task Isolation
**Benefits**:
- Failed tasks don't crash entire server
- Graceful handling of malformed packets
- Individual query timeouts

**Implementation**:
```rust
use std::time::Duration;

tokio::spawn(async move {
    let result = tokio::time::timeout(
        Duration::from_secs(30),
        process_dns_query(packet_data, addr, query_handle, sock_clone),
    ).await;

    match result {
        Ok(_) => {}, // Success
        Err(_) => tracing::warn!("Query timeout for {}", addr),
    }
});
```

### Graceful Degradation
**Circuit Breaker Pattern**:
- Track upstream DNS resolver failures
- Temporarily disable failing resolvers
- Fall back to alternative resolvers (a minimal sketch follows)
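
A minimal sketch of such a breaker using atomics (the threshold is illustrative, and a timer-based half-open state is omitted for brevity):

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

struct CircuitBreaker {
    failures: AtomicU32,
    open: AtomicBool,
}

impl CircuitBreaker {
    const FAILURE_THRESHOLD: u32 = 5;

    fn record_failure(&self) {
        // Open the circuit once consecutive failures cross the threshold.
        if self.failures.fetch_add(1, Ordering::Relaxed) + 1 >= Self::FAILURE_THRESHOLD {
            self.open.store(true, Ordering::Relaxed);
        }
    }

    fn record_success(&self) {
        self.failures.store(0, Ordering::Relaxed);
        self.open.store(false, Ordering::Relaxed);
    }

    fn is_open(&self) -> bool {
        self.open.load(Ordering::Relaxed)
    }
}
```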

**Fallback Responses**:
- Return SERVFAIL for resolution failures
- Implement basic error response generation
- Log errors for monitoring

### Timeout Management
**Query Timeouts**: 5-30 seconds per query
**Resolution Timeouts**: Configure in hickory-resolver (see the sketch below)
**Connection Timeouts**: Handle at actor level
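
For the resolution timeout, a sketch of setting hickory-resolver's per-request timeout via `ResolverOpts` (assuming the options are applied when the resolver is built):

```rust
use std::time::Duration;
use hickory_resolver::config::ResolverOpts;

let mut opts = ResolverOpts::default();
// Per-request timeout enforced inside the resolver itself.
opts.timeout = Duration::from_secs(5);
```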

## 6. Performance Optimizations

### Async I/O Efficiency
**Current**: Single-threaded UDP operations
**Optimization**: Leverage tokio's async I/O

**Advanced Options**:
- Consider `io_uring` backend on Linux for maximum performance
- Batch response sending where possible
- Use vectored I/O for multiple responses

### Future Enhancements
**Caching Layer**:
- Add Redis or an in-memory cache for frequent queries (an in-memory sketch follows this list)
- TTL-based cache invalidation
- Cache hit ratio monitoring
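
A minimal in-memory variant, keyed by query name and record type (the key and value types are simplified for illustration):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};
use tokio::sync::RwLock;

struct DnsCache {
    // (query name, record type) -> (encoded response, expiry)
    entries: RwLock<HashMap<(String, u16), (Vec<u8>, Instant)>>,
}

impl DnsCache {
    async fn get(&self, key: &(String, u16)) -> Option<Vec<u8>> {
        let entries = self.entries.read().await;
        match entries.get(key) {
            // Serve only entries whose TTL has not yet expired.
            Some((resp, expiry)) if *expiry > Instant::now() => Some(resp.clone()),
            _ => None,
        }
    }

    async fn insert(&self, key: (String, u16), resp: Vec<u8>, ttl: Duration) {
        self.entries
            .write()
            .await
            .insert(key, (resp, Instant::now() + ttl));
    }
}
```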

**Load Balancing**:
- Multiple upstream DNS servers
- Health checking and failover
- Geographic DNS routing

## 7. Monitoring & Metrics

### Concurrent Operations Tracking
**Key Metrics**:
- Active task count
- Query processing latency (p50, p95, p99)
- Throughput (queries per second)
- Upstream resolver health

**Implementation**:
```rust
use std::sync::atomic::{AtomicU64, Ordering};

static ACTIVE_QUERIES: AtomicU64 = AtomicU64::new(0);
static TOTAL_QUERIES: AtomicU64 = AtomicU64::new(0);

// In query processing:
ACTIVE_QUERIES.fetch_add(1, Ordering::Relaxed);
// ... process query ...
ACTIVE_QUERIES.fetch_sub(1, Ordering::Relaxed);
TOTAL_QUERIES.fetch_add(1, Ordering::Relaxed);
```
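
One caveat with manual counters: a task that panics or returns early skips the decrement. A small `Drop`-based guard built on the statics above keeps the gauge accurate:

```rust
struct QueryGuard;

impl QueryGuard {
    fn new() -> Self {
        ACTIVE_QUERIES.fetch_add(1, Ordering::Relaxed);
        QueryGuard
    }
}

impl Drop for QueryGuard {
    fn drop(&mut self) {
        // Runs even on panic or early return.
        ACTIVE_QUERIES.fetch_sub(1, Ordering::Relaxed);
        TOTAL_QUERIES.fetch_add(1, Ordering::Relaxed);
    }
}
```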

### Resource Monitoring
**System Metrics**:
- Memory usage per task
- Connection pool utilization
- Error rates by query type
- Response time distribution

**Logging Strategy**:
- Structured logging with tracing
- Query-level tracing spans (see the sketch after this list)
- Performance metrics export
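
A per-query span using `tracing`'s `Instrument` extension (the span name and fields are illustrative):

```rust
use tracing::Instrument;

let span = tracing::info_span!("dns_query", client = %addr, len = packet_data.len());
tokio::spawn(
    async move {
        process_dns_query(packet_data, addr, query_handle, sock_clone).await;
    }
    .instrument(span),
);
```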

## 8. Implementation Phases

### Phase 1: Basic Concurrency (Critical)
**Priority**: Immediate
**Components**:
- Task spawning in main loop
- Shared UdpSocket with Arc
- Basic error handling
- Simple resource limits

**Expected Improvement**: 10x throughput increase

### Phase 2: Connection Pooling (Important)
**Priority**: Short-term
**Components**:
- QueryActor pool implementation
- Round-robin distribution
- Advanced resource management
- Timeout handling

**Expected Improvement**: 5x additional throughput increase

### Phase 3: Advanced Features (Enhancement)
**Priority**: Medium-term
**Components**:
- Caching layer
- Advanced monitoring
- Circuit breaker pattern
- Performance optimizations

**Expected Improvement**: 2-3x additional throughput increase

## 9. Testing Strategy

### Load Testing
**Tools**: Use `dig` with scripts or a specialized DNS load-testing tool such as `dnsperf`
**Scenarios**:
- Concurrent query burst testing
- Sustained load testing
- Failure scenario testing

### Performance Benchmarks
**Baseline**: Current single-threaded performance
**Targets**:
- Phase 1: 1,000+ concurrent queries
- Phase 2: 5,000+ concurrent queries
- Phase 3: 10,000+ concurrent queries

### Integration Testing
**Scenarios**:
- Multiple client connections
- Mixed query types (A, AAAA, CNAME)
- Upstream DNS server failures
- Network latency simulation

## 10. Configuration

### Environment Variables
```bash
DNS_SERVER_MAX_CONCURRENT_QUERIES=1000
DNS_SERVER_ACTOR_POOL_SIZE=20
DNS_SERVER_QUERY_TIMEOUT_SECS=30
DNS_SERVER_ENABLE_CACHING=true
```

### Runtime Configuration
- Adjustable limits without restart (a config-loading sketch follows this list)
- Pool size modification
- Timeout adjustments
- Feature toggles
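
A sketch of loading those variables at startup, with defaults mirroring the values above (live reloading would need a separate mechanism):

```rust
use std::env;

struct ServerConfig {
    max_concurrent_queries: usize,
    actor_pool_size: usize,
    query_timeout_secs: u64,
    enable_caching: bool,
}

impl ServerConfig {
    fn from_env() -> Self {
        // Fall back to the documented default when a variable is unset or invalid.
        fn parse<T: std::str::FromStr>(key: &str, default: T) -> T {
            env::var(key).ok().and_then(|v| v.parse().ok()).unwrap_or(default)
        }

        Self {
            max_concurrent_queries: parse("DNS_SERVER_MAX_CONCURRENT_QUERIES", 1000),
            actor_pool_size: parse("DNS_SERVER_ACTOR_POOL_SIZE", 20),
            query_timeout_secs: parse("DNS_SERVER_QUERY_TIMEOUT_SECS", 30),
            enable_caching: parse("DNS_SERVER_ENABLE_CACHING", true),
        }
    }
}
```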

## Expected Performance Outcomes

**Current Performance**: ~100 queries/second (sequential)
**Phase 1 Target**: 1,000+ queries/second (10x improvement)
**Phase 2 Target**: 5,000+ queries/second (50x improvement)
**Phase 3 Target**: 10,000+ queries/second (100x improvement)

**Resource Requirements**:
- Memory: 2-8 GB depending on concurrent load
- CPU: 2-8 cores for optimal performance
- Network: Gigabit connection recommended

## Conclusion

This implementation plan transforms the DNS server from a simple sequential processor to a high-performance, concurrent DNS resolver capable of handling enterprise-level query loads. The phased approach allows for incremental improvements while maintaining system stability and providing measurable performance gains at each stage.