## Summary
Add true streaming support to both TTS (Kokoro) and ASR (Whisper) backends while maintaining subprocess isolation for guaranteed memory cleanup.
## Current State
| Backend | Input | Output | Subprocess Isolation |
|---|---|---|---|
| Kokoro TTS | Text | Full audio (wait for completion) | ✅ |
| Whisper ASR | Audio chunks (WebSocket) | Full transcription (wait for completion) | ✅ |
## Proposed: Unified Streaming Architecture

Use `multiprocessing.Queue` for IPC to stream chunks while keeping subprocess isolation:
```
┌─────────────────────────────────────────────────────────────┐
│                        Main Process                         │
│  ┌─────────────┐   Queue (in)    ┌─────────────────────┐    │
│  │  API Layer  │ ──────────────► │                     │    │
│  │  (FastAPI)  │                 │  Subprocess Worker  │    │
│  │             │ ◄────────────── │   (Model loaded)    │    │
│  └─────────────┘   Queue (out)   └─────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
```
| Backend | Input Queue | Output Queue |
|---|---|---|
| TTS (Kokoro) | Text + voice + speed | Audio chunks (~3s batches) |
| ASR (Whisper) | Audio chunks | Partial transcriptions |
## Benefits
- Lower latency: First byte/word delivered faster
- Memory isolation preserved: Subprocess termination guarantees GPU memory release
- Unified pattern: Same architecture for TTS and ASR
- Code reuse: shared `StreamingSubprocessBackend` base class
## Implementation Notes
- Kokoro-FastAPI achieves ~300ms first-token latency but runs the model in-process (no subprocess isolation)
- Our approach trades some latency (~3s batches) for guaranteed memory cleanup
- Could use `multiprocessing.Manager().Queue()`, which is picklable and therefore compatible with `ProcessPoolExecutor`
## Estimate
~330 lines total for both backends, including the shared base class.
## References
- Kokoro-FastAPI streaming implementation: https://github.com/remsky/Kokoro-FastAPI
- faster-whisper supports VAD-based streaming via `vad_filter=True`