## Summary
Add true streaming support to both TTS (Kokoro) and ASR (Whisper) backends while maintaining subprocess isolation for guaranteed memory cleanup.
## Current State
| Backend | Input | Output | Subprocess Isolation |
|---|---|---|---|
| Kokoro TTS | Text | Full audio (wait for completion) | ✅ |
| Whisper ASR | Audio chunks (WebSocket) | Full transcription (wait for completion) | ✅ |
## Proposed: Unified Streaming Architecture

Use `multiprocessing.Queue` for IPC to stream chunks while keeping subprocess isolation:
```
┌─────────────────────────────────────────────────────────────┐
│                        Main Process                         │
│  ┌─────────────┐   Queue (in)    ┌─────────────────────┐    │
│  │  API Layer  │ ──────────────► │                     │    │
│  │  (FastAPI)  │                 │  Subprocess Worker  │    │
│  │             │ ◄────────────── │   (Model loaded)    │    │
│  └─────────────┘   Queue (out)   └─────────────────────┘    │
└─────────────────────────────────────────────────────────────┘
```
| Backend | Input Queue | Output Queue |
|---|---|---|
| TTS (Kokoro) | Text + voice + speed | Audio chunks (~3s batches) |
| ASR (Whisper) | Audio chunks | Partial transcriptions |
## Benefits
- Lower latency: First byte/word delivered faster
- Memory isolation preserved: Subprocess termination guarantees GPU memory release
- Unified pattern: Same architecture for TTS and ASR
- Code reuse: shared `StreamingSubprocessBackend` base class
## Implementation Notes
- Kokoro-FastAPI achieves ~300ms first-token latency but runs the model in-process (no subprocess isolation)
- Our approach trades some latency (~3s batches) for guaranteed memory cleanup
- Could use `multiprocessing.Manager().Queue()`, which is picklable and therefore compatible with `ProcessPoolExecutor`
## Estimate
~330 lines total for both backends, including the shared base class.
## References
- Kokoro-FastAPI streaming implementation: https://github.com/remsky/Kokoro-FastAPI
- faster-whisper supports VAD-based streaming via `vad_filter=True`