Elixir wrapper for Chatterbox TTS - state-of-the-art text-to-speech models from Resemble AI.
- Zero-shot voice cloning
- Three model variants:
  - Turbo (350M params) - Low-latency English TTS with paralinguistic tags
  - English (500M params) - High-quality English with CFG controls
  - Multilingual (500M params) - Supports 23+ languages
- Simple Elixir API via GenServer
- Elixir 1.14+
- Python 3.10 or 3.11 (3.12+ not supported)
- CUDA GPU, Apple Silicon (MPS), or CPU
- chatterbox-tts Python package
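The Python version constraint can be checked before running setup. The following is a minimal sketch, not part of Chatterbex: the module name `PythonVersionSketch` and its `supported?/1` function are hypothetical, and it only parses the text that `python --version` prints.

```elixir
defmodule PythonVersionSketch do
  # Hypothetical helper, not part of Chatterbex.
  # Minor versions of Python 3 listed as supported in the requirements.
  @supported_minors [10, 11]

  # Takes the output of `python --version` (e.g. "Python 3.11.4\n")
  # and returns true only for Python 3.10.x or 3.11.x.
  def supported?(version_output) when is_binary(version_output) do
    case Regex.run(~r/Python (\d+)\.(\d+)/, version_output) do
      [_, "3", minor] -> String.to_integer(minor) in @supported_minors
      _ -> false
    end
  end
end
```

One way to use it, assuming the interpreter is available as `python3` on your PATH, is `{out, 0} = System.cmd("python3", ["--version"])` followed by `PythonVersionSketch.supported?(out)`.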
- Add chatterbex to your dependencies in mix.exs:
def deps do
  [
    {:chatterbex, "~> 0.1.0"}
  ]
end

- Install the Python dependencies:
mix chatterbex.setup

Or manually:

pip install chatterbox-tts

# CUDA (NVIDIA GPU)
mix chatterbex.setup --cuda
# Apple Silicon (M1/M2/M3/M4)
mix chatterbex.setup --mps
# CPU-only (no GPU required, smaller download)
mix chatterbex.setup --cpu
# Use a virtual environment
mix chatterbex.setup --mps --venv .venv

# Start a Turbo model server
{:ok, pid} = Chatterbex.start_link(model: :turbo)
# Generate speech
{:ok, audio} = Chatterbex.generate(pid, "Hello, world!")
# Save to file
:ok = Chatterbex.save(audio, "output.wav")

Provide a reference audio file (about 10 seconds is recommended) for voice cloning:

{:ok, audio} = Chatterbex.generate(pid, "Hello in your voice!",
  audio_prompt: "path/to/reference.wav"
)

The Turbo model supports paralinguistic tags embedded in the text:

{:ok, audio} = Chatterbex.generate(pid, "That's hilarious [laugh] I can't believe it!")

Available tags: [laugh], [chuckle], [cough], [sigh], [gasp], [groan], [yawn], [sniff], [clearing_throat]
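Since only the tags listed above are documented, it can help to catch typos before sending text to the model. The following is a minimal sketch under that assumption; `TagCheckSketch` is a hypothetical helper, not part of Chatterbex, and how the library itself treats unknown tags is not stated here.

```elixir
defmodule TagCheckSketch do
  # Hypothetical helper, not part of Chatterbex.
  # Tags documented for the Turbo model.
  @supported ~w(laugh chuckle cough sigh gasp groan yawn sniff clearing_throat)

  # Returns the bracketed tags found in `text` that are not in the
  # documented list, without duplicates and in order of first appearance.
  def unsupported_tags(text) when is_binary(text) do
    ~r/\[([a-z_]+)\]/
    |> Regex.scan(text, capture: :all_but_first)
    |> List.flatten()
    |> Enum.reject(&(&1 in @supported))
    |> Enum.uniq()
  end
end
```

An empty list means every bracketed tag in the text is one of the documented ones.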
{:ok, pid} = Chatterbex.start_link(model: :multilingual)
# French
{:ok, audio} = Chatterbex.generate(pid, "Bonjour, comment allez-vous?", language: "fr")
# German
{:ok, audio} = Chatterbex.generate(pid, "Guten Tag, wie geht es Ihnen?", language: "de")
# Chinese
{:ok, audio} = Chatterbex.generate(pid, "你好,今天天气真不错", language: "zh")

# Use Apple Silicon GPU (MPS)
{:ok, pid} = Chatterbex.start_link(model: :turbo, device: "mps")
# Use CPU instead of GPU
{:ok, pid} = Chatterbex.start_link(model: :english, device: "cpu")
# English model with exaggeration control
{:ok, audio} = Chatterbex.generate(pid, "This is exciting!",
  exaggeration: 0.7,
  cfg_weight: 0.5
)

# Start with a name for easy access
{:ok, _pid} = Chatterbex.start_link(model: :turbo, name: MyApp.TTS)
# Use anywhere in your app
{:ok, audio} = Chatterbex.generate(MyApp.TTS, "Hello!")

| Option | Description | Default |
|---|---|---|
| :model | Model variant (:turbo, :english, :multilingual) | :turbo |
| :device | Compute device ("cuda", "mps", "cpu") | "cuda" |
| :name | GenServer name | nil |

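Because Chatterbex.start_link/1 takes these options as a keyword list, a named server can be placed under a supervisor with an explicit child spec. The following is a minimal sketch: `TTSChildSpecSketch` is a hypothetical module, and whether Chatterbex ships its own child_spec/1 is not stated here, so the map is built by hand.

```elixir
defmodule TTSChildSpecSketch do
  # Hypothetical helper, not part of Chatterbex.
  # Builds a supervisor child spec map that calls
  # Chatterbex.start_link(opts) when the supervisor starts the child.
  def child_spec(opts) when is_list(opts) do
    %{
      # Use the registered name as the child id when one is given.
      id: Keyword.get(opts, :name, Chatterbex),
      start: {Chatterbex, :start_link, [opts]},
      restart: :permanent,
      type: :worker
    }
  end
end
```

In an application's supervision tree this could appear as `children = [TTSChildSpecSketch.child_spec(model: :turbo, name: MyApp.TTS)]`, passed to `Supervisor.start_link(children, strategy: :one_for_one)`.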

| Option | Model | Description |
|---|---|---|
| :audio_prompt | All | Path to reference audio for voice cloning |
| :language | Multilingual | Language code (e.g., "fr", "de", "zh") |
| :exaggeration | English | Expression intensity (0.0 - 1.0) |
| :cfg_weight | English | Classifier-free guidance weight |

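The generation options table can be read as a compatibility map between options and model variants. The following is a minimal sketch of a pre-flight check based only on that table; `OptionCheckSketch` is a hypothetical helper, not part of Chatterbex, and whether the library itself rejects or ignores a mismatched option is not stated here.

```elixir
defmodule OptionCheckSketch do
  # Hypothetical helper, not part of Chatterbex.
  # Which model variants each generation option applies to, per the table.
  @applies_to %{
    audio_prompt: [:turbo, :english, :multilingual],
    language: [:multilingual],
    exaggeration: [:english],
    cfg_weight: [:english]
  }

  # Returns :ok when every option applies to `model`, otherwise
  # {:error, keys} listing the options that do not (or are unknown).
  def check(model, opts) when is_atom(model) and is_list(opts) do
    rejected =
      for {key, _value} <- opts,
          model not in Map.get(@applies_to, key, []),
          do: key

    if rejected == [], do: :ok, else: {:error, rejected}
  end
end
```

For example, `OptionCheckSketch.check(:turbo, language: "fr")` flags `:language`, since the table lists it for the Multilingual model only.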
The multilingual model supports: English, French, German, Spanish, Italian, Portuguese, Dutch, Polish, Russian, Ukrainian, Czech, Slovak, Hungarian, Romanian, Bulgarian, Croatian, Slovenian, Serbian, Macedonian, Albanian, Turkish, Arabic, Hebrew, Chinese, Japanese, Korean, Vietnamese, Thai, Indonesian, Malay.
Chatterbex uses Erlang ports to communicate with a Python process running the Chatterbox models. Each Chatterbex.start_link/1 call spawns a dedicated Python process with the loaded model, allowing multiple models or instances to run concurrently.
+---------------+      JSON/stdin       +-----------------+
|    Elixir     | --------------------> |     Python      |
|   GenServer   |                       | Chatterbox TTS  |
|               | <-------------------- |                 |
+---------------+   Base64 WAV/stdout   +-----------------+
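The round trip in the diagram can be illustrated with a plain Erlang port. The following is a minimal sketch, not Chatterbex's actual implementation: it uses `cat` as a stand-in for the Python process (so it assumes a Unix-like system), skips the JSON request format, and assumes one Base64 line per reply. The module name `PortRoundTripSketch` and the line-based framing are assumptions made for illustration.

```elixir
defmodule PortRoundTripSketch do
  # Hypothetical sketch, not Chatterbex's implementation.

  # Opens a port to `cat`, which echoes stdin back on stdout. It stands in
  # for the Python process, which would reply with Base64-encoded WAV data.
  def open do
    cat = System.find_executable("cat")
    Port.open({:spawn_executable, cat}, [:binary, {:line, 65_536}])
  end

  # Writes one Base64 line to the port's stdin, waits for one line on its
  # stdout, and decodes it. Returns {:ok, bytes}, :error on invalid Base64,
  # or {:error, :timeout} if nothing arrives within five seconds.
  def request(port, payload) when is_binary(payload) do
    true = Port.command(port, Base.encode64(payload) <> "\n")

    receive do
      {^port, {:data, {:eol, reply}}} -> Base.decode64(reply)
    after
      5_000 -> {:error, :timeout}
    end
  end
end
```

A real audio payload can be far larger than the line limit used here, so an actual implementation may frame messages differently, for example with the port's `{:packet, 4}` option.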
See the examples directory for runnable scripts:
- hello_world.exs - Basic text-to-speech
- voice_cloning.exs - Clone a voice from reference audio
- multilingual.exs - Generate speech in 23+ languages
mix run examples/hello_world.exs --text "Hello" --device mps
mix run examples/voice_cloning.exs --reference voice.wav
mix run examples/multilingual.exs --text "Bonjour" --language fr

- Examples README - Detailed usage for all examples
- Architecture Decision Records - Design decisions and rationale
MIT License - See LICENSE for details.
- Resemble AI for the Chatterbox TTS models
- The Elixir community
