🎵 TTS Server - Local Text-to-Speech Service

Version 0.2.0 | High-performance local TTS server powered by the Kokoro-82M ONNX model

📖 Overview

A lightweight, blazing-fast text-to-speech server designed for the MyDictionary Chrome extension. Features 54 high-quality voices with automatic model downloading and intelligent caching. The macOS version now runs as a background menubar application.

✨ Features

🎤 54 Premium Voices - British/American English, male/female options
⚡ Lightning Fast - Rust-powered, sub-second synthesis
💾 Smart Caching - SHA256-based file caching with TTL, stored in ~/Library/Application Support/tts-server/
🔄 Auto Download - Models download automatically on first run
🌐 REST API - Simple HTTP endpoints for easy integration
🎯 Browser Compatible - 16-bit PCM WAV output
🖥️ macOS Menubar App - Runs silently in the background with a menubar icon for quick access and control.
🔒 Single Instance - Prevents multiple instances from running concurrently.
🪵 Detailed Logging - Logs are written to ~/Library/Application Support/tts-server/logs/

🚀 Quick Start

Option 1: Download Pre-built Binary (Recommended)

macOS (Apple Silicon & Intel)

The macOS version is now a self-contained .app bundle that runs as a background menubar application.

# 1. Download the latest TTS Server.app from the releases page:
#    (e.g., https://github.com/jhfnetboy/Candle-local-AI-Server/releases/download/v0.2.0/TTS_Server.app.zip)

# 2. Extract the downloaded archive (if it's a .zip or .tar.gz)
#    (Example for .zip):
#    unzip TTS_Server.app.zip

# 3. Move "TTS Server.app" to your /Applications folder (or drag it there in Finder).
#    The app name contains a space, so quote it:
mv "TTS Server.app" /Applications/

# 4. Install espeak-ng (required for phonemization)
brew install espeak-ng

# 5. Launch the application
#    You can double-click it from your /Applications folder, or run:
open /Applications/TTS\ Server.app

The application will:

Run silently in the background with an icon in your macOS menubar (top-right).
Start the server on http://localhost:9527.
Download models automatically on first run (~310MB ONNX model, ~50MB voice data). These files are stored in ~/Library/Application Support/tts-server/checkpoints/ and ~/Library/Application Support/tts-server/data/.
Create a cache directory for audio files in ~/Library/Application Support/tts-server/cache/audio/.
Generate detailed logs in ~/Library/Application Support/tts-server/logs/.

Menubar Icon Usage:

Left-click on the icon to show "Open UI" and "Quit" options.
"Open UI" will open http://localhost:9527 in your default browser.
"Quit" will gracefully shut down the server.

Common Issues:

If you see "cannot be opened because it is from an unidentified developer"
- Right-click TTS Server.app in the /Applications folder and choose "Open". macOS may ask you to confirm; click "Open" again. This is usually needed only once.
If you see "espeak-ng: command not found"
- Install it: brew install espeak-ng

Windows (x64)

⚠️ A Windows build is planned for a future release (expected after v0.2.0).

Only macOS is currently supported. Windows users can build from source.

Option 2: Build from Source

Prerequisites:

Rust 1.70+ (Install Rust)
espeak-ng

# Clone the repository
git clone https://github.com/jhfnetboy/Candle-local-AI-Server.git
cd Candle-local-AI-Server

# Install espeak-ng
# macOS:
brew install espeak-ng
# Ubuntu:
sudo apt-get install espeak-ng
# Windows:
choco install espeak-ng

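# Build release version (for macOS, this will generate a .app bundle)
# `cargo bundle` comes from the cargo-bundle subcommand; if it is missing, install it first:
#   cargo install cargo-bundle
cargo bundle --release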

# For macOS, move the generated .app to Applications and launch:
mv target/release/bundle/osx/TTS\ Server.app /Applications/
open /Applications/TTS\ Server.app

# For Linux/Windows, build and run the raw binary (if you don't need a UI)
# cargo build --release
# ./target/release/tts-server

🔗 Integration with MyDictionary Extension

Step 1: Start TTS Server

# Make sure the server is running (e.g., double-click TTS Server.app or run from terminal)
# You should see the menubar icon if on macOS.

# You can check server health via:
curl http://localhost:9527/health

Step 2: Install MyDictionary Extension

Download the MyDictionary extension from the Chrome Web Store or build it from source
The extension will automatically detect the local TTS server
Open extension settings → TTS Voice Settings
You'll see a green "✅ Connected" indicator if the server is running

Step 3: Select Your Voice

Go to TTS Voice Settings (Extension popup → Settings → Voice Settings)
Choose from 54 voices:
- 🇬🇧 British English: George, Daniel, Alice, Emma... (Recommended for learning)
- 🇺🇸 American English: Michael, Nova, Sarah...
Click Save Settings

Step 4: Enjoy!

Select any text on a webpage and click the 🔊 TTS button in the sidebar.

📡 API Reference

Endpoints

`GET /` - Server Info

curl http://localhost:9527/

Response:

{
  "success": true,
  "data": {
    "name": "TTS Server",
    "version": "0.2.0",
    "status": "running",
    "framework": "Candle"
  }
}

`GET /health` - Health Check

curl http://localhost:9527/health

`POST /synthesize` - Text to Speech

Request:

curl -X POST http://localhost:9527/synthesize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello, world!",
    "voice": "bm_george",
    "format": "wav"
  }'

Parameters:

text (required): Text to synthesize
voice (optional): Voice ID (default: bm_george)
format (optional): Output format, currently only wav (reserved for future mp3/ogg support)

Response:

{
  "file_id": "51f91581302698db",
  "url": "http://localhost:9527/audio/51f91581302698db.wav",
  "cached": false
}
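The request above can also be sent from Rust without extra crates. The following is a minimal sketch using only the standard library, not the extension's actual client: the helper names (`json_escape`, `synthesize_body`, `synthesize_request`, `synthesize`) are hypothetical, and it assumes the server is listening on 127.0.0.1:9527 and closes the connection after responding.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// JSON body with the three documented fields: text, voice, format.
fn synthesize_body(text: &str, voice: &str) -> String {
    format!(
        "{{\"text\":\"{}\",\"voice\":\"{}\",\"format\":\"wav\"}}",
        json_escape(text),
        json_escape(voice)
    )
}

/// Raw HTTP/1.1 request for POST /synthesize on the default port.
fn synthesize_request(text: &str, voice: &str) -> String {
    let body = synthesize_body(text, voice);
    format!(
        "POST /synthesize HTTP/1.1\r\nHost: localhost:9527\r\n\
         Content-Type: application/json\r\nContent-Length: {}\r\n\
         Connection: close\r\n\r\n{}",
        body.len(), // byte length, as Content-Length requires
        body
    )
}

/// Send the request and return the raw response text
/// (status line, headers, then the JSON body shown above).
fn synthesize(text: &str, voice: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect("127.0.0.1:9527")?;
    stream.write_all(synthesize_request(text, voice).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Calling `synthesize("Hello, world!", "bm_george")` returns the unparsed response; a real client would more likely use an HTTP crate and a JSON parser to extract the `url` field.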

`GET /audio/:filename` - Get Audio File

curl http://localhost:9527/audio/51f91581302698db.wav --output output.wav
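To confirm that a downloaded file really is 16-bit PCM WAV, the header can be inspected directly. This is a rough sketch, not the project's `wav_encoder.rs`: `WavInfo` and `parse_wav_header` are hypothetical names, and the code assumes the canonical 44-byte layout in which the `fmt ` chunk immediately follows the RIFF header. The README does not state the sample rate, so the sketch only reports whatever the file declares.

```rust
/// Format fields read from a canonical 44-byte PCM WAV header.
#[derive(Debug, PartialEq)]
struct WavInfo {
    channels: u16,
    sample_rate: u32,
    bits_per_sample: u16,
}

/// Parse the header, assuming the "fmt " chunk directly follows the RIFF
/// header. Files with a different chunk order or a non-PCM format yield None.
fn parse_wav_header(bytes: &[u8]) -> Option<WavInfo> {
    if bytes.len() < 44
        || &bytes[0..4] != b"RIFF"
        || &bytes[8..12] != b"WAVE"
        || &bytes[12..16] != b"fmt "
    {
        return None;
    }
    // Audio format tag 1 means uncompressed PCM.
    if u16::from_le_bytes([bytes[20], bytes[21]]) != 1 {
        return None;
    }
    Some(WavInfo {
        channels: u16::from_le_bytes([bytes[22], bytes[23]]),
        sample_rate: u32::from_le_bytes([bytes[24], bytes[25], bytes[26], bytes[27]]),
        bits_per_sample: u16::from_le_bytes([bytes[34], bytes[35]]),
    })
}
```

For example, passing the result of `std::fs::read("output.wav")` to `parse_wav_header` should report `bits_per_sample: 16` if the file matches the format described in Features.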

Voice List

See VOICE_API.md for complete list of 54 available voices.

Recommended voices for English learning:

bm_george - British male, clear and standard
bm_daniel - British male, accurate pronunciation
af_nova - American female, recommended
am_michael - American male, standard

🛠️ Configuration

Port Configuration

By default, the server runs on port 9527. To change:

Edit src/main.rs:

let addr = SocketAddr::from(([0, 0, 0, 0], 9527));  // Change port here

Then rebuild:

cargo build --release
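If rebuilding for every port change is inconvenient, one possible modification is to read the port from the environment. The sketch below is an assumption about how that could look, not something the server currently supports: `TTS_PORT`, `listen_addr`, and `listen_addr_from_env` are hypothetical names.

```rust
use std::net::SocketAddr;

/// Default port used by the server, as documented above.
const DEFAULT_PORT: u16 = 9527;

/// Pick the listen address from an optional override string.
/// Falls back to the default port if the value is missing or not a valid u16.
fn listen_addr(port_override: Option<&str>) -> SocketAddr {
    let port = port_override
        .and_then(|raw| raw.trim().parse::<u16>().ok())
        .unwrap_or(DEFAULT_PORT);
    // 0.0.0.0 matches the bind address in the snippet above (all interfaces).
    SocketAddr::from(([0, 0, 0, 0], port))
}

/// Read the hypothetical TTS_PORT environment variable.
fn listen_addr_from_env() -> SocketAddr {
    listen_addr(std::env::var("TTS_PORT").ok().as_deref())
}
```

With a change like this, `let addr = listen_addr_from_env();` would replace the hard-coded line in src/main.rs. Note that the MyDictionary extension expects port 9527, so a different port may also require a change on the extension side.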

Cache Configuration

Location: ~/Library/Application Support/tts-server/cache/audio/
TTL: 1 hour (3600 seconds)
Format: SHA256-based file IDs

To change cache settings, edit src/main.rs:

AudioCache::new("cache/audio", 3600)  // Change TTL (seconds)
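To illustrate what a TTL in seconds means for cached files, here is a small sketch of an expiry check based on a file's modification time. It is an assumption about how such a check can be written, not the actual logic in `cache.rs`; `is_expired` and `file_is_expired` are hypothetical names.

```rust
use std::time::{Duration, SystemTime};

/// True when an entry written at `modified` is older than `ttl` at time `now`.
/// A timestamp in the future (clock skew) is treated as still fresh.
fn is_expired(modified: SystemTime, now: SystemTime, ttl: Duration) -> bool {
    match now.duration_since(modified) {
        Ok(age) => age > ttl,
        Err(_) => false,
    }
}

/// Check a file on disk against the TTL using its modification time.
fn file_is_expired(path: &std::path::Path, ttl: Duration) -> std::io::Result<bool> {
    let modified = std::fs::metadata(path)?.modified()?;
    Ok(is_expired(modified, SystemTime::now(), ttl))
}
```

With the default TTL, `file_is_expired(path, Duration::from_secs(3600))` would report a cached WAV as stale once it is more than one hour old.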

🐛 Troubleshooting

Problem: Server won't start

Solution 1: Check if port 9527 is already in use

# macOS/Linux:
lsof -i :9527

# Windows:
netstat -ano | findstr :9527
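The same check can be done in a platform-independent way by attempting a TCP connection. This is a minimal sketch with a hypothetical helper name (`port_in_use`); it only tells you that something is accepting connections on the port, not which process owns it.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// True if something accepts TCP connections on 127.0.0.1:`port`.
fn port_in_use(port: u16) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    // A short timeout keeps the check from hanging on filtered ports.
    TcpStream::connect_timeout(&addr, Duration::from_millis(500)).is_ok()
}
```

`port_in_use(9527)` returning true before the server is started suggests another process already holds the port; use the commands above to identify it.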

Solution 2: Check espeak-ng installation

espeak-ng --version

If not installed, see Quick Start for installation instructions.

Problem: Extension shows "Disconnected"

Make sure the TTS server is running: http://localhost:9527/health
Check browser console for CORS errors
Restart the server and reload the extension

Problem: "Model not found" error

The models should download automatically on first run. They are stored in ~/Library/Application Support/tts-server/checkpoints/ and ~/Library/Application Support/tts-server/data/. If the download fails, run the download script manually:

# Manual download (you might need to provide the full path to download_models.sh inside the .app bundle)
# For example, if TTS Server.app is in /Applications:
/Applications/TTS\ Server.app/Contents/Resources/download_models.sh

Problem: Windows - "espeak-ng not found"

⚠️ A Windows build is planned for a future release (expected after v0.2.0).

For now, Windows users can build from source or wait for the official Windows release.

🏗️ Project Structure

tts-server/
├── src/
│   ├── main.rs           # HTTP server & routes
│   ├── tts_engine.rs     # Kokoro ONNX inference
│   ├── cache.rs          # File caching system
│   ├── vocab.rs          # Tokenization
│   └── wav_encoder.rs    # WAV audio encoding
├── checkpoints/          # ONNX models (auto-downloaded to Application Support)
├── data/voices/          # 54 voice embeddings (auto-downloaded to Application Support)
├── Cargo.toml            # Rust dependencies
├── Info.plist.in         # macOS app bundle configuration
└── README.md             # This file

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch: git checkout -b feature/amazing-feature
Commit your changes: git commit -m 'Add amazing feature'
Push to the branch: git push origin feature/amazing-feature
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Kokoro-82M - High-quality TTS model
ONNX Runtime - ML inference engine
espeak-ng - Phonemization

📞 Support

GitHub Issues: Report a bug
Discussions: Ask a question
Extension Issues: MyDictionary

Made with ❤️ by Jason
