17 commits
23a76e7
feat: Add voice agent testing framework for synthetic conversations
siddharthraja Nov 14, 2025
0c1c7fb
feat: Add Jodo payment collection scenarios and improve file naming
siddharthraja Nov 14, 2025
f88f2cd
Merge branch 'test-1': Add Jodo payment collection scenarios
siddharthraja Nov 15, 2025
e01b1a0
Adding eleven labs integration and creating a config page
siddharthraja Nov 15, 2025
a33f856
Improvement to UI, re-arrange and generate cmd
siddharthraja Nov 15, 2025
17f3f19
Fix cmd line generation for running synthetic data
siddharthraja Nov 15, 2025
ddfc08a
refactor: Modularize voice conversation generator with clean architec…
siddharthraja Nov 16, 2025
63a6004
Updates to folder structure
siddharthraja Nov 16, 2025
d41e036
cleanup: Remove old implementation files after refactoring
siddharthraja Nov 17, 2025
190d384
feat: Add Cartesia TTS integration with Sonic models
siddharthraja Nov 17, 2025
089d2eb
feat: Add support for GPT-4.1 model and update all references
siddharthraja Nov 17, 2025
cc39bb8
docs: Add comprehensive guide for Cartesia Indian accent and Hinglish…
siddharthraja Nov 17, 2025
ceb680d
Fix Cartesia TTS provider model and voice handling
siddharthraja Nov 17, 2025
8978612
Consolidate data directories to single root location
siddharthraja Nov 17, 2025
9baba4e
Add Hindi/Hinglish voice support for Cartesia TTS
siddharthraja Nov 17, 2025
f249263
feat: Voice catalog with priority-based selection and MP3 audio fix
siddharthraja Nov 17, 2025
0293454
Merge branch 'main' into feat/voice-catalog-priority-selection
siddharthraja Nov 17, 2025
13 changes: 12 additions & 1 deletion .gitignore
@@ -9,4 +9,15 @@ KMS
.vscode
*.egg-info
.pytest_cache
.ruff_cache
.ruff_cache

# Generated test outputs
conversations/
results/
report.html
*.mp3
*.wav

# Temporary test files
test_connection.py
test_simple.py
161 changes: 161 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,161 @@
# Voice Conversation Generator - Architecture

Clean, modular architecture for generating synthetic customer support conversations.

## Core Architecture

```
┌─────────────────────────────────────────────────────────┐
│ CLI Interface │
│ (vcg_cli.py) │
├─────────────────────────────────────────────────────────┤
│ Core Services Layer │
├─────────────────────────────────────────────────────────┤
│ ConversationOrchestrator │ PersonaService │ Factory │
├─────────────────────────────────────────────────────────┤
│ Domain Models │
├─────────────────────────────────────────────────────────┤
│ Persona │ Conversation │ Metrics │
├─────────────────────────────────────────────────────────┤
│ Provider Abstraction Layer │
├─────────────────────────────────────────────────────────┤
│ LLMProvider │ TTSProvider │ StorageGateway │
├─────────────────────────────────────────────────────────┤
│ Provider Implementations │
├─────────────────────────────────────────────────────────┤
│ OpenAI │ ElevenLabs │ LocalStorage │ (Future: GCS, S3) │
└─────────────────────────────────────────────────────────┘
```

## Directory Structure

```
src/
├── vcg_cli.py  # CLI entry point
└── voice_conversation_generator/
    ├── config/
    │   └── config.py  # Configuration management
    ├── models/
    │   ├── persona.py  # Persona models
    │   ├── conversation.py  # Conversation & Turn models
    │   └── metrics.py  # Metrics tracking
    ├── providers/
    │   ├── base.py  # Abstract base classes
    │   ├── llm/
    │   │   └── openai.py  # OpenAI LLM
    │   ├── tts/
    │   │   ├── openai.py  # OpenAI TTS
    │   │   └── elevenlabs.py  # ElevenLabs TTS
    │   └── storage/
    │       └── local.py  # Local file storage
    └── services/
        ├── orchestrator.py  # Conversation orchestration
        ├── persona_service.py  # Persona management
        └── provider_factory.py  # Provider creation
```

## Key Design Patterns

### 1. Provider Pattern
Abstract interfaces with swappable implementations:

```python
from abc import ABC, abstractmethod

class TTSProvider(ABC):
    @abstractmethod
    async def generate_speech(self, text, voice_config) -> bytes:
        pass
```
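
A minimal sketch of how a concrete provider could plug into this interface. `SilentTTSProvider`, the `speak` helper, and the dictionary-shaped `voice_config` are hypothetical names used only for illustration; they are not taken from the repository.

```python
import asyncio
from abc import ABC, abstractmethod


class TTSProvider(ABC):
    """Interface from the section above, repeated so the sketch runs on its own."""

    @abstractmethod
    async def generate_speech(self, text: str, voice_config: dict) -> bytes:
        ...


class SilentTTSProvider(TTSProvider):
    """Hypothetical stand-in that returns placeholder bytes instead of real audio."""

    async def generate_speech(self, text: str, voice_config: dict) -> bytes:
        # One zero byte per character: enough to exercise callers without an API key.
        return b"\x00" * len(text)


async def speak(provider: TTSProvider, text: str) -> bytes:
    # Callers depend only on the abstract interface, so any provider can be swapped in.
    return await provider.generate_speech(text, voice_config={"voice_id": "demo"})


audio = asyncio.run(speak(SilentTTSProvider(), "hello"))
```

Because `speak` only knows about `TTSProvider`, replacing `SilentTTSProvider` with a real implementation should not require changes on the calling side.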

### 2. Storage Gateway
Unified interface for file storage:

```python
from abc import ABC, abstractmethod
from typing import Dict

class StorageGateway(ABC):
    @abstractmethod
    async def save_conversation(self, conversation, metrics) -> Dict:
        pass
```
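
As an illustration of the gateway idea, here is a sketch of an in-memory implementation that could stand in for local or cloud storage during tests. `InMemoryStorageGateway`, the `memory://` locations, and the assumption that a conversation is a dict with an `"id"` key are all hypothetical; the real models may look different.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict


class StorageGateway(ABC):
    @abstractmethod
    async def save_conversation(self, conversation: Any, metrics: Any) -> Dict[str, str]:
        ...


class InMemoryStorageGateway(StorageGateway):
    """Hypothetical gateway that keeps artifacts in a dict instead of on disk."""

    def __init__(self) -> None:
        self.saved: Dict[str, Dict[str, Any]] = {}

    async def save_conversation(self, conversation: Any, metrics: Any) -> Dict[str, str]:
        # Assumes the conversation exposes an "id" key; the real model may differ.
        conversation_id = conversation["id"]
        self.saved[conversation_id] = {"conversation": conversation, "metrics": metrics}
        # Return artifact name -> location, mirroring the Dict return type above.
        return {
            "transcript": f"memory://{conversation_id}/transcript.json",
            "metrics": f"memory://{conversation_id}/metrics.json",
        }


gateway = InMemoryStorageGateway()
paths = asyncio.run(
    gateway.save_conversation({"id": "conv-1", "turns": []}, {"turns": 0})
)
```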

### 3. Dependency Injection
Services receive their dependencies through the constructor:

```python
orchestrator = ConversationOrchestrator(
    llm_provider=llm,
    tts_provider=tts,
    storage_gateway=storage
)
```

## Data Flow

1. **CLI** receives user command
2. **PersonaService** loads customer/support personas
3. **ProviderFactory** creates providers from config
4. **ConversationOrchestrator** manages the flow:
- Generates text via LLMProvider
- Converts to speech via TTSProvider
- Tracks metrics
5. **StorageGateway** saves all artifacts
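
Steps 4 and 5 of the flow above can be sketched roughly as follows. Everything here is a simplified stand-in: `FakeLLM`, `FakeTTS`, `FakeStorage`, `Metrics`, and this `Orchestrator` are hypothetical and much smaller than the real classes, and persona loading (steps 1 to 3) is left out.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List


class FakeLLM:
    async def generate_completion(self, prompt: str) -> str:
        return f"reply to: {prompt}"


class FakeTTS:
    async def generate_speech(self, text: str, voice_config: dict) -> bytes:
        return text.encode("utf-8")


class FakeStorage:
    async def save_conversation(self, conversation, metrics) -> Dict[str, str]:
        return {"transcript": "memory://transcript.json"}


@dataclass
class Metrics:
    turns: int = 0
    audio_bytes: int = 0


@dataclass
class Orchestrator:
    llm: FakeLLM
    tts: FakeTTS
    storage: FakeStorage
    turns: List[str] = field(default_factory=list)

    async def run(self, opening_line: str, max_turns: int = 2):
        metrics = Metrics()
        prompt = opening_line
        for _ in range(max_turns):
            text = await self.llm.generate_completion(prompt)  # step 4: text
            audio = await self.tts.generate_speech(text, {})   # step 4: speech
            metrics.turns += 1                                  # step 4: metrics
            metrics.audio_bytes += len(audio)
            self.turns.append(text)
            prompt = text  # the next turn responds to the previous one
        saved = await self.storage.save_conversation(self.turns, metrics)  # step 5
        return saved, metrics


orchestrator = Orchestrator(FakeLLM(), FakeTTS(), FakeStorage())
saved, metrics = asyncio.run(orchestrator.run("hi"))
```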

## Configuration

Layered configuration with environment overrides:

```yaml
# config.yaml
providers:
  llm:
    type: "openai"
    model: "gpt-4"
  tts:
    type: "elevenlabs"
  storage:
    type: "local"
    local:
      base_path: "data/conversations"
```

Environment variables override file config:
- `LLM_PROVIDER`, `LLM_MODEL`
- `TTS_PROVIDER`
- `STORAGE_TYPE`
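
One way the override layer might work is sketched below. The function name `apply_env_overrides`, the `ENV_OVERRIDES` table, and the exact key paths (including where `STORAGE_TYPE` lands) are assumptions for illustration, not the behavior of `config.py`.

```python
import copy
from typing import Any, Dict, Mapping, Tuple

# Environment variable -> path into the nested config (paths are assumed).
ENV_OVERRIDES: Dict[str, Tuple[str, ...]] = {
    "LLM_PROVIDER": ("providers", "llm", "type"),
    "LLM_MODEL": ("providers", "llm", "model"),
    "TTS_PROVIDER": ("providers", "tts", "type"),
    "STORAGE_TYPE": ("providers", "storage", "type"),
}


def apply_env_overrides(config: Dict[str, Any], env: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of `config` with any set environment variables applied."""
    merged = copy.deepcopy(config)
    for var, path in ENV_OVERRIDES.items():
        value = env.get(var)
        if not value:
            continue  # unset or empty: keep the value from the file
        node = merged
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return merged


file_config = {
    "providers": {
        "llm": {"type": "openai", "model": "gpt-4"},
        "tts": {"type": "elevenlabs"},
    }
}
effective = apply_env_overrides(file_config, {"TTS_PROVIDER": "cartesia"})
```

In real use the second argument would be `os.environ`; passing a plain mapping keeps the function easy to test.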

## Extension Points

### Adding New Providers

1. Create a provider class that implements the relevant abstract interface
2. Register in `ProviderFactory`
3. Update configuration

Example:
```python
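# providers/llm/anthropic.py
from ..base import LLMProvider  # assumes the interface lives in providers/base.py

class AnthropicLLMProvider(LLMProvider):
    # Parameters are elided in the original; match the LLMProvider signature.
    async def generate_completion(self, *args, **kwargs):
        # Implementation
        raise NotImplementedError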
```
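
Step 2, registering the provider, could look something like the sketch below. The registry approach and the `register_llm` / `create_llm` method names are hypothetical; the actual `ProviderFactory` may be structured differently (for example as a chain of `if` branches), and `LLMProvider` here is a simplified stand-in for the abstract base class.

```python
from typing import Callable, Dict


class LLMProvider:
    """Simplified stand-in for the abstract base class in providers/base.py."""

    def __init__(self, config: dict) -> None:
        self.config = config


class OpenAILLMProvider(LLMProvider):
    pass


class AnthropicLLMProvider(LLMProvider):
    pass


class ProviderFactory:
    """Hypothetical registry-based factory keyed by the config `type` value."""

    _llm_registry: Dict[str, Callable[[dict], LLMProvider]] = {}

    @classmethod
    def register_llm(cls, name: str, provider_cls: Callable[[dict], LLMProvider]) -> None:
        cls._llm_registry[name] = provider_cls

    @classmethod
    def create_llm(cls, config: dict) -> LLMProvider:
        provider_type = config["type"]
        try:
            provider_cls = cls._llm_registry[provider_type]
        except KeyError:
            raise ValueError(f"Unknown LLM provider: {provider_type!r}") from None
        return provider_cls(config)


ProviderFactory.register_llm("openai", OpenAILLMProvider)
ProviderFactory.register_llm("anthropic", AnthropicLLMProvider)

llm = ProviderFactory.create_llm({"type": "anthropic", "model": "example-model"})
```

With a registry like this, adding a provider touches only the new module and one registration line, which matches the "add providers without core changes" goal.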

### Database Integration

Future support for PostgreSQL:
- Persona storage and retrieval
- Conversation history
- Analytics queries

### Cloud Deployment

Ready for:
- **FastAPI**: REST endpoints
- **Docker**: Container deployment
- **Cloud Storage**: GCS/S3 support
- **LiveKit**: Real-time simulation

## Benefits

- **Modularity**: Single responsibility per component
- **Testability**: Easy mocking and isolation
- **Extensibility**: Add providers without core changes
- **Maintainability**: Clear separation of concerns
- **Scalability**: Cloud-ready architecture
196 changes: 196 additions & 0 deletions CARTESIA_INDIAN_VOICES.md
@@ -0,0 +1,196 @@
# Cartesia Indian Accent & Hindi/Hinglish Support

## ✅ Confirmed: Cartesia Supports Indian Accents

Based on Cartesia's official documentation and announcements (2024), here's what's available:

### 1. **Hindi Language Support**
- **Language Code**: `hi`
- **Use Case**: Pure Hindi text-to-speech
- **Natural Accent**: Hindi voices naturally have an Indian accent when speaking mixed English-Hindi text

### 2. **Hinglish Support** (NEW in 2024)
- **What is it**: Native support for mixed Hindi-English conversations
- **Features**:
- Fluid transitions between English and Hindi
- Softens English consonants where appropriate
- Adds Hindi intonation naturally
- Understands code-switching patterns
- **Deployment**: Special deployments in India for lowest latency (40ms time-to-first-audio)

### 3. **Available Indian Voices**
- **Conversational male voice** - Perfect for sales and customer support (Hinglish)
- **Warm feminine voice** - Smooth Indian accent for storytelling and dialogue
- Multiple Hindi voices available

---

## How to Use Indian Accents in Your Code

### Current Implementation (Already Supports It!)

Your Cartesia provider in `cartesia.py` already supports the `language` parameter:

```python
# Line 84-87 in cartesia.py
bytes_iter = self.client.tts.bytes(
    model_id=model,
    transcript=text,
    voice={"mode": "id", "id": voice_id},
    language=language,  # Already implemented! ✅
    output_format=self.output_format
)
```

### Method 1: Use Hindi Language Code (Recommended)

For an Indian accent when speaking a Hindi/English mix:

```python
# Update voice config in persona_service.py
voice_config = VoiceConfig(
    provider="cartesia",
    voice_id="hinglish-male-voice-id",  # Get from Cartesia dashboard
    model="sonic-3"
)

# When generating speech, pass language="hi"
audio_data = await self.tts.generate_speech(
    text=text,
    voice_config=voice_config,
    language="hi"  # Use Hindi - gives natural Indian accent
)
```

### Method 2: Use Hinglish-Specific Voices

Cartesia has specific voices designed for Hinglish that automatically handle mixed language:

```python
# These voices are specifically trained for Indian accent
HINGLISH_VOICES = {
    'male': 'hinglish-conversational-male-voice-id',
    'female': 'hinglish-warm-feminine-voice-id'
}

voice_config = VoiceConfig(
    provider="cartesia",
    voice_id=HINGLISH_VOICES['male'],
    model="sonic-3"
)
```

---

## Supported Languages

Cartesia Sonic supports **15 languages** including:
- English (en)
- Hindi (hi) ✅
- French (fr)
- German (de)
- Spanish (es)
- Portuguese (pt)
- Chinese (zh)
- Japanese (ja)
- Italian (it)
- Korean (ko)
- Dutch (nl)
- Polish (pl)
- Russian (ru)
- Swedish (sv)
- Turkish (tr)

---

## Implementation Steps for Your Project

### Step 1: Find Cartesia Voice IDs

Visit Cartesia's voice playground to browse and test Indian voices:
1. Go to: https://cartesia.ai/voices (or your Cartesia dashboard)
2. Filter for Hindi or Hinglish voices
3. Listen to samples
4. Note the voice IDs

### Step 2: Update Voice Configuration

Edit `src/voice_conversation_generator/services/persona_service.py`:

```python
# Around line 75-79 for support agent
support_persona = SupportPersona(
    ...
    voice_config=VoiceConfig(
        provider="cartesia",
        voice_id="your-hinglish-male-voice-id",  # From Cartesia
        model="sonic-3"
    )
)

# Around line 189-196 for customer agents
voice_config = VoiceConfig(
    provider="cartesia",
    voice_id="your-hinglish-female-voice-id",  # From Cartesia
    model="sonic-3"
)
```

### Step 3: (Optional) Modify Cartesia Provider

If you want to default to Hindi language, edit `cartesia.py:40`:

```python
# In __init__ method
self.default_language = config.get('language', 'hi') # Change from 'en' to 'hi'
```
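
If the default is changed this way, a per-call `language` argument presumably still takes precedence. A small sketch of that fallback, with a check against the codes listed under "Supported Languages", might look like this. `resolve_language` and `SUPPORTED_LANGUAGES` are hypothetical helpers, not part of `cartesia.py`.

```python
from typing import Optional

# Language codes listed under "Supported Languages" in this guide.
SUPPORTED_LANGUAGES = frozenset(
    {"en", "hi", "fr", "de", "es", "pt", "zh", "ja", "it", "ko", "nl", "pl", "ru", "sv", "tr"}
)


def resolve_language(requested: Optional[str], default_language: str = "hi") -> str:
    """Use the per-call language if given, otherwise the provider default."""
    language = (requested or default_language).lower()
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language!r}")
    return language


default_choice = resolve_language(None)   # falls back to the provider default
explicit_choice = resolve_language("EN")  # an explicit value wins and is normalized
```

Failing early on an unknown code keeps a typo from reaching the TTS request.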

### Step 4: Generate Conversations

```bash
# Set API key
export CARTESIA_API_KEY=your_key

# Generate with Cartesia (will use Indian voices)
uv run python src/vcg_cli.py generate --tts cartesia --customer angry_insufficient_funds
```

---

## Why Cartesia is Best for Indian Accents

1. **Native Hinglish Support**: Specifically designed for Indian market
2. **Fluid Code-Switching**: Understands when to switch between Hindi and English
3. **Natural Intonation**: Proper Hindi intonation and English consonant softening
4. **Low Latency**: Deployments in India for 40ms time-to-first-audio
5. **Multiple Voices**: Various male/female voices with authentic Indian accents
6. **Recent Launch**: 2024 launch shows active development for Indian market

---

## Summary

**✅ YES - Cartesia fully supports Indian accents through:**
1. Hindi language code (`language="hi"`)
2. Dedicated Hinglish voices (specific voice IDs)
3. Natural code-switching between Hindi and English
4. Authentic Indian intonation and pronunciation

**Your current implementation already has the infrastructure:**
- Language parameter support in `cartesia.py:84-87`
- Voice configuration system in place
- You only need to set the right voice IDs from Cartesia

**Next Steps:**
1. Visit Cartesia dashboard and get Hindi/Hinglish voice IDs
2. Update voice configurations in `persona_service.py`
3. Test with: `uv run python src/vcg_cli.py generate --tts cartesia`

---

## Additional Resources

- Cartesia India Page: https://cartesia.ai/india
- Cartesia Hindi Voices: https://cartesia.ai/languages/hindi
- Python SDK Docs: https://docs.cartesia.ai/use-an-sdk/python
- Voice Playground: https://cartesia.ai/voices (check for latest voice IDs)