Mistral STT returns empty transcripts #4399

@AntoineDrt

Bug Description

The Mistral STT plugin does not transcribe user voice input.

With the default config, printing the Mistral STT client's responses from inside the LiveKit plugin gives the following output. The text is empty even though tokens were consumed:

model='voxtral-mini-latest' text='' usage=UsageInfo(prompt_tokens=8, completion_tokens=4, total_tokens=387, prompt_audio_seconds=1) 
language=None segments=[] finish_reason=None

The behavior is the same with any language.
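While debugging, one way to surface this symptom is to flag responses whose text is empty even though audio was processed. The sketch below is only illustrative: `Usage`, `Transcription`, and `is_suspicious_empty` are hypothetical stand-ins that mirror the fields printed above, not the Mistral SDK types.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    # Hypothetical stand-in for the usage fields in the output above.
    prompt_audio_seconds: int = 0


@dataclass
class Transcription:
    # Hypothetical stand-in for the response fields in the output above.
    text: str
    usage: Usage


def is_suspicious_empty(resp: Transcription) -> bool:
    """Return True when audio was processed but no text came back."""
    return resp.text.strip() == "" and resp.usage.prompt_audio_seconds > 0
```

Applied to the response shown above (empty text, prompt_audio_seconds=1), this check would return True.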

My analysis:
The plugin sets both timestamp_granularities=["segment"] and language in the transcription request (language is either set in the Agent implementation or defaults to en). However, Mistral's audio transcription documentation states that timestamp_granularities and language can't be used together.
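The conflict described here can be expressed as a small check. This is a minimal sketch, assuming only the two parameter names from the request; `has_conflicting_options` is a hypothetical helper, not part of the plugin or the Mistral SDK.

```python
from typing import Optional, Sequence


def has_conflicting_options(
    language: Optional[str],
    timestamp_granularities: Optional[Sequence[str]],
) -> bool:
    """Return True when both options are set at the same time.

    Per the Mistral documentation quoted in this issue, the two
    options are currently not compatible with each other.
    """
    return bool(language) and bool(timestamp_granularities)
```

With the plugin's current values (language "en" and ["segment"]), the check returns True, which matches the failing configuration.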

Expected Behavior

A transcription should be returned.

Reproduction Steps

You can reproduce the behavior with this minimal LiveKit Agent: https://gist.github.com/AntoineDrt/95ffcd2e026917564b34698994598dfd

Setup and run:

uv venv

uv pip install livekit-plugins-mistralai livekit "livekit-agents[silero]~=1.3"

export LIVEKIT_URL=wss:// LIVEKIT_API_KEY= LIVEKIT_API_SECRET= MISTRAL_API_KEY=

uv run python ./livekit_mistral_stt_empty_transcript.py console

Try talking to the Agent:

  • the logs in the user_state_changed event callback show speech is detected
  • the callback associated with user_input_transcribed never gets called

Operating System

macOS 15.2 (24C101)

Models Used

Voxtral mini latest, Mistral Medium latest, Cartesia Sonic-3

Package Versions

livekit==1.3.10
livekit_agents==1.3.10
livekit_plugins_mistralai==1.3.10
livekit_plugins_silero==1.3.10

Proposed Solution

Mistral's audio transcription documentation states that:

timestamp_granularities is currently not compatible with language, please use either one or the other.

I tested the Mistral STT plugin without timestamp_granularities=["segment"], and removing it does seem to fix the issue.

Judging from this comment, timestamp_granularities=["segment"] does not return timestamps anyway, so I suggest we remove it.

This solution would enable the use of language.
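As a rough illustration of the idea, the request arguments could be assembled so the two options are never sent together. This is a sketch under assumptions, not the plugin's actual code: `build_transcription_kwargs` is a hypothetical helper, and it shows a slightly more conservative variant than outright removal (it keeps timestamp_granularities only when no language is given).

```python
from typing import Any, Dict, Optional, Sequence


def build_transcription_kwargs(
    model: str,
    language: Optional[str] = None,
    timestamp_granularities: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build request arguments without combining the two conflicting options."""
    kwargs: Dict[str, Any] = {"model": model}
    if language:
        # Prefer language; drop timestamp_granularities to avoid the conflict.
        kwargs["language"] = language
    elif timestamp_granularities:
        # Only forwarded when no language is set.
        kwargs["timestamp_granularities"] = list(timestamp_granularities)
    return kwargs
```

For the failing configuration, build_transcription_kwargs("voxtral-mini-latest", language="en", timestamp_granularities=["segment"]) yields only the model and language keys.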
