@tinalenguyen
Contributor

timestamp_granularities and language can't be used in tandem. To use one, the other must be set to None.
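
A minimal sketch of that constraint, assuming a hypothetical helper named `build_transcription_params` (it is not part of the plugin or of the Mistral SDK) that assembles request parameters and rejects the invalid combination:

```python
from typing import Any, Optional


def build_transcription_params(
    language: Optional[str] = None,
    timestamp_granularities: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build request params, allowing only one of the two options."""
    if language is not None and timestamp_granularities is not None:
        raise ValueError(
            "timestamp_granularities and language can't be used together; "
            "set one of them to None"
        )
    params: dict[str, Any] = {}
    # Only forward the option that was actually set.
    if language is not None:
        params["language"] = language
    if timestamp_granularities is not None:
        params["timestamp_granularities"] = timestamp_granularities
    return params
```

Omitting the unset option entirely, rather than sending it as an explicit null, is a design choice of this sketch and not something the PR description specifies.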

@tinalenguyen tinalenguyen linked an issue Dec 27, 2025 that may be closed by this pull request
@chenghao-mou chenghao-mou requested a review from a team December 27, 2025 21:10
Member

@chenghao-mou chenghao-mou left a comment

Thanks for spotting that!

I tested it locally, and it returned timestamps. However, I don't think we can turn the flag on, because it is still non-streaming, so the timestamps are only relative to each VAD span.

STT event: SpeechEvent(type=<SpeechEventType.FINAL_TRANSCRIPT: 'final_transcript'>, request_id='', alternatives=[SpeechData(language=None, text=' Can you tell me a different story?', start_time=0.3, end_time=1.9, confidence=0.0, speaker_id=None, is_primary_speaker=None, words=[' Can you tell me a different story?'])], recognition_usage=None)
...
STT event: SpeechEvent(type=<SpeechEventType.FINAL_TRANSCRIPT: 'final_transcript'>, request_id='', alternatives=[SpeechData(language=None, text=' You know.', start_time=0.3, end_time=0.7, confidence=0.0, speaker_id=None, is_primary_speaker=None, words=[' You know.'])], recognition_usage=None)
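
To illustrate why span-relative timestamps are a problem: both events in the log start at 0.3, even though they come from different points in the audio. A rough sketch of the conversion that would be needed, assuming the caller knows where each VAD span began in the stream (the `Segment` type, `to_stream_time`, and the offset value are all hypothetical, not part of the plugin):

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical container for a transcript with span-relative times (seconds)."""
    text: str
    start_time: float
    end_time: float


def to_stream_time(segment: Segment, span_offset: float) -> Segment:
    """Shift span-relative times by the span's start offset within the stream."""
    return Segment(
        text=segment.text,
        start_time=segment.start_time + span_offset,
        end_time=segment.end_time + span_offset,
    )


# Assumed example: the second utterance's VAD span started 12.0 s into the stream.
shifted = to_stream_time(Segment(" You know.", 0.3, 0.7), span_offset=12.0)
```

Without such an offset, the reported times cannot be placed on a single timeline across utterances.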

@tinalenguyen
Contributor Author

@chenghao-mou Ah okay, thanks for the clarification! It's always the case that aligned_transcript is set if streaming is True, right?

@tinalenguyen tinalenguyen merged commit 620ed34 into main Dec 27, 2025
18 checks passed
@tinalenguyen tinalenguyen deleted the tina/fix-mistral-stt branch December 27, 2025 22:23
@ChenghaoMou
Contributor

> @chenghao-mou Ah okay, thanks for the clarification! It's always the case that aligned_transcript is set if streaming is True, right?

Yes, if it supports streaming and can return word or chunk timestamps. Otherwise it should always be False.
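
As a small sketch of that rule (the function name and the `timestamp_level` parameter are hypothetical, chosen only for illustration):

```python
from typing import Optional, Union


def aligned_transcript_value(
    streaming: bool, timestamp_level: Optional[str]
) -> Union[str, bool]:
    """Return "word" or "chunk" only when streaming is supported and the
    provider returns timestamps at that level; otherwise return False."""
    if streaming and timestamp_level in ("word", "chunk"):
        return timestamp_level
    return False
```

Under this rule a non-streaming STT reports False even when the provider does return timestamps, which matches the case discussed in this thread.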

Development

Successfully merging this pull request may close these issues.

Mistral STT returns empty transcripts

4 participants