
Conversation

@XBastille

Description

Adds streaming support for batched prompts, resolving issue #406.

Changes

  • Removed the restriction that blocked batched streaming in _sampler.py
  • Fixed _stream_sample_loop to wait for ALL batch elements to complete, not just the first (see the sketch after this list)
  • Updated _stream_decode_state to handle batch dimensions properly
  • Net reduction of 3 lines of code
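
A minimal sketch of the loop-termination fix described in the second bullet, assuming a simplified sampler: `stream_sample_loop`, `step_fn`, and `EOS_ID` here are hypothetical stand-ins for the internals of `_sampler.py`, not its actual API. The point is that the loop yields output on every step but exits only once every sequence in the batch has finished, rather than when the first one does.

```python
import numpy as np

EOS_ID = 1  # hypothetical end-of-sequence token id


def stream_sample_loop(step_fn, init_tokens, max_steps):
    """Yield per-step tokens until ALL batch elements finish.

    step_fn: maps the current token batch (shape [B]) to the next token
             batch (shape [B]); stands in for one model decode step.
    init_tokens: array of shape [B] with the last prompt token per sequence.
    """
    tokens = np.asarray(init_tokens)
    done = np.zeros(tokens.shape[0], dtype=bool)  # per-sequence finished flags
    for _ in range(max_steps):
        tokens = step_fn(tokens)
        done |= tokens == EOS_ID  # a sequence stays done once it emits EOS
        # Mask finished sequences so downstream decoding can skip them.
        yield np.where(done, EOS_ID, tokens), done.copy()
        if done.all():  # the fix: stop only when EVERY element is done,
            break       # not when the first one is


# Toy demo: three sequences that finish after 3, 1, and 2 steps respectively.
counters = np.array([3, 1, 2])

def toy_step(tokens):
    global counters
    counters -= 1
    return np.where(counters <= 0, EOS_ID, tokens + 1)

for step_tokens, done in stream_sample_loop(toy_step, np.array([10, 20, 30]), max_steps=10):
    print(step_tokens, done)  # streaming continues until done is all True
```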

Testing

  • Tested with Gemma3_4B on an NVIDIA L40S GPU
  • Single-prompt streaming: works (unchanged)
  • Batched non-streaming: works (unchanged)
  • Batched streaming: now works (new feature; a usage sketch follows this list)
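
For context, a hypothetical call pattern that this PR enables. The sampler object, its `sample(prompts, stream=...)` signature, and the per-step chunk shape are illustrative assumptions (with a stub so the snippet runs on its own), not the library's confirmed public API.

```python
class StubSampler:
    """Stand-in so this snippet is self-contained; the real sampler streams
    actual model output. Its interface here is an assumption."""

    def sample(self, prompts, stream=False):
        if not stream:
            return [p + " ...done" for p in prompts]
        def gen():
            for step in range(2):  # pretend the model decodes two steps
                yield [f"[step {step}] partial output for: {p}" for p in prompts]
        return gen()


sampler = StubSampler()
prompts = [
    "Explain streaming generation in one sentence.",
    "List three uses of batched inference.",
]

# Before this PR, combining multiple prompts with streaming raised:
#   ValueError: Streaming is not supported for batched prompts.
for chunk in sampler.sample(prompts, stream=True):
    for i, text in enumerate(chunk):  # one partial output per prompt, per step
        print(f"[prompt {i}] {text}")
```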

Backward Compatibility

Fully backward compatible; no breaking changes to the existing API.

Fixes #406

@XBastille (Author)

Hi @skandermoalla, I noticed the repo has been a bit quiet since the last release, so I wanted to gently bump this. Since this PR fixes a functional issue (#406) and is fully tested, I'd love to get it into the queue whenever the team is back to reviewing external contributions. Please let me know if I need to resolve any conflicts that have come up in the meantime. Thanks!



Development

Successfully merging this pull request may close these issues.

ValueError: Streaming is not supported for batched prompts. Let us know if you need this feature.
