@suiyoubi

Recent versions of Megatron-Core's inference engine return DynamicInferenceRequestRecord objects instead of InferenceRequest objects. A DynamicInferenceRequestRecord is a container class that holds multiple DynamicInferenceRequest objects (to support suspend/resume functionality), so it has no direct generated_text attribute.

This PR adds handling after generate() to merge the returned DynamicInferenceRequestRecord objects.
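
For readers unfamiliar with the pattern, here is a minimal sketch of what "merging" a record after generate() could look like. It does not use the Megatron-Core API: `RequestSegment`, `RequestRecord`, `MergedResult`, and `merge_record` are hypothetical stand-ins, and the assumption that segments are merged by simple concatenation is mine, not something confirmed by this PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RequestSegment:
    """Hypothetical stand-in for one DynamicInferenceRequest
    (one run of a request between suspend and resume)."""
    generated_text: str = ""
    generated_tokens: List[int] = field(default_factory=list)


@dataclass
class RequestRecord:
    """Hypothetical stand-in for DynamicInferenceRequestRecord:
    a container of segments with no generated_text of its own."""
    requests: List[RequestSegment] = field(default_factory=list)


@dataclass
class MergedResult:
    """Flat result exposing generated_text directly (hypothetical)."""
    generated_text: str
    generated_tokens: List[int]


def merge_record(result):
    """Return an object that exposes generated_text.

    Results that already have generated_text pass through unchanged;
    record-style containers are flattened by concatenating their
    segments in order (an assumed merge rule).
    """
    if hasattr(result, "generated_text"):
        return result
    text = "".join(seg.generated_text for seg in result.requests)
    tokens = [tok for seg in result.requests for tok in seg.generated_tokens]
    return MergedResult(generated_text=text, generated_tokens=tokens)
```

Applied to every item returned by generate(), a helper of this shape would let downstream code keep reading generated_text regardless of which result type the engine returns.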

Signed-off-by: Ao Tang <aot@nvidia.com>
@oyilmaz-nvidia
Contributor

@suiyoubi Could you please fix the linting issues? Then we can start the CI.

@copy-pr-bot

copy-pr-bot bot commented Dec 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@oyilmaz-nvidia
Contributor

/ok to test bc17765
