fix: create isolated runtime per eval execution #1066
Each eval execution now gets its own runtime with a unique runtime_id (using execution_id). This ensures each eval has its own LangGraph thread_id with clean state, preventing message accumulation across sequential eval runs.

Previously, all evals shared a single runtime with the same thread_id, causing the LangGraph checkpointer to persist and accumulate messages across eval runs, leading to 400 bad request errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
9c4fa08 to cc1896b
saksharthakkar
left a comment
left a comment, other than that... lgtm!
If new_runtime fails, eval_runtime would be unassigned and the finally block would raise NameError when trying to dispose. Initialize to None and check before disposing.
akshaylive
left a comment
Thanks for the fix!
One minor comment: can you please rename to |
Summary
Problem
Previously, all evals shared a single runtime with the same thread_id, causing the LangGraph checkpointer to persist and accumulate messages across eval runs. This led to 400 bad request errors on sequential eval executions.
Root Cause Analysis
Regression Source
This bug was introduced in PR #1055 (akshaya/single_eval_runtime) by @akshaylive, merged on Dec 30, 2025 (e51d942).
What Changed
Before PR #1055 - Each evaluation created its own runtime with a unique runtime_id.
After PR #1055 - A single runtime was created and shared across all evaluations.
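The snippets that originally accompanied these two descriptions are not present in this text, so what follows is only a minimal sketch of the two runtime lifetimes, not the actual diff. `FakeRuntime`, `run_per_eval`, and `run_shared` are hypothetical names introduced for illustration; generating ids with `uuid` is likewise an assumption.

```python
import uuid
from dataclasses import dataclass


@dataclass
class FakeRuntime:
    """Hypothetical stand-in for a runtime; only the id matters here."""
    runtime_id: str


def run_per_eval(eval_names):
    # Before PR #1055 (as described): one runtime, and one runtime_id, per eval.
    return [FakeRuntime(runtime_id=str(uuid.uuid4())).runtime_id for _ in eval_names]


def run_shared(eval_names):
    # After PR #1055 (as described): a single runtime reused by every eval.
    shared = FakeRuntime(runtime_id=str(uuid.uuid4()))
    return [shared.runtime_id for _ in eval_names]


evals = ["eval-a", "eval-b", "eval-c"]
per_eval_ids = run_per_eval(evals)
shared_ids = run_shared(evals)
print(len(set(per_eval_ids)), "distinct ids when each eval owns its runtime")
print(len(set(shared_ids)), "distinct id when the runtime is shared")
```

The only difference that matters for this bug is how many distinct runtime_id values the evals end up with.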
Why This Breaks Sequential Evals
When using LangGraph agents with SQLite checkpointer:
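runtime_id maps to thread_id for LangGraph conversation state
Because every eval shared one runtime, every eval also ran on the same thread_id, so the checkpointer persisted and accumulated messages across eval runs
Why Testing Missed It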
The original PR was tested with the calculator agent, which doesn't use LangGraph with persistent state. The issue only manifests with stateful LangGraph agents, where conversation history accumulates across the shared thread.
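The accumulation can be illustrated without LangGraph at all. The sketch below uses a hypothetical in-memory `FakeCheckpointer` keyed by thread_id; it is not the LangGraph SQLite checkpointer and does not use its API, it only mimics the "state persists per thread_id" behaviour described above.

```python
from collections import defaultdict


class FakeCheckpointer:
    """Hypothetical in-memory stand-in for a persistent checkpointer."""

    def __init__(self):
        self._threads = defaultdict(list)

    def append(self, thread_id, message):
        # State is keyed by thread_id, so reusing an id reuses its history.
        self._threads[thread_id].append(message)
        return list(self._threads[thread_id])


def run_eval(checkpointer, thread_id, user_input):
    # Each run adds its input and a reply to whatever the thread already holds.
    checkpointer.append(thread_id, f"user: {user_input}")
    return checkpointer.append(thread_id, "assistant: ok")


questions = ("q1", "q2", "q3")

# Shared thread: history grows with every sequential eval.
shared = FakeCheckpointer()
shared_lengths = [len(run_eval(shared, "shared-thread", q)) for q in questions]

# One thread per eval: every eval starts from clean state.
isolated = FakeCheckpointer()
isolated_lengths = [
    len(run_eval(isolated, f"thread-{i}", q)) for i, q in enumerate(questions)
]

print(shared_lengths)
print(isolated_lengths)
```

A stateless agent never reads the stored history back, which is consistent with the calculator agent not surfacing the problem.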
Solution
Each eval execution now gets its own runtime with a unique runtime_id (using execution_id). The runtime is properly disposed after the eval completes.
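A minimal sketch of this pattern follows, under stated assumptions: `FakeRuntime`, `FakeFactory`, `run_single_eval`, and the async signatures are hypothetical and stand in for the project's real runtime classes, which are not shown here. Only the names new_runtime, eval_runtime, runtime_id, and execution_id come from the PR text. The sketch also includes the guard discussed in review, where eval_runtime is initialized to None so the finally block is safe if new_runtime fails.

```python
import asyncio


class FakeRuntime:
    """Hypothetical runtime; records whether dispose() was called."""

    def __init__(self, runtime_id):
        self.runtime_id = runtime_id
        self.disposed = False

    async def execute(self, payload):
        return {"runtime_id": self.runtime_id, "echo": payload}

    async def dispose(self):
        self.disposed = True


class FakeFactory:
    """Hypothetical factory; new_runtime can be made to fail for illustration."""

    def __init__(self, fail=False):
        self.fail = fail
        self.created = []

    async def new_runtime(self, runtime_id):
        if self.fail:
            raise RuntimeError("could not create runtime")
        runtime = FakeRuntime(runtime_id)
        self.created.append(runtime)
        return runtime


async def run_single_eval(factory, execution_id, payload):
    eval_runtime = None  # bound before try, so finally can always inspect it
    try:
        # Unique runtime_id per execution gives each eval its own thread.
        eval_runtime = await factory.new_runtime(runtime_id=execution_id)
        return await eval_runtime.execute(payload)
    finally:
        # Dispose only if creation succeeded.
        if eval_runtime is not None:
            await eval_runtime.dispose()


factory = FakeFactory()
result = asyncio.run(run_single_eval(factory, "exec-1", "hello"))
print(result["runtime_id"], factory.created[0].disposed)
```

Without the `eval_runtime = None` line, a failure inside new_runtime would leave the name unbound and the finally block would mask the original error with a NameError.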
Test plan
🤖 Generated with Claude Code