-
Notifications
You must be signed in to change notification settings - Fork 109
Fix MCP HTTP session persistence across tool calls #1740
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This PR fixes the issue where MCP HTTP connections create new sessions per tool call, breaking session-based authentication. ## Changes ### Core Fix: Persistent MCP Connections - Modified MCPToolExecutor to maintain persistent connections instead of opening/closing for each tool call - Added connection reuse with reference counting in MCPClient - Added session ID tracking for potential session resumption ### Session Management Infrastructure - Added mcp_sessions field to ConversationState for session persistence - Created MCPSessionManager for centralized session tracking - Added cleanup logic in LocalConversation.close() to release MCP sessions ### Files Changed - openhands-sdk/openhands/sdk/mcp/client.py - Added session tracking, reentrant context manager - openhands-sdk/openhands/sdk/mcp/tool.py - Persistent connection in MCPToolExecutor - openhands-sdk/openhands/sdk/mcp/session_manager.py - New session manager - openhands-sdk/openhands/sdk/conversation/state.py - Added mcp_sessions field - openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py - MCP cleanup ### Tests - Added comprehensive tests using live MCP servers (no mocks) - Tests verify connection reuse, session stability, and proper cleanup Fixes #1739 Co-authored-by: openhands <openhands@all-hands.dev>
- Use getattr for dynamic attribute access in MCPClient._capture_session_id - Add proper type narrowing with isinstance assertions in tests - Add Literal import for transport type annotation - Remove unused variable in test Co-authored-by: openhands <openhands@all-hands.dev>
Keep MCP connections open after listing tools instead of reconnecting for each tool call. This is the simplest fix for the session-based authentication issue. Changes: - utils.py: Keep connection open after _list_tools_and_keep_connected() - tool.py: Remove async with from executor since client is already connected - client.py: Revert to original simple implementation Key insight: The cleanest fix is proper object lifecycle - create ONE client per server, keep it open, close it when done. No reference counting, no session managers, no global state. Related issue: #1739 Co-authored-by: openhands <openhands@all-hands.dev>
Multiple tools share the same client, so close() may be called multiple times during LocalConversation.close() cleanup. Added _closed flag to ensure safe multiple calls. Co-authored-by: openhands <openhands@all-hands.dev>
This test uses a server that stores state keyed by MCP session ID. - Without fix: Each tool call creates new session, state is lost - With fix: Same session is reused, state is preserved Test confirms the exact issue reported by user (session-based auth breaking). Co-authored-by: openhands <openhands@all-hands.dev>
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
|
@OpenHands fix pre-commit |
|
I'm on it! neubig can track my progress at all-hands.dev |
- Fix line length issues (E501) by breaking long strings - Add assertions for executor not None to fix pyright type errors - Remove unused import MCPToolAction
|
I've fixed the pre-commit issues in PR #1740. The changes have been pushed to the Summary of ChangesFixed pre-commit failures in
All pre-commit checks now pass:
|
all-hands-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The session persistence fix works (tests pass), but there are concerns about the approach that should be addressed. Key issues: context manager protocol violation and relying on del for cleanup. Details in inline comments.
|
@OpenHands try to reflect the comments above without making the code overly complex. |
|
I'm on it! neubig can track my progress at all-hands.dev |
…xt manager support - Add explicit connect()/disconnect() methods to MCPClient as alternative to calling __aenter__/__aexit__ directly (addresses protocol violation concern) - Add context manager support (__enter__/__exit__) to MCPToolExecutor for deterministic cleanup - Improve exception handling in utils.py to not mask original exceptions when cleanup fails - Improve error message for disconnected client with more context Co-authored-by: openhands <openhands@all-hands.dev>
|
I've addressed the review comments on PR #1740 without making the code overly complex. The changes have been pushed to the Summary of ChangesAddressed all 4 review comments from
Files Changed
All 61 MCP tests pass and all pre-commit checks pass. |
| """ | ||
| if self._closed: | ||
| return | ||
| self._closed = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: Setting self._closed = True before attempting cleanup means that if cleanup fails, future calls to sync_close() will return early without retrying.
Suggested change: Move self._closed = True to after the cleanup operations:
def sync_close(self) -> None:
if self._closed:
return
# Best-effort: try async close if parent provides it
if hasattr(self, "close") and inspect.iscoroutinefunction(self.close):
try:
self._executor.run_async(self.close, timeout=10.0)
except Exception:
pass
self._executor.close()
self._closed = True # Mark closed only after cleanup succeedsWhy this is safe: Both underlying close methods are idempotent:
AsyncExecutor.close()uses an atomic swap pattern with aNonecheckfastmcp.Client.close()→_disconnect()checkssession_task is Nonebefore acting
Multiple calls to either are harmless, so if cleanup partially fails, a retry can safely re-attempt without double-closing anything.
Summary
Simplified fix for MCP session persistence. Keeps MCP connections open after listing tools instead of reconnecting for each tool call.
Changes
utils.py: Keep connection open after_list_tools_and_keep_connected()tool.py: Removeasync withfrom executor since client is already connectedclient.py: Reverted to original simple implementationKey Insight
The cleanest fix is proper object lifecycle:
No reference counting, no session managers, no global state. Just ~100 lines changed instead of 900+.
Test Results
All 58 MCP tests pass.
Live Testing with Playwright MCP
Verified session persistence with the Playwright MCP server (a stateful browser automation server):
https://www.openhands.dev/blogusingbrowser_navigatebrowser_snapshotto verify it was still on the same pageResult: ✅ SUCCESS - The browser state was maintained across multiple tool calls. The agent confirmed it was still on
https://www.openhands.dev/blogwith the correct page content.This proves the MCP session persistence fix works correctly - without the fix, the browser context would have been lost when the MCP connection was closed and reopened between tool calls.
Fixes: #1739
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:974b18e-pythonRun
All tags pushed for this build
About Multi-Architecture Support
974b18e-python) is a multi-arch manifest supporting both amd64 and arm64974b18e-python-amd64) are also available if needed