Contributor

neubig commented Jan 16, 2026

Summary

Simplified fix for MCP session persistence. Keeps MCP connections open after listing tools instead of reconnecting for each tool call.

Changes

  • utils.py: Keep connection open after _list_tools_and_keep_connected()
  • tool.py: Remove async with from executor since client is already connected
  • client.py: Reverted to original simple implementation

Key Insight

The cleanest fix is proper object lifecycle:

  1. Create ONE client per server
  2. Keep it open after listing tools
  3. Close it when done

No reference counting, no session managers, no global state. Just ~100 lines changed instead of 900+.
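
The lifecycle described above can be sketched with a toy client. This is only an illustration under stated assumptions: `FakeSessionClient`, `reconnect_per_call`, and `one_client_kept_open` are hypothetical names, not the SDK's `MCPClient` or any code from this PR, and the "session ID" is just a counter that advances on every new connection.

```python
import asyncio
import itertools
from typing import List, Optional


class FakeSessionClient:
    """Hypothetical stand-in for an MCP client; not the SDK's MCPClient."""

    _ids = itertools.count(1)  # every new connection gets a fresh session ID

    def __init__(self) -> None:
        self.session_id: Optional[int] = None

    async def connect(self) -> None:
        self.session_id = next(self._ids)

    async def close(self) -> None:
        self.session_id = None

    async def call_tool(self, name: str) -> int:
        if self.session_id is None:
            raise RuntimeError(f"client is not connected (tool={name})")
        return self.session_id  # report which session served the call


async def reconnect_per_call(tool_names: List[str]) -> List[int]:
    """Old behaviour: open and close a connection around every tool call."""
    seen = []
    for name in tool_names:
        client = FakeSessionClient()
        await client.connect()
        try:
            seen.append(await client.call_tool(name))
        finally:
            await client.close()
    return seen


async def one_client_kept_open(tool_names: List[str]) -> List[int]:
    """Create ONE client, keep it open across calls, close it when done."""
    client = FakeSessionClient()
    await client.connect()
    try:
        return [await client.call_tool(name) for name in tool_names]
    finally:
        await client.close()
```

Running both helpers with the same two tool names should show two different session IDs in the first case and a single repeated ID in the second, which is the property the fix relies on.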

Test Results

All 58 MCP tests pass.

Live Testing with Playwright MCP

Verified session persistence with the Playwright MCP server (a stateful browser automation server):

  1. Agent navigated to https://www.openhands.dev/blog using browser_navigate
  2. Agent used browser_snapshot to verify it was still on the same page

Result: ✅ SUCCESS - The browser state was maintained across multiple tool calls. The agent confirmed it was still on https://www.openhands.dev/blog with the correct page content.

This indicates that the MCP session persistence fix works: without it, the browser context would have been lost when the MCP connection was closed and reopened between tool calls.

Fixes: #1739


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image | Docs / Tags |
|---|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk | Link |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link |
| golang | amd64, arm64 | golang:1.21-bookworm | Link |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:974b18e-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-974b18e-python \
  ghcr.io/openhands/agent-server:974b18e-python

All tags pushed for this build

ghcr.io/openhands/agent-server:974b18e-golang-amd64
ghcr.io/openhands/agent-server:974b18e-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:974b18e-golang-arm64
ghcr.io/openhands/agent-server:974b18e-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:974b18e-java-amd64
ghcr.io/openhands/agent-server:974b18e-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:974b18e-java-arm64
ghcr.io/openhands/agent-server:974b18e-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:974b18e-python-amd64
ghcr.io/openhands/agent-server:974b18e-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:974b18e-python-arm64
ghcr.io/openhands/agent-server:974b18e-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:974b18e-golang
ghcr.io/openhands/agent-server:974b18e-java
ghcr.io/openhands/agent-server:974b18e-python

About Multi-Architecture Support

  • Each variant tag (e.g., 974b18e-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 974b18e-python-amd64) are also available if needed

This PR fixes the issue where MCP HTTP connections create new sessions per
tool call, breaking session-based authentication.

## Changes

### Core Fix: Persistent MCP Connections
- Modified MCPToolExecutor to maintain persistent connections instead of
  opening/closing for each tool call
- Added connection reuse with reference counting in MCPClient
- Added session ID tracking for potential session resumption

### Session Management Infrastructure
- Added mcp_sessions field to ConversationState for session persistence
- Created MCPSessionManager for centralized session tracking
- Added cleanup logic in LocalConversation.close() to release MCP sessions

### Files Changed
- openhands-sdk/openhands/sdk/mcp/client.py - Added session tracking, reentrant context manager
- openhands-sdk/openhands/sdk/mcp/tool.py - Persistent connection in MCPToolExecutor
- openhands-sdk/openhands/sdk/mcp/session_manager.py - New session manager
- openhands-sdk/openhands/sdk/conversation/state.py - Added mcp_sessions field
- openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py - MCP cleanup

### Tests
- Added comprehensive tests using live MCP servers (no mocks)
- Tests verify connection reuse, session stability, and proper cleanup

Fixes #1739

Co-authored-by: openhands <openhands@all-hands.dev>
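
For contrast with the later simplification, a reference-counted, reentrant async context manager of the kind this commit message describes might look roughly like the sketch below. It is not the code from this commit: `RefCountedConnection` and `demo` are hypothetical names, and the "connection" is only a boolean flag.

```python
import asyncio
from typing import Tuple


class RefCountedConnection:
    """Hypothetical reentrant async context manager; not the PR's MCPClient."""

    def __init__(self) -> None:
        self._depth = 0         # number of active `async with` blocks
        self.opens = 0          # how many real connects were performed
        self.connected = False

    async def __aenter__(self) -> "RefCountedConnection":
        if self._depth == 0:    # only the outermost entry connects
            self.opens += 1
            self.connected = True
        self._depth += 1
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self._depth -= 1
        if self._depth == 0:    # only the outermost exit disconnects
            self.connected = False


async def demo() -> Tuple[int, bool, bool, bool]:
    conn = RefCountedConnection()
    async with conn:                 # long-lived holder
        async with conn:             # nested use for a single tool call
            inner = conn.connected
        still_open = conn.connected  # inner exit must not disconnect
    return conn.opens, inner, still_open, conn.connected
```

The bookkeeping works, but every caller has to pair enters and exits correctly, which is the complexity the simplified version of this PR avoids.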
Contributor

github-actions bot commented Jan 16, 2026

Coverage

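Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-sdk/openhands/sdk/mcp/client.py | 37 | 3 | 91% | 43, 69–70 |
| openhands-sdk/openhands/sdk/mcp/tool.py | 95 | 17 | 82% | 56, 70–73, 98, 101, 160, 167, 169–171, 237–238, 243, 251, 279 |
| openhands-sdk/openhands/sdk/mcp/utils.py | 46 | 7 | 84% | 25–26, 29, 32, 48, 99–100 |
| TOTAL | 15695 | 4557 | 70% | |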

- Use getattr for dynamic attribute access in MCPClient._capture_session_id
- Add proper type narrowing with isinstance assertions in tests
- Add Literal import for transport type annotation
- Remove unused variable in test

Co-authored-by: openhands <openhands@all-hands.dev>

Keep MCP connections open after listing tools instead of reconnecting
for each tool call. This is the simplest fix for the session-based
authentication issue.

Changes:
- utils.py: Keep connection open after _list_tools_and_keep_connected()
- tool.py: Remove async with from executor since client is already connected
- client.py: Revert to original simple implementation

Key insight: The cleanest fix is proper object lifecycle - create ONE
client per server, keep it open, close it when done. No reference
counting, no session managers, no global state.

Related issue: #1739

Co-authored-by: openhands <openhands@all-hands.dev>

Multiple tools share the same client, so close() may be called multiple
times during LocalConversation.close() cleanup. Added _closed flag to
ensure safe multiple calls.

Co-authored-by: openhands <openhands@all-hands.dev>

This test uses a server that stores state keyed by MCP session ID.
- Without fix: Each tool call creates new session, state is lost
- With fix: Same session is reused, state is preserved

Test confirms the exact issue reported by user (session-based auth breaking).

Co-authored-by: openhands <openhands@all-hands.dev>
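
The idea behind the stateful test in the commit message above can be pictured with an in-process toy server. This is a sketch under assumptions, not the test from the PR: `SessionKeyedServer`, `run_calls`, and the `login`/`whoami` tool names are hypothetical, and no real MCP transport is involved.

```python
import itertools
from typing import Dict, Optional


class SessionKeyedServer:
    """Toy server that stores state per session ID (hypothetical, in-process)."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self._state: Dict[int, Dict[str, str]] = {}

    def open_session(self) -> int:
        session_id = next(self._next_id)
        self._state[session_id] = {}  # fresh, empty state for a new session
        return session_id

    def call(self, session_id: int, tool: str, value: Optional[str] = None) -> str:
        store = self._state[session_id]
        if tool == "login":
            store["user"] = value or ""
            return "ok"
        if tool == "whoami":
            return store.get("user", "anonymous")
        raise ValueError(f"unknown tool: {tool}")


def run_calls(server: SessionKeyedServer, reuse_session: bool) -> str:
    """Log in, then ask who we are, optionally on the same session."""
    first = server.open_session()
    server.call(first, "login", "alice")
    second = first if reuse_session else server.open_session()
    return server.call(second, "whoami")
```

With `reuse_session=False` the second call lands on a new session and the login is invisible, which mirrors the session-based auth breakage; with `reuse_session=True` the state survives.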

openhands-ai bot commented Jan 16, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1740 at branch `openhands/fix-mcp-session-persistence`

Feel free to include any additional details that might help me get this PR into a better state.

Contributor Author

neubig commented Jan 16, 2026

@OpenHands fix pre-commit


openhands-ai bot commented Jan 16, 2026

I'm on it! neubig can track my progress at all-hands.dev

- Fix line length issues (E501) by breaking long strings
- Add assertions for executor not None to fix pyright type errors
- Remove unused import MCPToolAction

openhands-ai bot commented Jan 16, 2026

I've fixed the pre-commit issues in PR #1740. The changes have been pushed to the openhands/fix-mcp-session-persistence branch.

Summary of Changes

Fixed pre-commit failures in tests/sdk/mcp/test_stateful_mcp.py:

  1. Line length issues (E501) - Fixed 3 lines that exceeded 88 characters:

    • Line 72: Split long error message string across multiple lines
    • Line 85: Extracted counter value to a variable before using in f-string
    • Line 218: Shortened comment text
  2. Pyright type errors - Fixed 5 type errors where executor could be None:

    • Added assertions (assert ... is not None) for all 4 tool executors before using them in test_multiple_operations_same_session
  3. Unused import - Removed unused MCPToolAction import (auto-fixed by ruff)
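
The first two kinds of fix listed above follow common patterns, sketched here in isolation. The `Tool` class, `run_tool`, and `describe` are hypothetical and are not taken from tests/sdk/mcp/test_stateful_mcp.py; they only illustrate narrowing an optional attribute with an assertion and keeping lines under the 88-character limit.

```python
from typing import Callable, Dict, Optional


class Tool:
    """Hypothetical tool whose executor may be unset, as a type checker sees it."""

    def __init__(self, executor: Optional[Callable[[str], str]] = None) -> None:
        self.executor = executor


def run_tool(tool: Tool, arg: str) -> str:
    # Narrow Optional[...] to a callable so the call below type-checks.
    # The long message is split into adjacent string literals, which Python
    # concatenates, so no single line exceeds the length limit.
    assert tool.executor is not None, (
        "executor must be set before the tool is used; "
        "this message is split across lines to stay under 88 characters"
    )
    return tool.executor(arg)


def describe(counter: Dict[str, int]) -> str:
    # Pull the value into a variable first to keep the f-string line short.
    count = counter["value"]
    return f"counter is now {count}"
```

The assertion serves two purposes: it gives a type checker the information it needs, and it fails loudly at runtime if the executor really is missing.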

All pre-commit checks now pass:

  • ✅ Format YAML files
  • ✅ Ruff format
  • ✅ Ruff lint
  • ✅ PEP8 style check (pycodestyle)
  • ✅ Type check with pyright
  • ✅ Check import dependency rules
  • ✅ Check Tool subclass registration

neubig marked this pull request as ready for review January 16, 2026 19:44
neubig requested a review from jpshackelford January 16, 2026 19:47
Collaborator

all-hands-bot left a comment

The session persistence fix works (tests pass), but there are concerns about the approach that should be addressed. Key issues: context manager protocol violation and relying on __del__ for cleanup. Details in inline comments.

Contributor Author

neubig commented Jan 16, 2026

@OpenHands try to reflect the comments above without making the code overly complex.


openhands-ai bot commented Jan 16, 2026

I'm on it! neubig can track my progress at all-hands.dev

…xt manager support

- Add explicit connect()/disconnect() methods to MCPClient as alternative to
  calling __aenter__/__aexit__ directly (addresses protocol violation concern)
- Add context manager support (__enter__/__exit__) to MCPToolExecutor for
  deterministic cleanup
- Improve exception handling in utils.py to not mask original exceptions
  when cleanup fails
- Improve error message for disconnected client with more context

Co-authored-by: openhands <openhands@all-hands.dev>
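
One way to offer explicit connect()/disconnect() methods alongside the context-manager protocol, as the commit message above describes, is to put the logic in the explicit methods and let `__aenter__`/`__aexit__` delegate to them. The sketch below assumes that layout; `ConnectableClient` and `list_tools_keep_open` are hypothetical, the tool names are placeholders, and the real MCPClient may be structured differently.

```python
import asyncio
from typing import List


class ConnectableClient:
    """Hypothetical client; the real MCPClient may be structured differently."""

    def __init__(self) -> None:
        self.connected = False

    async def connect(self) -> None:
        """Open the connection for long-lived use (no `async with` needed)."""
        self.connected = True

    async def disconnect(self) -> None:
        """Close a connection previously opened with connect()."""
        self.connected = False

    async def list_tools(self) -> List[str]:
        if not self.connected:
            raise RuntimeError("client is not connected; call connect() first")
        return ["browser_navigate", "browser_snapshot"]

    # The context-manager protocol delegates to the explicit methods, so
    # scoped `async with` use keeps working unchanged.
    async def __aenter__(self) -> "ConnectableClient":
        await self.connect()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.disconnect()


async def list_tools_keep_open(client: ConnectableClient) -> List[str]:
    """Connect, list tools, and return with the client still connected."""
    await client.connect()
    return await client.list_tools()
```

Callers that want a long-lived connection use connect() and later disconnect(), and never have to invoke a dunder method by hand.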

openhands-ai bot commented Jan 16, 2026

I've addressed the review comments on PR #1740 without making the code overly complex. The changes have been pushed to the openhands/fix-mcp-session-persistence branch.

Summary of Changes

Addressed all 4 review comments from all-hands-bot:

  1. 🟠 Important - Protocol violation fix: Added explicit connect() and disconnect() methods to MCPClient as a cleaner alternative to calling __aenter__() directly. The _list_tools_and_keep_connected() function now uses await client.connect() instead.

  2. 🟡 Context manager support: Added __enter__ and __exit__ methods to MCPToolExecutor for deterministic cleanup. Users can now write:

    with tool.executor as executor:
        # Use tools
        pass  # Automatic cleanup
  3. 🟡 Exception handling: Improved the exception handler in utils.py to wrap client.sync_close() in a try-except block, ensuring the original exception is preserved if cleanup fails.

  4. 🟢 Better error message: Enhanced the RuntimeError message when the client is disconnected to include more debugging context.
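
The exception-handling change in item 3 follows a general pattern: if cleanup itself fails while handling an error, log the cleanup failure and re-raise the original error. The sketch below shows that pattern only; `FlakyClient` and `list_tools_or_cleanup` are hypothetical and do not reproduce the code in utils.py.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


class FlakyClient:
    """Hypothetical client whose tool listing and cleanup both fail."""

    def list_tools(self) -> List[str]:
        raise ConnectionError("listing tools failed")

    def sync_close(self) -> None:
        raise OSError("close failed")


def list_tools_or_cleanup(client: FlakyClient) -> List[str]:
    try:
        return client.list_tools()
    except Exception:
        try:
            client.sync_close()
        except Exception:
            # Log and swallow the cleanup error so the bare `raise` below
            # re-raises the original exception, not this one.
            logger.warning("cleanup failed", exc_info=True)
        raise
```

Without the inner try-except, the OSError from sync_close() would propagate instead, and the caller would see the cleanup failure rather than the error that actually caused it.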

Files Changed

  • client.py: +16 lines (added connect() and disconnect() methods)
  • tool.py: +10 lines (added context manager support)
  • utils.py: +7/-7 lines (use connect() method, improved exception handling)

All 61 MCP tests pass and all pre-commit checks pass.

"""
if self._closed:
    return
self._closed = True
Contributor

Issue: Setting self._closed = True before attempting cleanup means that if cleanup fails, future calls to sync_close() will return early without retrying.

Suggested change: Move self._closed = True to after the cleanup operations:

def sync_close(self) -> None:
    if self._closed:
        return

    # Best-effort: try async close if parent provides it; failures here are ignored
    if hasattr(self, "close") and inspect.iscoroutinefunction(self.close):
        try:
            self._executor.run_async(self.close, timeout=10.0)
        except Exception:
            pass

    self._executor.close()
    self._closed = True  # Mark closed only after the executor has shut down without raising

Why this is safe: Both underlying close methods are idempotent:

  • AsyncExecutor.close() uses an atomic swap pattern with a None check
  • fastmcp.Client.close() → _disconnect() checks session_task is None before acting

Multiple calls to either are harmless, so if cleanup partially fails, a retry can safely re-attempt without double-closing anything.
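
The effect of the suggested ordering can be seen with a small stand-in. In this sketch `Resource` and `Closer` are hypothetical classes, not the SDK's executor or client, and `Resource.close()` is assumed to be safe to call repeatedly, as the comment above says of the real close methods.

```python
class Resource:
    """Hypothetical resource whose first close attempt(s) fail."""

    def __init__(self, failures: int = 1) -> None:
        self._failures_left = failures
        self.close_calls = 0

    def close(self) -> None:
        self.close_calls += 1
        if self._failures_left > 0:
            self._failures_left -= 1
            raise OSError("transient close failure")


class Closer:
    """Sets the closed flag only after cleanup has finished without raising."""

    def __init__(self, resource: Resource) -> None:
        self._resource = resource
        self._closed = False

    def sync_close(self) -> None:
        if self._closed:
            return  # already cleaned up; repeated calls are no-ops
        self._resource.close()  # may raise; flag stays False so a retry runs
        self._closed = True
```

A first sync_close() that raises leaves `_closed` as False, so a second call retries the cleanup; once it succeeds, further calls (for example from several tools sharing one client) return immediately.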

Development

Successfully merging this pull request may close these issues.

MCP HTTP connections create new sessions per tool call, breaking session-based auth

5 participants