-
Notifications
You must be signed in to change notification settings - Fork 144
LangGraph Plugin for Temporal Python SDK #1263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
mfateev
wants to merge
96
commits into
temporalio:main
Choose a base branch
from
mfateev:langgraph-plugin
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
96 commits
Select commit
Hold shift + click to select a range
3051548
LangGraph: Add Phase 1 prototype structure
mfateev 56394b0
LangGraph: Implement Pregel loop submit injection prototype
mfateev 8d1409f
LangGraph: Use internal constant import to avoid deprecation warning
mfateev 1ac7158
Fix write capture test for PregelExecutableTask type
mfateev c5c9896
Add task interface prototype and tests
mfateev 4e078c7
Add serialization prototype using Temporal data converters
mfateev 5521520
LangGraph: Add Phase 1 validation prototypes before cleanup
mfateev b12a24e
LangGraph: Remove prototype files after Phase 2 implementation
mfateev 7a29282
LangGraph: Implement Phase 3 and 4 - Activity write capture and per-n…
mfateev 6d72b9c
LangGraph: Fix conditional edge routing and use AsyncPregelLoop
mfateev 5feff61
LangGraph: Remove prototype test files
mfateev 0f555e1
LangGraph: Execute tasks in parallel within each tick
mfateev 5ed4b86
LangGraph: Implement native interrupt API with comprehensive tests
mfateev 13ef23c
LangGraph: Fix multi-interrupt handling and add e2e tests
mfateev 60ed486
LangGraph: Add checkpoint and should_continue APIs for continue-as-new
mfateev 4e4301f
LangGraph: Add Store support for cross-node persistence
mfateev 83e90e5
LangGraph: Add e2e tests for Store functionality
mfateev 1f49320
LangGraph: Add Send API support and validation tests
mfateev 2d5738e
LangGraph: Add design docs for interrupt and store APIs
mfateev 3f7c044
LangGraph: Remove design docs (preserved in git history)
mfateev 535851a
LangGraph: Add user-facing README documentation
mfateev 5b0231e
LangGraph: Add temporal_node_metadata() helper for typed activity opt…
mfateev 3d54aba
LangGraph: Use temporal_node_metadata() for compile() defaults
mfateev 788e5cc
LangGraph: Rename API for consistency
mfateev 904c566
LangGraph: Add plugin-level activity options
mfateev cf45004
LangGraph: Add temporal_tool() and temporal_model() for durable agent…
mfateev b518249
LangGraph: Add create_agent support and temporal_node_metadata() helper
mfateev bda9006
LangGraph: Reorganize tests and fix sandbox graph building
mfateev be1d36e
LangGraph: Add experimental warnings and improve documentation
mfateev afbd452
LangGraph: Document internal API usage with detailed rationale
mfateev 87f6b9b
LangGraph: Add logging infrastructure
mfateev 6bf1456
LangGraph: Remove warning suppression by using internal imports directly
mfateev ec44b14
LangGraph: Add domain-specific exceptions with ApplicationError
mfateev ea808c4
LangGraph: Tidy docstrings to be precise and concise
mfateev 7e98835
LangGraph: Remove enable_workflow_execution compile parameter
mfateev 720667e
LangGraph: Rename activities and add meaningful summaries for UI
mfateev ed1da9a
LangGraph: Run __start__ node inline in workflow
mfateev 7a3121a
LangGraph: Use ClientConfig for example connection setup
mfateev 02eb922
LangGraph: Add sandbox passthrough for pydantic_core and langchain_core
mfateev dc1d2e3
LangGraph: Align with SDK style conventions
mfateev 386d9b1
LangGraph: Remove example.py from module
mfateev a86d651
LangGraph: Implement bind_tools for temporal_model
mfateev 406acc9
LangGraph: Document bind_tools support in README
mfateev bf072c4
LangGraph: Add summary to temporal_tool activity
mfateev 8952549
LangGraph: Show tool names in activity summary for tools node
mfateev 1f010d9
LangGraph: Simplify activity names and add metadata description support
mfateev 54ee95f
LangGraph: Add create_durable_agent and create_durable_react_agent fu…
mfateev 01216c6
LangGraph: Document create_durable_agent and create_durable_react_agent
mfateev f69c100
LangGraph: Rename node_activity_options to activity_options
mfateev ee7dbd0
LangGraph: Fix temporal_model deepcopy issue with HTTP clients
mfateev 68d7370
LangGraph: Remove temporal_model and temporal_tool wrappers
mfateev 212a80a
LangGraph: Update README to reflect simplified architecture
mfateev b53c66a
LangGraph: Document create_agent as preferred over deprecated create_…
mfateev c97eede
LangGraph: Add langchain test dependency and use create_react_agent i…
mfateev 6921021
LangGraph: Fix state reading for create_agent routing edges
mfateev 1e87448
LangGraph: Improve activity summaries for model nodes
mfateev dc2dd26
LangGraph: Execute subgraph inner nodes as separate activities
mfateev c1edae9
LangGraph: Fix subgraph routing and add unit tests
mfateev 62b245c
LangGraph: Fix node execution to always use ainvoke
mfateev 1f1c209
LangGraph: Handle ParentCommand for supervisor multi-agent routing
mfateev 8c0f3ae
LangGraph: Update README to reference create_agent
mfateev 517c01d
LangGraph: Classify node errors as retryable or non-retryable
mfateev 97e4312
LangGraph: Execute Send packets in parallel using asyncio.gather
mfateev 82f0725
LangGraph: Extract query from input state for activity summaries
mfateev 2193a41
Add .mypy_cache to .gitignore
mfateev 63dcfdf
LangGraph: Refactor ainvoke and _execute_subgraph into smaller methods
mfateev fb46f18
LangGraph: Group instance variables into state dataclasses
mfateev 5576183
LangGraph: Extract magic strings to constants module
mfateev dbb1821
LangGraph: Extract nested functions from _execute_node_impl
mfateev 4cc4ff8
Update CODE_REVIEW.md to mark completed refactoring items
mfateev 470d916
LangGraph: Fix linting issues (import order and formatting)
mfateev d80affd
Fix lint errors: add type annotations and docstrings
mfateev 12cc97d
LangGraph: Set ContextVar for get_config()/get_store() in activities
mfateev 56fbe77
LangGraph: Support run_in_workflow nodes calling Temporal operations
mfateev 1d2e2ad
LangGraph: Run user function sandboxed in run_in_workflow nodes
mfateev 95b2e32
LangGraph: Add E2E test for sandbox enforcement on run_in_workflow nodes
mfateev 4a389e3
LangGraph: Improve README documentation
mfateev b15f465
Update README sample links to langgraph_plugin directory
mfateev d2fb7ac
LangGraph: Add graph visualization methods
mfateev 7e2cf0c
Add Graph Visualization Queries section to README
mfateev 4c4d5ef
LangGraph: Fix parallel branch execution in BSP model
mfateev 9c3ee17
Add unit tests for LangGraph Functional API implementation
mfateev 697faca
LangGraph: Add Functional API implementation and fix lint issues
mfateev 8b92932
Fix asyncio compatibility for LangGraph Functional API
mfateev ccfc578
LangGraph Functional API: Bypass Pregel runner for proper task routing
mfateev 02c4ca7
LangGraph: Support activity_options() helper in Functional API
mfateev eeeaf00
LangGraph: Unify plugin for Graph API and Functional API
mfateev 30975ab
Remove backward compatibility code from LangGraph integration
mfateev 32f59cb
Update README installation instructions for development branch
mfateev e370b43
Add CLAUDE.md with design decisions and implementation guide
mfateev 9fde135
Remove duplicate design doc (langgraph-plugin-design.md)
mfateev 91fa485
Update design documentation to match implementation
mfateev 2648bc0
Add task result caching for Functional API continue-as-new
mfateev 60f0305
Document task result caching for Functional API in CLAUDE.md
mfateev 61e8a1e
Add Functional API documentation to README
mfateev cb926d1
Update README sample applications with both API styles
mfateev File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,6 @@ | ||
| .venv | ||
| __pycache__ | ||
| .mypy_cache | ||
| /build | ||
| /dist | ||
| temporalio/bridge/target/ | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,277 @@ | ||
| # LangGraph Temporal Plugin - Code Review | ||
|
|
||
| **Date:** 2025-12-29 | ||
| **Reviewer:** Claude Code | ||
| **Scope:** Full codebase review of `temporalio/contrib/langgraph/` | ||
|
|
||
| --- | ||
|
|
||
| ## Executive Summary | ||
|
|
||
| The LangGraph plugin is a **well-designed integration** that maps LangGraph's computational model onto Temporal's durable execution model. The architecture is sound with clear separation of concerns. The implementation successfully supports most LangGraph features including interrupts, Store API, Send API, Command API, and subgraphs. | ||
|
|
||
| **Overall Rating:** Good with minor improvements recommended | ||
|
|
||
| --- | ||
|
|
||
| ## Architecture Overview | ||
|
|
||
| ### Module Structure | ||
|
|
||
| | Module | Purpose | Lines | Assessment | | ||
| |--------|---------|-------|------------| | ||
| | `_plugin.py` | Plugin registration, worker setup | ~100 | Clean | | ||
| | `_graph_registry.py` | Graph storage, lookup | ~130 | Clean | | ||
| | `_runner.py` | Main orchestration logic | ~1200 | Complex but necessary | | ||
| | `_activities.py` | Node execution activities | ~430 | Well-structured | | ||
| | `_models.py` | Data transfer objects | ~320 | Good dataclass usage | | ||
| | `_exceptions.py` | Error classification | ~170 | Comprehensive | | ||
| | `_store.py` | Activity-local store | ~100 | Simple, effective | | ||
| | `__init__.py` | Public API | ~190 | Well-documented | | ||
|
|
||
| ### Key Design Decisions | ||
|
|
||
| 1. **Activities as Node Executors**: Each graph node runs as a Temporal activity, providing durability and retry semantics. This is the correct architectural choice. | ||
|
|
||
| 2. **AsyncPregelLoop Integration**: The runner uses LangGraph's internal `AsyncPregelLoop` for graph traversal, ensuring compatibility with native LangGraph behavior. | ||
|
|
||
| 3. **Plugin-based Registration**: Graphs are registered via `LangGraphPlugin` and stored in a global registry, allowing compile-time lookup within workflows. | ||
|
|
||
| 4. **Store Snapshot Pattern**: Store data is snapshotted before each activity and writes are tracked/merged back - enables cross-node persistence without shared state. | ||
|
|
||
| --- | ||
|
|
||
| ## Strengths | ||
|
|
||
| ### 1. Clean Separation of Concerns | ||
| - `_plugin.py` handles Temporal integration (activities, data converter, sandbox) | ||
| - `_runner.py` handles workflow-side orchestration | ||
| - `_activities.py` handles activity-side execution | ||
| - `_models.py` defines serializable DTOs | ||
|
|
||
| ### 2. Comprehensive Error Classification (`_exceptions.py:13-97`) | ||
| ```python | ||
| def is_non_retryable_error(exc: BaseException) -> bool: | ||
| ``` | ||
| The error classifier correctly identifies: | ||
| - Non-retryable: `TypeError`, `ValueError`, `AuthenticationError`, 4xx HTTP errors | ||
| - Retryable: Rate limits (429), network errors, 5xx server errors | ||
|
|
||
| This ensures proper retry behavior for different failure modes. | ||
|
|
||
| ### 3. Rich Activity Summaries (`_runner.py:~64-185`) | ||
| Activity summaries extract meaningful context: | ||
| - Tool calls from messages | ||
| - Model names from chat models | ||
| - Last human query for context | ||
| - Node descriptions from metadata | ||
|
|
||
| This significantly improves workflow observability in the Temporal UI. | ||
|
|
||
| ### 4. Robust Interrupt Handling | ||
| The interrupt/resume flow is well-implemented: | ||
| - `_pending_interrupt` tracks interrupt state | ||
| - `_interrupted_node_name` enables targeted resume | ||
| - `_completed_nodes_in_cycle` prevents re-execution after resume | ||
| - Resume values flow through `PregelScratchpad` | ||
|
|
||
| ### 5. Parallel Send Execution (`_runner.py:866-999`) | ||
| Send packets now execute in parallel using `asyncio.gather`, with proper phase separation: | ||
| 1. Prepare all activity inputs (deterministic step counter assignment) | ||
| 2. Execute all activities in parallel | ||
| 3. Process results sequentially (handle interrupts, parent commands) | ||
|
|
||
| ### 6. Comprehensive Feature Support | ||
| The integration supports: | ||
| - Interrupts/resume via `interrupt()` and `Command(resume=...)` | ||
| - Store API via `ActivityLocalStore` | ||
| - Send API for dynamic parallelism | ||
| - Command API for navigation | ||
| - Subgraphs with automatic flattening | ||
| - Continue-as-new via `get_state()`/checkpoint | ||
|
|
||
| --- | ||
|
|
||
| ## Areas for Improvement | ||
|
|
||
| ### 1. ~~Long Methods in `_runner.py`~~ ✅ COMPLETED | ||
|
|
||
| **Issue:** `ainvoke()` is ~215 lines, `_execute_subgraph()` is ~175 lines. | ||
|
|
||
| **Resolution:** Refactored into smaller methods: | ||
| - `_prepare_resume_input()` - Handle resume/Command input | ||
| - `_create_pregel_loop()` - Create and configure the Pregel loop | ||
| - `_execute_loop()` - Main execution loop with tick processing | ||
| - `_process_tick_tasks()` - Process tasks from a single tick | ||
| - `_execute_regular_tasks()` - Execute regular node tasks | ||
| - `_execute_send_packets()` - Execute Send packet tasks in parallel | ||
| - `_finalize_output()` - Prepare final output with interrupt/checkpoint handling | ||
|
|
||
| ### 2. ~~Many Instance Variables in `TemporalLangGraphRunner`~~ ✅ COMPLETED | ||
|
|
||
| **Issue:** The class has ~20 instance variables tracking various state. | ||
|
|
||
| **Resolution:** Grouped into two dataclasses in `_runner.py`: | ||
| ```python | ||
| @dataclass | ||
| class InterruptState: | ||
| interrupted_state: dict[str, Any] | None = None | ||
| interrupted_node_name: str | None = None | ||
| resume_value: Any | None = None | ||
| resume_used: bool = False | ||
| is_resume_invocation: bool = False | ||
| pending_interrupt: InterruptValue | None = None | ||
|
|
||
| @dataclass | ||
| class ExecutionState: | ||
| step_counter: int = 0 | ||
| invocation_counter: int = 0 | ||
| completed_nodes_in_cycle: set[str] = field(default_factory=set) | ||
| resumed_node_writes: dict[str, list[tuple[str, Any]]] = field(default_factory=dict) | ||
| last_output: dict[str, Any] | None = None | ||
| pending_parent_command: Any | None = None | ||
| store_state: dict[tuple[tuple[str, ...], str], dict[str, Any]] = field(default_factory=dict) | ||
| ``` | ||
|
|
||
| Now accessed via `self._interrupt.*` and `self._execution.*`. | ||
|
|
||
| ### 3. ~~Magic Strings Could Be Constants~~ ✅ COMPLETED | ||
|
|
||
| **Issue:** String literals like `"__start__"`, `"tools"`, `"__interrupt__"`, `"__checkpoint__"` appear throughout. | ||
|
|
||
| **Resolution:** Created `_constants.py` with: | ||
| ```python | ||
| START_NODE = "__start__" | ||
| TOOLS_NODE = "tools" | ||
| INTERRUPT_KEY = "__interrupt__" | ||
| CHECKPOINT_KEY = "__checkpoint__" | ||
| BRANCH_PREFIX = "branch:" | ||
| MODEL_NODE_NAMES = frozenset({"agent", "model", "llm", "chatbot", "chat_model"}) | ||
| MODEL_NAME_ATTRS = ("model_name", "model") | ||
| ``` | ||
|
|
||
| ### 4. ~~Nested Functions in `_execute_node_impl`~~ ✅ COMPLETED | ||
|
|
||
| **Issue:** `_execute_node_impl` contains 5 nested functions. | ||
|
|
||
| **Resolution:** Extracted to module level in `_activities.py`: | ||
| - `_convert_messages_if_needed()` - Module-level pure function | ||
| - `_merge_channel_value()` - Module-level pure function | ||
| - `StateReader` class - Encapsulates state reading logic | ||
| - `_get_null_resume()` - Module-level function | ||
|
|
||
| Only `_interrupt_counter()` remains nested (requires mutable state capture). | ||
|
|
||
| ### 5. Type Annotations Could Be More Specific | ||
|
|
||
| **Issue:** Some `Any` types could be narrowed: | ||
| ```python | ||
| per_node_activity_options: dict[str, dict[str, Any]] # inner dict structure is known | ||
| checkpoint: dict | None # could be StateSnapshot | dict | None | ||
| ``` | ||
|
|
||
| **Recommendation:** Use more specific types or TypedDict where the structure is known. | ||
|
|
||
| --- | ||
|
|
||
| ## Test Coverage Assessment | ||
|
|
||
| ### Current Tests | ||
|
|
||
| | Test File | Tests | Coverage | | ||
| |-----------|-------|----------| | ||
| | `test_e2e.py` | 14 | Basic execution, interrupts, store, advanced features, agents | | ||
| | `test_runner.py` | 39 | Activity summary, model extraction, compile, error retryability, parallel sends | | ||
| | `test_activities.py` | ~10 | Node execution, interrupts, parent commands | | ||
| | `test_models.py` | ~15 | Data model serialization | | ||
| | `test_store.py` | ~10 | Store operations | | ||
| | `test_plugin.py` | ~5 | Plugin registration | | ||
| | `test_registry.py` | ~5 | Graph registry | | ||
|
|
||
| ### Coverage Gaps | ||
|
|
||
| 1. **Edge Cases:** | ||
| - Workflow cancellation during activity execution | ||
| - Very large state serialization | ||
| - Deep subgraph nesting (>3 levels) | ||
|
|
||
| 2. **Error Scenarios:** | ||
| - Activity timeout during interrupt | ||
| - Store write conflicts | ||
| - Graph definition changes between invocations | ||
|
|
||
| 3. **Performance:** | ||
| - No load tests for high-parallelism Send patterns | ||
| - No benchmarks for large state checkpointing | ||
|
|
||
| --- | ||
|
|
||
| ## Security Considerations | ||
|
|
||
| ### Positive | ||
|
|
||
| 1. **Sandbox passthrough is limited:** Only `pydantic_core`, `langchain_core`, `annotated_types` are passed through. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Claude is actually lying here. |
||
|
|
||
| 2. **Config filtering:** Internal LangGraph keys (`__pregel_*`, `__lg_*`) are stripped before serialization. | ||
|
|
||
| 3. **No arbitrary code execution:** Node functions are registered at plugin init, not deserialized. | ||
|
|
||
| ### Recommendations | ||
|
|
||
| 1. **Input validation:** Consider validating `graph_id` format in `compile()` to prevent injection attacks via workflow inputs. | ||
|
|
||
| 2. **State size limits:** Consider adding configurable limits on serialized state size to prevent memory issues. | ||
|
|
||
| --- | ||
|
|
||
| ## Documentation Quality | ||
|
|
||
| ### Strengths | ||
|
|
||
| - Comprehensive README with examples | ||
| - Good docstrings on public API (`__init__.py`) | ||
| - MISSING_FEATURES.md provides clear status tracking | ||
| - Experimental warnings are clearly noted | ||
|
|
||
| ### Gaps | ||
|
|
||
| - Internal architecture documentation could be added (class diagrams, sequence diagrams) | ||
| - Contributing guidelines not present | ||
| - Changelog/versioning not formalized | ||
|
|
||
| --- | ||
|
|
||
| ## Recommendations Summary | ||
|
|
||
| ### High Priority | ||
|
|
||
| 1. ~~**Refactor `ainvoke` and `_execute_subgraph`** into smaller, testable methods~~ ✅ DONE | ||
| 2. ~~**Group instance variables** into state dataclasses for better organization~~ ✅ DONE | ||
|
|
||
| ### Medium Priority | ||
|
|
||
| 3. ~~**Extract magic strings** to a constants module~~ ✅ DONE | ||
| 4. **Add integration tests** for cancellation and timeout scenarios | ||
| 5. **Add more specific type annotations** | ||
|
|
||
| ### Low Priority | ||
|
|
||
| 6. ~~**Extract nested functions** from `_execute_node_impl`~~ ✅ DONE | ||
| 7. **Add architecture documentation** with diagrams | ||
| 8. **Add load/performance tests** for Send API patterns | ||
|
|
||
| --- | ||
|
|
||
| ## Conclusion | ||
|
|
||
| The LangGraph plugin is a solid implementation that correctly integrates LangGraph's graph execution model with Temporal's durable execution. The code is functional, well-tested for core scenarios, and provides good observability. | ||
|
|
||
| **Update (2025-12-29):** The major code organization improvements have been completed: | ||
| - ✅ Long methods refactored into smaller, testable functions | ||
| - ✅ Instance variables grouped into `InterruptState` and `ExecutionState` dataclasses | ||
| - ✅ Magic strings extracted to `_constants.py` module | ||
| - ✅ Nested functions extracted from `_execute_node_impl` | ||
|
|
||
| Remaining items are lower priority (integration tests, type annotations, documentation). | ||
|
|
||
| **Verdict:** Ready for experimental use with improved maintainability. | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for AuthenticationError, these are retryable in a temporal context if users want the simplicity of deploying to fix their stuck workflows.
That is, "preconditions not met" errors can be non-retryable but developer-facing errors can be.