temporalio · mfateev · Dec 25, 2025 · Dec 25, 2025 · Dec 25, 2025 · Dec 25, 2025
@@ -1,5 +1,6 @@
 .venv
 __pycache__
+.mypy_cache
 /build
 /dist
 temporalio/bridge/target/

@@ -0,0 +1,277 @@
+# LangGraph Temporal Plugin - Code Review
+
+**Date:** 2025-12-29
+**Reviewer:** Claude Code
+**Scope:** Full codebase review of `temporalio/contrib/langgraph/`
+
+---
+
+## Executive Summary
+
+The LangGraph plugin is a **well-designed integration** that maps LangGraph's computational model onto Temporal's durable execution model. The architecture is sound with clear separation of concerns. The implementation successfully supports most LangGraph features including interrupts, Store API, Send API, Command API, and subgraphs.
+
+**Overall Rating:** Good with minor improvements recommended
+
+---
+
+## Architecture Overview
+
+### Module Structure
+
+| Module | Purpose | Lines | Assessment |
+|--------|---------|-------|------------|
+| `_plugin.py` | Plugin registration, worker setup | ~100 | Clean |
+| `_graph_registry.py` | Graph storage, lookup | ~130 | Clean |
+| `_runner.py` | Main orchestration logic | ~1200 | Complex but necessary |
+| `_activities.py` | Node execution activities | ~430 | Well-structured |
+| `_models.py` | Data transfer objects | ~320 | Good dataclass usage |
+| `_exceptions.py` | Error classification | ~170 | Comprehensive |
+| `_store.py` | Activity-local store | ~100 | Simple, effective |
+| `__init__.py` | Public API | ~190 | Well-documented |
+
+### Key Design Decisions
+
+1. **Activities as Node Executors**: Each graph node runs as a Temporal activity, providing durability and retry semantics. This is the correct architectural choice.
+
+2. **AsyncPregelLoop Integration**: The runner uses LangGraph's internal `AsyncPregelLoop` for graph traversal, ensuring compatibility with native LangGraph behavior.
+
+3. **Plugin-based Registration**: Graphs are registered via `LangGraphPlugin` and stored in a global registry, allowing compile-time lookup within workflows.
+
+4. **Store Snapshot Pattern**: Store data is snapshotted before each activity and writes are tracked/merged back - enables cross-node persistence without shared state.
+
+---
+
+## Strengths
+
+### 1. Clean Separation of Concerns
+- `_plugin.py` handles Temporal integration (activities, data converter, sandbox)
+- `_runner.py` handles workflow-side orchestration
+- `_activities.py` handles activity-side execution
+- `_models.py` defines serializable DTOs
+
+### 2. Comprehensive Error Classification (`_exceptions.py:13-97`)
+```python
+def is_non_retryable_error(exc: BaseException) -> bool:
+```
+The error classifier correctly identifies:
+- Non-retryable: `TypeError`, `ValueError`, `AuthenticationError`, 4xx HTTP errors
+- Retryable: Rate limits (429), network errors, 5xx server errors
+
+This ensures proper retry behavior for different failure modes.
+
+### 3. Rich Activity Summaries (`_runner.py:~64-185`)
+Activity summaries extract meaningful context:
+- Tool calls from messages
+- Model names from chat models
+- Last human query for context
+- Node descriptions from metadata
+
+This significantly improves workflow observability in the Temporal UI.
+
+### 4. Robust Interrupt Handling
+The interrupt/resume flow is well-implemented:
+- `_pending_interrupt` tracks interrupt state
+- `_interrupted_node_name` enables targeted resume
+- `_completed_nodes_in_cycle` prevents re-execution after resume
+- Resume values flow through `PregelScratchpad`
+
+### 5. Parallel Send Execution (`_runner.py:866-999`)
+Send packets now execute in parallel using `asyncio.gather`, with proper phase separation:
+1. Prepare all activity inputs (deterministic step counter assignment)
+2. Execute all activities in parallel
+3. Process results sequentially (handle interrupts, parent commands)
+
+### 6. Comprehensive Feature Support
+The integration supports:
+- Interrupts/resume via `interrupt()` and `Command(resume=...)`
+- Store API via `ActivityLocalStore`
+- Send API for dynamic parallelism
+- Command API for navigation
+- Subgraphs with automatic flattening
+- Continue-as-new via `get_state()`/checkpoint
+
+---
+
+## Areas for Improvement
+
+### 1. ~~Long Methods in `_runner.py`~~ ✅ COMPLETED
+
+**Issue:** `ainvoke()` is ~215 lines, `_execute_subgraph()` is ~175 lines.
+
+**Resolution:** Refactored into smaller methods:
+- `_prepare_resume_input()` - Handle resume/Command input
+- `_create_pregel_loop()` - Create and configure the Pregel loop
+- `_execute_loop()` - Main execution loop with tick processing
+- `_process_tick_tasks()` - Process tasks from a single tick
+- `_execute_regular_tasks()` - Execute regular node tasks
+- `_execute_send_packets()` - Execute Send packet tasks in parallel
+- `_finalize_output()` - Prepare final output with interrupt/checkpoint handling
+
+### 2. ~~Many Instance Variables in `TemporalLangGraphRunner`~~ ✅ COMPLETED
+
+**Issue:** The class has ~20 instance variables tracking various state.
+
+**Resolution:** Grouped into two dataclasses in `_runner.py`:
+```python
+@dataclass
+class InterruptState:
+    interrupted_state: dict[str, Any] | None = None
+    interrupted_node_name: str | None = None
+    resume_value: Any | None = None
+    resume_used: bool = False
+    is_resume_invocation: bool = False
+    pending_interrupt: InterruptValue | None = None
+
+@dataclass
+class ExecutionState:
+    step_counter: int = 0
+    invocation_counter: int = 0
+    completed_nodes_in_cycle: set[str] = field(default_factory=set)
+    resumed_node_writes: dict[str, list[tuple[str, Any]]] = field(default_factory=dict)
+    last_output: dict[str, Any] | None = None
+    pending_parent_command: Any | None = None
+    store_state: dict[tuple[tuple[str, ...], str], dict[str, Any]] = field(default_factory=dict)
+```
+
+Now accessed via `self._interrupt.*` and `self._execution.*`.
+
+### 3. ~~Magic Strings Could Be Constants~~ ✅ COMPLETED
+
+**Issue:** String literals like `"__start__"`, `"tools"`, `"__interrupt__"`, `"__checkpoint__"` appear throughout.
+
+**Resolution:** Created `_constants.py` with:
+```python
+START_NODE = "__start__"
+TOOLS_NODE = "tools"
+INTERRUPT_KEY = "__interrupt__"
+CHECKPOINT_KEY = "__checkpoint__"
+BRANCH_PREFIX = "branch:"
+MODEL_NODE_NAMES = frozenset({"agent", "model", "llm", "chatbot", "chat_model"})
+MODEL_NAME_ATTRS = ("model_name", "model")
+```
+
+### 4. ~~Nested Functions in `_execute_node_impl`~~ ✅ COMPLETED
+
+**Issue:** `_execute_node_impl` contains 5 nested functions.
+
+**Resolution:** Extracted to module level in `_activities.py`:
+- `_convert_messages_if_needed()` - Module-level pure function
+- `_merge_channel_value()` - Module-level pure function
+- `StateReader` class - Encapsulates state reading logic
+- `_get_null_resume()` - Module-level function
+
+Only `_interrupt_counter()` remains nested (requires mutable state capture).
+
+### 5. Type Annotations Could Be More Specific
+
+**Issue:** Some `Any` types could be narrowed:
+```python
+per_node_activity_options: dict[str, dict[str, Any]]  # inner dict structure is known
+checkpoint: dict | None  # could be StateSnapshot | dict | None
+```
+
+**Recommendation:** Use more specific types or TypedDict where the structure is known.
+
+---
+
+## Test Coverage Assessment
+
+### Current Tests
+
+| Test File | Tests | Coverage |
+|-----------|-------|----------|
+| `test_e2e.py` | 14 | Basic execution, interrupts, store, advanced features, agents |
+| `test_runner.py` | 39 | Activity summary, model extraction, compile, error retryability, parallel sends |
+| `test_activities.py` | ~10 | Node execution, interrupts, parent commands |
+| `test_models.py` | ~15 | Data model serialization |
+| `test_store.py` | ~10 | Store operations |
+| `test_plugin.py` | ~5 | Plugin registration |
+| `test_registry.py` | ~5 | Graph registry |
+
+### Coverage Gaps
+
+1. **Edge Cases:**
+   - Workflow cancellation during activity execution
+   - Very large state serialization
+   - Deep subgraph nesting (>3 levels)
+
+2. **Error Scenarios:**
+   - Activity timeout during interrupt
+   - Store write conflicts
+   - Graph definition changes between invocations
+
+3. **Performance:**
+   - No load tests for high-parallelism Send patterns
+   - No benchmarks for large state checkpointing
+
+---
+
+## Security Considerations
+
+### Positive
+
+1. **Sandbox passthrough is limited:** Only `pydantic_core`, `langchain_core`, `annotated_types` are passed through.
+
+2. **Config filtering:** Internal LangGraph keys (`__pregel_*`, `__lg_*`) are stripped before serialization.
+
+3. **No arbitrary code execution:** Node functions are registered at plugin init, not deserialized.
+
+### Recommendations
+
+1. **Input validation:** Consider validating `graph_id` format in `compile()` to prevent injection attacks via workflow inputs.
+
+2. **State size limits:** Consider adding configurable limits on serialized state size to prevent memory issues.
+
+---
+
+## Documentation Quality
+
+### Strengths
+
+- Comprehensive README with examples
+- Good docstrings on public API (`__init__.py`)
+- MISSING_FEATURES.md provides clear status tracking
+- Experimental warnings are clearly noted
+
+### Gaps
+
+- Internal architecture documentation could be added (class diagrams, sequence diagrams)
+- Contributing guidelines not present
+- Changelog/versioning not formalized
+
+---
+
+## Recommendations Summary
+
+### High Priority
+
+1. ~~**Refactor `ainvoke` and `_execute_subgraph`** into smaller, testable methods~~ ✅ DONE
+2. ~~**Group instance variables** into state dataclasses for better organization~~ ✅ DONE
+
+### Medium Priority
+
+3. ~~**Extract magic strings** to a constants module~~ ✅ DONE
+4. **Add integration tests** for cancellation and timeout scenarios
+5. **Add more specific type annotations**
+
+### Low Priority
+
+6. ~~**Extract nested functions** from `_execute_node_impl`~~ ✅ DONE
+7. **Add architecture documentation** with diagrams
+8. **Add load/performance tests** for Send API patterns
+
+---
+
+## Conclusion
+
+The LangGraph plugin is a solid implementation that correctly integrates LangGraph's graph execution model with Temporal's durable execution. The code is functional, well-tested for core scenarios, and provides good observability.
+
+**Update (2025-12-29):** The major code organization improvements have been completed:
+- ✅ Long methods refactored into smaller, testable functions
+- ✅ Instance variables grouped into `InterruptState` and `ExecutionState` dataclasses
+- ✅ Magic strings extracted to `_constants.py` module
+- ✅ Nested functions extracted from `_execute_node_impl`
+
+Remaining items are lower priority (integration tests, type annotations, documentation).
+
+**Verdict:** Ready for experimental use with improved maintainability.
@@ -60,6 +60,9 @@ dev = [
   "openai-agents[litellm]>=0.3,<0.7; python_version < '3.14'",
   "googleapis-common-protos==1.70.0",
   "pytest-rerunfailures>=16.1",
+  # LangGraph integration tests
+  "langchain>=1.2.0,<2",
+  "langgraph>=1.0.0,<2",
 ]
 
 [tool.poe.tasks]
@@ -136,6 +139,8 @@ exclude = [
   # Ignore generated code
   'temporalio/api',
   'temporalio/bridge/proto',
+  # Ignore separate repos/worktrees
+  'crew-ai',
 ]
 
 [tool.pydocstyle]