300 lines
6.1 KiB
Markdown
300 lines
6.1 KiB
Markdown
# LangGraph Execution Model
|
|
|
|
**Version:** 1.0.0
|
|
**Last Updated:** 2026-02-23
|
|
|
|
---
|
|
|
|
## Overview
|
|
|
|
This document details how LangGraph executes graphs, based on direct source code analysis of `pregel/_loop.py` and related files.
|
|
|
|
---
|
|
|
|
## Pregel Superstep Model
|
|
|
|
### The Core Loop
|
|
|
|
LangGraph is inspired by Google's **Pregel** — a system for large-scale graph processing:
|
|
|
|
```
|
|
Pregel = "Think like a vertex"
|
|
- Each node computes independently
|
|
- Nodes communicate via messages
|
|
- Synchronous supersteps with barrier
|
|
- Fault tolerance via checkpointing
|
|
```
|
|
|
|
---
|
|
|
|
## Execution Flow
|
|
|
|
### High-Level
|
|
|
|
```
|
|
graph.invoke(input, config)
|
|
│
|
|
▼
|
|
[Compile graph if needed]
|
|
│
|
|
▼
|
|
[Load checkpoint if resuming]
|
|
│
|
|
▼
|
|
┌─────────────────────────────────┐
|
|
│ FOR each superstep: │
|
|
│ 1. prepare_next_tasks() │
|
|
│ 2. execute_tasks() │
|
|
│ 3. apply_writes() │
|
|
│ 4. checkpoint() (if enabled) │
|
|
└─────────────────────────────────┘
|
|
│
|
|
▼
|
|
[Return final state]
|
|
```
|
|
|
|
---
|
|
|
|
## Detailed Superstep
|
|
|
|
### Step 1: Prepare Next Tasks
|
|
|
|
From `_algo.py`:
|
|
|
|
```python
|
|
def prepare_next_tasks(
|
|
checkpoint: Checkpoint,
|
|
nodes: dict[str, PregelNode],
|
|
channels: dict[str, BaseChannel],
|
|
pending_writes: list[tuple],
|
|
etc.
|
|
) -> list[PregelExecutableTask]:
|
|
"""Determine which nodes to run in this superstep."""
|
|
|
|
# For each node:
|
|
# 1. Check if triggered (input channels have values)
|
|
# 2. Check if should run (not already running)
|
|
# 3. Create executable task
|
|
```
|
|
|
|
### Step 2: Execute Tasks
|
|
|
|
From `_runner.py`:
|
|
|
|
```python
|
|
async def execute_tasks(tasks: list[PregelExecutableTask]):
|
|
"""Execute tasks in parallel."""
|
|
|
|
# Submit all tasks to executor
|
|
# Each task:
|
|
# 1. Read input from channels
|
|
# 2. Execute node function
|
|
# 3. Return writes (channel updates)
|
|
```
|
|
|
|
### Step 3: Apply Writes
|
|
|
|
From `_algo.py`:
|
|
|
|
```python
|
|
def apply_writes(
|
|
checkpoint: Checkpoint,
|
|
pending_writes: list[tuple],
|
|
channels: dict[str, BaseChannel]
|
|
):
|
|
"""Apply writes to channels using reducers."""
|
|
|
|
# For each write:
|
|
# 1. Identify target channel
|
|
# 2. Apply reducer to merge with existing value
|
|
```
|
|
|
|
### Step 4: Checkpoint
|
|
|
|
From `_checkpoint.py`:
|
|
|
|
```python
|
|
def create_checkpoint(
|
|
channels: dict[str, BaseChannel],
|
|
versions: dict[str, int],
|
|
metadata: CheckpointMetadata
|
|
) -> Checkpoint:
|
|
"""Snapshot all channel values."""
|
|
|
|
return {
|
|
"channel_values": {
|
|
k: v.checkpoint() for k, v in channels.items()
|
|
},
|
|
"channel_versions": versions,
|
|
"metadata": metadata,
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## PregelTask Structure
|
|
|
|
### From `types.py`
|
|
|
|
```python
|
|
class PregelExecutableTask(NamedTuple):
|
|
"""A single executable task in the graph."""
|
|
|
|
name: str # Node name
|
|
path: str # Task path
|
|
input: Any # Input to node
|
|
proc: Callable # Node function
|
|
writes: list[Send] # Dynamic sends
|
|
triggers: list[str] # Channels that trigger this
|
|
interrupt_after: bool # Interrupt after execution
|
|
interrupt_before: bool # Interrupt before execution
|
|
```
|
|
|
|
---
|
|
|
|
## Send (Dynamic Edges)
|
|
|
|
### What is Send?
|
|
|
|
`Send` enables dynamic node spawning — a node can spawn multiple tasks:
|
|
|
|
```python
|
|
from langgraph.types import Send
|
|
|
|
def splitter(state):
|
|
messages = state["messages"]
|
|
return [
|
|
Send("process_email", {"email": email})
|
|
for email in messages
|
|
]
|
|
```
|
|
|
|
### Send Implementation
|
|
|
|
```python
|
|
class Send(NamedTuple):
|
|
"""Dynamic edge - spawn a task for another node."""
|
|
|
|
node: str # Target node name
|
|
arg: Any # Input to target node
|
|
```
|
|
|
|
---
|
|
|
|
## Interrupt (Human-in-the-Loop)
|
|
|
|
### How Interrupts Work
|
|
|
|
From `types.py`:
|
|
|
|
```python
|
|
class Interrupt(NamedTuple):
|
|
"""Pause execution for human input."""
|
|
|
|
value: Any # Data to show human
|
|
when: str # "during" or "after"
|
|
```
|
|
|
|
### Interrupt Flow
|
|
|
|
```
|
|
1. Node calls interrupt(data)
|
|
│
|
|
▼
|
|
2. PregelLoop pauses
|
|
│
|
|
▼
|
|
3. Returns to caller with interrupt value
|
|
│
|
|
▼
|
|
4. Caller (human) provides input
|
|
│
|
|
▼
|
|
5. Resume with Command(resume=data)
|
|
```
|
|
|
|
### Resume with Command
|
|
|
|
```python
|
|
from langgraph.types import Command
|
|
|
|
# Resume with new data
|
|
graph.invoke(
|
|
None, # No new input
|
|
config=RunnableConfig(
|
|
configurable={"checkpoint_id": "abc123"},
|
|
resume={"feedback": "looks good"}
|
|
)
|
|
)
|
|
```
|
|
|
|
---
|
|
|
|
## Stream Mode
|
|
|
|
### From `types.py`
|
|
|
|
```python
|
|
StreamMode = Literal[
|
|
"values", # Full state after each step
|
|
"updates", # Node-specific updates
|
|
"checkpoints", # Checkpoint snapshots
|
|
"tasks", # Task start/complete
|
|
"debug", # Debug info
|
|
"messages", # Message streams
|
|
"custom", # Custom streams
|
|
]
|
|
```
|
|
|
|
---
|
|
|
|
## Retry Policy
|
|
|
|
### From `types.py`
|
|
|
|
```python
|
|
class RetryPolicy:
|
|
"""Configuration for node retry behavior."""
|
|
|
|
max_attempts: int = 3
|
|
initial_interval: float = 1.0
|
|
backoff_factor: float = 2.0
|
|
max_interval: float = 100.0
|
|
```
|
|
|
|
---
|
|
|
|
## Error Handling
|
|
|
|
### From `errors.py`
|
|
|
|
```python
|
|
class GraphInterrupt(GraphRuntimeException):
|
|
"""Raised when graph is interrupted."""
|
|
pass
|
|
|
|
class InvalidUpdateError(GraphRuntimeException):
|
|
"""Raised when channel update is invalid."""
|
|
pass
|
|
|
|
class EmptyInputError(GraphRuntimeException):
|
|
"""Raised when graph input is empty."""
|
|
pass
|
|
```
|
|
|
|
---
|
|
|
|
## Key Insight: Synchronous Barrier
|
|
|
|
Unlike purely event-driven systems (like OpenClaw), LangGraph uses **synchronous supersteps**:
|
|
|
|
1. All triggered nodes in a superstep run **in parallel**
|
|
2. All writes are **applied together** after all nodes complete
|
|
3. Next superstep starts only after current completes
|
|
|
|
This simplifies reasoning about state but limits flexibility compared to event-driven models.
|
|
|
|
---
|
|
|
|
*Generated from source code analysis*
|